Serverless ML System
Feel free to try » Prediction app · Monitor app
This README is for developers who want to quickly build online user interfaces for machine learning models, rather than running classification or regression only in .ipynb notebooks. Click the link below the picture to see our demo.
In astronomy, classifying celestial observations is a major concern. In the past, celestial objects could be classified by their morphology, but as astronomical surveys have scaled up, image resolution is no longer sufficient to support morphology-based classification.
Classification by the photometric characteristics of celestial objects is considered a good alternative. Spectra reveal the temperature, radiation, and other physical characteristics of different types of stars through the specific wavelengths of light they emit or absorb. Redshift also provides important information about the motion of different types of objects. Therefore, by analyzing photometric characteristics, machine learning algorithms can efficiently classify celestial objects in large-scale astronomy.
Astronomical observations have two basic characteristics:
Large scale: the number of catalog objects in the SDSS DR17 data release has reached the billion level;
Continuity: astronomical observation is a continuous process, and the addition of new data must be considered when building an ML model.
Because of these two characteristics, a locally hosted celestial object classification model can be expensive and hard to scale. In this project, we use hopsworks.ai, modal.com, and huggingface.co to build a scalable serverless machine learning system for the astronomical object classification task.
- twofish (Windows only)
- hopsworks
- joblib
- scikit-learn==1.1.1
- seaborn
- dataframe-image
- modal
- gradio==4.2.0
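To confirm that the packages listed above are present in your environment, a small check along these lines may help. It is a minimal sketch using only the standard library: the function name `check_requirements` and the `REQUIREMENTS` table are illustrative and not part of this repository, and `twofish` is left out because it is only needed on Windows.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names and pins copied from the prerequisites list above;
# None means the list gives no version pin.
REQUIREMENTS = {
    "hopsworks": None,
    "joblib": None,
    "scikit-learn": "1.1.1",
    "seaborn": None,
    "dataframe-image": None,
    "modal": None,
    "gradio": "4.2.0",
}


def check_requirements(requirements, get_version=version):
    """Return {package: problem} for every missing or mismatched package."""
    problems = {}
    for name, pinned in requirements.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Only compare versions when the prerequisites list pins one.
        if pinned is not None and installed != pinned:
            problems[name] = f"installed {installed}, expected {pinned}"
    return problems


if __name__ == "__main__":
    for name, problem in check_requirements(REQUIREMENTS).items():
        print(f"{name}: {problem}")
```

An empty result means every listed package was found and the two pinned versions match.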
Clone the repo
git clone https://github.com/bokuan/Serverless_SDSS_Astronomical_Object_Classification.git
The data sources include a data lake and a data warehouse:
The data lake is SDSS Data Release 17 (DR17), which is the final data release of the fourth phase of the Sloan Digital Sky Survey (SDSS-IV). DR17 contains SDSS observations through January 2021: https://www.sdss4.org/dr17/
The data warehouse is a pre-processed fragment from SDSS-IV DR17, which contains 100,000 celestial objects: https://www.kaggle.com/datasets/fedesoriano/stellar-classification-dataset-sdss17
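As a rough illustration of reading the data warehouse file, the sketch below parses CSV rows into feature vectors and labels with the standard library. The column names (`u`, `g`, `r`, `i`, `z`, `redshift`, `class`) are assumptions about the Kaggle file and are not stated in this README, the helper `load_objects` is hypothetical, and the three sample rows are made up purely so the snippet runs on its own.

```python
import csv
import io
from collections import Counter

# Assumed column names (not stated in this README): five photometric
# bands plus redshift as features, and "class" as the label.
FEATURE_COLUMNS = ["u", "g", "r", "i", "z", "redshift"]
LABEL_COLUMN = "class"


def load_objects(csv_file):
    """Read rows from an open CSV file into (features, labels) lists."""
    features, labels = [], []
    for row in csv.DictReader(csv_file):
        features.append([float(row[col]) for col in FEATURE_COLUMNS])
        labels.append(row[LABEL_COLUMN])
    return features, labels


# Made-up sample standing in for the real file; values are illustrative only.
SAMPLE = """u,g,r,i,z,redshift,class
23.9,22.3,20.4,19.2,18.8,0.63,GALAXY
19.4,17.9,17.1,16.8,16.6,0.0001,STAR
21.1,20.9,20.7,20.6,20.4,1.42,QSO
"""

features, labels = load_objects(io.StringIO(SAMPLE))
print(len(features), Counter(labels))
```

For the downloaded dataset, the same function can be given a real file handle, for example `open(path, newline="")`, where `path` points to wherever you saved the Kaggle CSV.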