An end-to-end data science project using historical weather data from Singapore

By Chua Chin Hon

Github Repo Title: weather_singapore_cch

SUMMARY

For data science students in Singapore, it is hard to find detailed, yet publicly available local datasets for lessons or personal projects. I came across a multi-decade collection of weather data on the Singapore Met Service's website by chance, and decided to assemble it for future use, or in case the data is taken offline.

I'm also using the dataset for a series of self-assigned data science projects, starting with visualisation. I will include time series and machine learning forecasts in future updates to this project.

TABLE OF CONTENTS

There are 5 sections so far. The CSV files containing the daily and monthly weather data are in the raw folder. Those who want to assemble their own datasets should head there first.

I. DATA COLLECTION-PREPROCESSING

What you'll find in the raw folder:

444 CSV files containing daily weather data for Singapore from January 1983 to December 2019 (one file per month)
A "monthly_data" sub-folder containing monthly average data for rainfall, maximum and mean temperatures.
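Anyone assembling their own dataset will first need to stitch the monthly files into one table. The sketch below does this with the standard library only. It assumes every file shares the same header row and can be matched with a glob pattern such as `*.csv`. The function name `load_daily_files` and the pattern are hypothetical and are not taken from the repo's notebooks.

```python
import csv
import glob
import os


def load_daily_files(folder, pattern="*.csv"):
    """Read every CSV in `folder` matching `pattern` into one list of dicts.

    Assumes (hypothetically) that all files share the same header row.
    Files are read in sorted filename order so the result is deterministic.
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        # utf-8-sig also strips a byte-order mark if a file happens to have one
        with open(path, newline="", encoding="utf-8-sig") as f:
            for row in csv.DictReader(f):
                # Record which file each row came from, to help trace bad rows later
                row["source_file"] = os.path.basename(path)
                rows.append(row)
    return rows
```

Keeping the source filename on each row makes it easy to trace a malformed record back to its monthly file.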

What you'll find in the data folder:

4 CSV files processed in the notebook 1.0_data_cleaning_cch
2 CSV files related to outlier detection, as processed in the notebook 3.0_outlier_detection_cch.ipynb
1 CSV file related to the Q3 2019 scorcher in Singapore
1 CSV file used in the machine learning and deep learning notebooks, as processed in notebook 5.0, plus 1 validation dataset.
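Much of the cleaning work consists of converting text fields into numbers. The sketch below shows one way to do that. It assumes, without confirmation here, that missing readings appear as blanks or dash characters. The helpers `to_float` and `clean_rows` and the set of missing markers are all hypothetical.

```python
# Hypothetical markers for missing readings; inspect the raw files before relying on these.
MISSING_MARKERS = {"", "-", "\u2014", "\u2013"}


def to_float(value):
    """Convert a raw CSV string to float, returning None for missing values."""
    if value is None:  # csv.DictReader fills absent trailing fields with None
        return None
    cleaned = value.strip()
    if cleaned in MISSING_MARKERS:
        return None
    return float(cleaned)


def clean_rows(rows, numeric_columns):
    """Return copies of `rows` with the given columns converted to floats."""
    cleaned_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the raw rows stay untouched
        for col in numeric_columns:
            new_row[col] = to_float(new_row.get(col))
        cleaned_rows.append(new_row)
    return cleaned_rows
```

A value that is neither numeric nor a known marker still raises `ValueError`. That is deliberate: it surfaces unexpected formats instead of silently hiding them.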

II. EDA & DATA VISUALISATION

The lack of seasonal variation lulls many into thinking that Singapore's weather is predictable and unchanging. Nothing could be further from the truth, with climate change making the city state's weather even more unpredictable.

In notebook 2.0_visualisation_cch, I'll attempt to illustrate the changing weather patterns in Singapore using both classic and newer visualisation libraries, such as Plotly Express.

Medium post: Visualising Singapore’s Changing Weather Patterns: 1983–2019

III. OUTLIER DETECTION

Data visualisation provides an easy way to spot outliers. But with 36 years of weather data, relying solely on charts to pick out the outliers accurately is neither sufficient nor efficient.

In the third section of this project, I'll use Scikit-learn's Isolation Forest model as well as the PyOD library (Python Outlier Detection) to try to pinpoint anomalies in the dataset. This is also important groundwork for the planned time series forecasting part of the project, where removing the outliers would be key to more accurate predictions.
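The sketch below flags unusual days with scikit-learn's `IsolationForest`. The two feature columns and `contamination=0.05` are assumptions for illustration, not the settings used in notebook 3.0. The data is synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic features per day: [mean temperature, daily rainfall]; illustrative only.
normal_days = np.column_stack([rng.normal(27.5, 0.8, 300), rng.gamma(0.5, 10.0, 300)])
odd_days = np.array([[32.0, 0.0], [23.0, 150.0]])  # deliberately extreme rows
X = np.vstack([normal_days, odd_days])

# contamination is the expected share of outliers; 0.05 is an assumed value.
model = IsolationForest(n_estimators=200, contamination=0.05, random_state=0)
labels = model.fit_predict(X)  # -1 marks an outlier, 1 an inlier

outlier_idx = np.where(labels == -1)[0]
```

The forest isolates points with random splits, so rows that are separated from the rest in few splits get the most anomalous scores. The `contamination` setting only controls where the cut-off between outliers and inliers falls.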

Medium post: Detecting Abnormal Weather Patterns With Data Science Tools

IV. Scorcher: Q3 2019 temperature records

This fourth notebook is a short follow-up of sorts to Part II, looking at how temperatures during the three months between July and September 2019 were among the warmest Singapore had experienced over the last 36 years, as global temperature records tumbled around the world.
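One way to test a claim like this is to rank every year by its July to September average. The sketch below does that with the standard library. It assumes the daily rows have already been cleaned into (year, month, mean temperature) tuples. The function name `rank_q3_means` is hypothetical, and the toy values in any test data should be treated as placeholders.

```python
from collections import defaultdict
from statistics import mean


def rank_q3_means(daily):
    """Rank years by mean temperature over July to September (Q3), warmest first.

    `daily` is an iterable of (year, month, mean_temp) tuples. Days with a
    missing reading (None) are skipped, so a year with gaps is averaged
    over the days that do have data.
    """
    by_year = defaultdict(list)
    for year, month, temp in daily:
        if month in (7, 8, 9) and temp is not None:
            by_year[year].append(temp)
    q3 = {year: mean(temps) for year, temps in by_year.items()}
    return sorted(q3.items(), key=lambda item: item[1], reverse=True)
```

A year's position in the returned list gives its rank. The top entries are the candidates for the "warmest Q3" label.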

Medium post: SCORCHER: As Global Records Tumbled, S’pore Baked Under One Of The Warmest Q3 Ever

V. Weather Predictions: ‘Classic’ Machine Learning Models Vs Keras

You're ready to dip your toes into deep learning but aren't sure where to start. One way is to build on what you've been doing in Scikit-learn and apply familiar features like pipelines and grid search via the Keras wrappers.

This fifth series of notebooks starts with a simple example of pipeline construction and grid search for a binary classification problem, using Logistic Regression and the XGBoost Classifier.
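In scikit-learn, the pipeline-plus-grid-search pattern looks roughly like the sketch below. It uses synthetic data from `make_classification` rather than the weather features, and the parameter grid is an assumption. An XGBoost classifier, which follows the scikit-learn estimator interface, could be searched the same way by swapping it in as the final step.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the weather features.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling sits inside the pipeline, so each CV fold is scaled on its own training split.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The "clf__" prefix routes each parameter to the pipeline step named "clf".
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Placing the scaler inside the pipeline keeps the held-out folds from leaking into the scaling statistics. That is the main reason to combine pipelines with grid search instead of preprocessing the whole dataset up front.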

In notebook 5.2, I tackle the same problem using the Keras Classifier wrapper, which introduces the concept of defining and building a Keras sequential model.

In notebook 5.3, I experiment with the relatively new Keras Tuner as an alternative to the Scikit-learn/grid search approach.

Data preparation for this section of the project is in notebook 5.1. The validation dataset is in the data folder.

Medium Post: https://bit.ly/2QJdrpD

CONTACT

Twitter: @chinhon

LinkedIn: www.linkedin.com/in/chuachinhon

About

Releases

Packages

Languages

chuachinhon/weather_singapore_cch

Folders and files

Latest commit

History

Repository files navigation

An end-to-end data science project using historical weather data from Singapore

By Chua Chin Hon

Github Repo Title: weather_singapore_cch

SUMMARY

TABLE OF CONTENT

I. DATA COLLECTION-PREPROCESSING

II. EDA & DATA VISUALISATION

III. OUTLIER DETECTION

IV. Scorcher: Q3 2019 temperature records

V. Weather Predictions: ‘Classic’ Machine Learning Models Vs Keras

CONTACT

Twitter: @chinhon

LinkedIn: www.linkedin.com/in/chuachinhon

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages