What the app does

The app shows visualizations of E-Commerce shipping data from Kaggle. It is also interactive: it creates a scatter plot comparing a user-chosen feature with the target variable Reached_on_time.

Feature Variables

The dataset contains 10,999 observations and 10 features. The 10 features are:

• Warehouse_block – The Company has a big Warehouse which is divided into blocks A, B, C, D, and E.

• Mode_of_Shipment – The Company ships the products in multiple ways such as Ship, Flight and Road.

• Customer_care_calls – The number of calls made to enquire about the shipment.

• Customer_rating – The rating the company received from each customer. 1 is the lowest (Worst), 5 is the highest (Best).

• Cost_of_the_Product – Cost of the Product in US Dollars.

• Prior_purchases – The Number of Prior Purchases.

• Product_importance – The company has categorized each product's importance as low, medium, or high.

• Gender – Customer's gender, either Male or Female.

• Discount_offered – Discount offered on that specific product (percentage).

• Weight_in_gms – The product weight in grams.

Target Variable

• Reached_on_time – The target variable. In the original data, 1 indicates that the product did NOT arrive on time and 0 indicates that it did. For this project the encoding was switched to 1 = Reached on time and 0 = Not reached on time.
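
The encoding switch amounts to flipping a binary label. A minimal sketch, assuming the raw labels are available as a list of 0/1 integers; the function name `flip_target` and the sample values are hypothetical and not taken from the repository:

```python
def flip_target(raw_labels):
    """Flip a binary target so 1 = reached on time and 0 = not reached on time.

    Assumes raw_labels uses the original encoding (1 = NOT on time, 0 = on time).
    """
    flipped = []
    for label in raw_labels:
        if label not in (0, 1):
            raise ValueError(f"expected 0 or 1, got {label!r}")
        flipped.append(1 - label)
    return flipped


# Hypothetical sample in the original encoding: late, on time, late
raw = [1, 0, 1]
print(flip_target(raw))
```

Rejecting anything other than 0 or 1 keeps a stray value from being silently turned into a meaningless label.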

Tools

This dashboard was built using Dash by Plotly. The predictor uses a Gradient Boosting Classifier model from scikit-learn that I trained in /notebook/shipping.ipynb of this repository. The application is served on AWS via Heroku.

Findings

Permutation Feature Importance

The top 5 most influential features in increasing model score are:

Weight_in_gms – 6.26%
Discount_offered – 2.12%
Prior_purchases – 1.08%
Cost_of_the_Product – 0.85%
Customer_care_calls – 0.35%
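
Permutation importance measures how much a model's score drops when one feature's values are shuffled. The sketch below illustrates the idea in plain Python; it is not the code from the notebook. The toy model, the sample rows, and the helper names are all hypothetical. scikit-learn provides `sklearn.inspection.permutation_importance` for real models; whether the notebook used it is an assumption.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which model(row) equals the label."""
    correct = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return correct / len(rows)


def permutation_importance(model, rows, labels, n_features, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + (value,) + row[j + 1:]
                        for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy stand-in for the trained classifier (hypothetical): it predicts class 1
# from the discount alone and ignores the second feature.
def toy_model(row):
    discount, rating = row
    return 1 if discount > 10 else 0


rows = [(5, 1), (20, 3), (8, 5), (40, 2), (3, 4), (15, 1)]
labels = [toy_model(row) for row in rows]  # the toy model is 100% accurate here

scores = permutation_importance(toy_model, rows, labels, n_features=2)
print(scores)  # the ignored feature (index 1) scores exactly 0.0
```

A feature the model never uses gets an importance of zero, which is why low-ranked features in the list above are candidates for removal.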

Bar Chart

Weight_in_gms

Number of products whose weight is below 2000 grams – 3241

Number of products whose weight is between 2000 and 4000 grams – 1792

Number of products whose weight is greater than 4000 grams – 5966
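
Counts like these can be produced by assigning each weight to a range and tallying the results. A small sketch under stated assumptions: the weights are made-up values rather than rows from the dataset, `weight_bucket` is a hypothetical helper, and the handling of the exact boundaries (2000 and 4000 grams) is a guess.

```python
from collections import Counter


def weight_bucket(weight_g):
    """Assign a weight in grams to one of the three ranges used in the bar chart.

    Boundary handling (2000 and 4000 fall in the middle range) is an assumption.
    """
    if weight_g < 2000:
        return "below 2000 g"
    if weight_g <= 4000:
        return "2000 to 4000 g"
    return "above 4000 g"


# Hypothetical weights, not rows from the dataset
weights = [1500, 2500, 4500, 5200, 1800, 3999, 4100]
counts = Counter(weight_bucket(w) for w in weights)
print(counts)
```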

Discount_offered

Number of products that have a discount less than or equal to 10% – 8352

Number of products that have a discount greater than 10% – 2647

Prior_purchases – Among all orders, 3 prior purchases is the most common value (3955 orders). The distribution is positively skewed, with a range of 2 to 10 prior purchases.

Customer_care_calls – Among all orders, 4 customer care calls is the most common value. The distribution is approximately normal.

Cost_of_the_Product – Product costs range from $96 to $310. The distribution looks like a blend of normal and uniform.

Scatter Plot

Weight_in_gms – The graph shows that products weighing between 2000 and 4000 grams (4.4 to 8.8 lbs) have a much higher probability of arriving late. Additional EDA confirmed this: 99.8% of products between 2000 and 4000 grams arrive late in this dataset. By comparison, 67.8% of products weighing less than 2000 grams and 43.2% of products weighing more than 4000 grams arrive late. It's interesting to note that 67.5% of products weighing between 2000 and 4000 grams are shipped via Ship, and that 67.9% of late shipments are shipped via Ship.

Discount_offered – The graph shows that products with a discount above 10% have a 100% late shipment probability. Additional EDA confirmed this: products with a discount of 10% or less have a 46.9% late shipment rate, while products with a discount greater than 10% have a 100% late shipment rate.

Prior_purchases, Customer_care_calls, and Cost_of_the_Product do not seem to have any significant effect on late shipment probabilities.
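
The late-shipment rates quoted in this section are group-wise shares: the number of late orders in a group divided by the number of orders in that group. A minimal sketch of that check, assuming each record is a (value, is_late) pair; the records and helper names below are hypothetical and are not rows or code from the repository:

```python
from collections import defaultdict


def late_rate_by_group(records, group_of):
    """Return {group: share of late orders} for records of (value, is_late)."""
    totals = defaultdict(int)
    late = defaultdict(int)
    for value, is_late in records:
        group = group_of(value)
        totals[group] += 1
        late[group] += int(is_late)
    return {group: late[group] / totals[group] for group in totals}


def discount_group(discount_pct):
    """Split discounts at the 10% threshold discussed above."""
    return "> 10%" if discount_pct > 10 else "<= 10%"


# Hypothetical (discount, is_late) pairs, not rows from the dataset
records = [(5, True), (8, False), (3, False), (10, True), (25, True), (40, True)]
rates = late_rate_by_group(records, discount_group)
print(rates)
```

Passing a different `group_of` function (for example, one that buckets weights) gives the same check for any other feature.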

Pie Chart

Warehouse_block – 33.3% of products come from Warehouse_block F. Each of the other blocks (A, B, C, D) accounts for 16.7% of all products in the data.

Mode_of_Shipment – Ship is the most common Mode_of_Shipment (67.8%). Flight and Road account for 16.2% and 16% of products, respectively.

Product_importance – Products have a Product_importance of either 'low', 'medium', or 'high', which account for 48.2%, 43.2%, and 8.62% of all products, respectively.

Gender – Customers are either male (49.6%) or female (50.4%).

Reached_on_time – Of all orders in the dataset, 59.7% arrived late and 40.3% arrived on time.

Improvements

To improve this dashboard, I would first work on improving the model. Although I delivered the model after a round of tuning with scikit-learn's grid search, it is currently held back by noise from a few of the existing features in the dataset. Removing these features would improve the model. For the purpose of this project, I decided to keep most of the features when training the model for the sake of interactivity.

I would also improve the UI/UX. I acknowledge this isn't the best-looking dashboard, but it is an MVP that could be delivered to a stakeholder such as a supply chain/logistics manager. Improvements can definitely be made, and more visualizations would allow a more comprehensive analysis of the shipping data. Thanks for taking the time to explore!
