Public Leaderboard: 3rd Place : 95.001
Private Leaderboard: 3rd Place : 95.8484
The full solution is in the notebook Final_solution.ipynb. Run every cell in order; at the end you will have 6 submission files, generated by the following models:
- Single LGB model
- Single Catboost model
- 5-fold LightGBM model
- 5-fold Catboost model
- Stacking of [xgb, lgb, catboost]
- Ensemble of models 1-5 using (model_1 × 0.3 + model_2 × 0.2 + model_3 × 0.3 + model_4 × 0.2) × 0.8 + model_5 × 0.2
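The blending formula above can be sketched as follows. This is a minimal illustration, not the notebook's code: the function name `blend` and the toy prediction arrays are assumptions, and only the weights come from the list above.

```python
import numpy as np

def blend(preds):
    """Blend five prediction arrays with the weights listed above.

    preds is ordered as: single LGB, single Catboost, 5-fold LightGBM,
    5-fold Catboost, stacking.
    """
    m1, m2, m3, m4, m5 = (np.asarray(p, dtype=float) for p in preds)
    base = 0.3 * m1 + 0.2 * m2 + 0.3 * m3 + 0.2 * m4  # inner weights sum to 1
    return 0.8 * base + 0.2 * m5                      # outer weights sum to 1

# Toy predictions (made up for illustration only).
toy_preds = [np.array([100.0, 90.0]) for _ in range(5)]
blended = blend(toy_preds)
```

Because both sets of weights sum to 1, the blend stays on the same scale as the individual model predictions.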
Food & Beverages Spend Prediction in Club Mahindra Resorts
Club Mahindra (Club M) makes significant revenue from Food and Beverages (F&B) sales in its resorts. Members of Club M are offered a wide variety of items, and our task was to predict the amount spent by a member per night, which could help the resorts plan inventory accordingly.
Thanks to Analytics Vidhya and Club Mahindra for organising such a wonderful hackathon. The competition was quite intense, and the dataset was very clean to work with.
Approach :
Step-1:
I started with a very basic approach: converting the ID-like and ordinal features (resort id, persontravellingID, main_product_code, and other ordinal features) to category. I then derived a common set of features for each date column [booking_date, checkin_date, checkout_date]:
- Weekday
- Month
- Day
- Day of year
- Week of year
- Is month end
- Year
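A minimal pandas sketch of this step is shown below. The function name, the exact categorical column names in `CAT_COLS`, and the ISO date strings in the toy frame are assumptions; the output names follow the `checkout_dateyear` / `checkout_datemonth` pattern used later in this write-up.

```python
import pandas as pd

DATE_COLS = ["booking_date", "checkin_date", "checkout_date"]
# Assumed column names; adjust to the actual dataset.
CAT_COLS = ["resort_id", "persontravellingid", "main_product_code"]

def add_basic_features(df, date_cols=DATE_COLS, cat_cols=CAT_COLS):
    out = df.copy()
    # Cast ID-like / ordinal columns to the pandas category dtype.
    for col in cat_cols:
        if col in out.columns:
            out[col] = out[col].astype("category")
    # Derive the same calendar parts for every date column.
    for col in date_cols:
        dt = pd.to_datetime(out[col])  # pass format=... if dates are not ISO
        out[col + "weekday"] = dt.dt.weekday
        out[col + "month"] = dt.dt.month
        out[col + "day"] = dt.dt.day
        out[col + "dayofyear"] = dt.dt.dayofyear
        out[col + "weekofyear"] = dt.dt.isocalendar().week.astype(int)
        out[col + "is_month_end"] = dt.dt.is_month_end.astype(int)
        out[col + "year"] = dt.dt.year
    return out

# Toy row, made up for illustration.
toy = pd.DataFrame({
    "resort_id": ["r1"],
    "booking_date": ["2018-01-05"],
    "checkin_date": ["2018-01-30"],
    "checkout_date": ["2018-01-31"],
})
basic_feats = add_basic_features(toy)
```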
Step-2: Intuitive features
- in_out: checkout_date - checkin_date
- book_in: checkout_date - booking_date
- Roomnights per stay: roomnights / in_out
- Roomnights per book span: roomnights / book_out
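These duration features could be computed roughly as below. This is a sketch under assumptions: differences are taken in whole days, zero-length spans are turned into NaN to avoid division by zero, and the "book_out" denominator in the last item is assumed to mean the same booking span as `book_in`. The output column names for the two ratios are made up.

```python
import numpy as np
import pandas as pd

def add_span_features(df):
    out = df.copy()
    booking = pd.to_datetime(out["booking_date"])
    checkin = pd.to_datetime(out["checkin_date"])
    checkout = pd.to_datetime(out["checkout_date"])

    out["in_out"] = (checkout - checkin).dt.days    # length of stay
    out["book_in"] = (checkout - booking).dt.days   # booking-to-checkout span

    # Replace 0 with NaN so same-day spans do not divide by zero.
    out["roomnights_per_stay"] = out["roomnights"] / out["in_out"].replace(0, np.nan)
    # Assumption: "book_out" in the list above is the same span as book_in.
    out["roomnights_per_book_span"] = out["roomnights"] / out["book_in"].replace(0, np.nan)
    return out

# Toy row, made up for illustration.
toy_spans = pd.DataFrame({
    "booking_date": ["2018-01-01"],
    "checkin_date": ["2018-01-10"],
    "checkout_date": ["2018-01-12"],
    "roomnights": [4],
})
span_feats = add_span_features(toy_spans)
```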
Step-3: Time-based features
- Prev_resort_time = time when the resort was previously booked.
- Prev_resort_member_time = time when the resort was previously booked by a particular member.
- Next_resort_time = time when the resort will next be booked.
- Next_resort_member_time = time when the resort will next be booked by a particular member.
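One way to build previous/next features like these is a grouped `shift` over date-sorted rows. The sketch below is an assumption about the details: it uses `booking_date` as the reference date, expresses each feature as a gap in days, and the helper name `add_gap_features` is made up.

```python
import pandas as pd

def add_gap_features(df, keys, suffix, date_col="booking_date"):
    """Add prev_<suffix> / next_<suffix>: days to the previous / next
    booking within each group defined by keys. Assumes a unique index."""
    out = df.copy()
    out["_d"] = pd.to_datetime(out[date_col])
    out = out.sort_values(keys + ["_d"])
    grouped = out.groupby(keys)["_d"]
    out["prev_" + suffix] = (out["_d"] - grouped.shift(1)).dt.days
    out["next_" + suffix] = (grouped.shift(-1) - out["_d"]).dt.days
    # Drop the helper column and restore the original row order.
    return out.drop(columns="_d").sort_index()

# Toy bookings, made up for illustration.
toy_bookings = pd.DataFrame({
    "resort_id": ["A", "A", "A", "B"],
    "memberid": ["m1", "m2", "m1", "m1"],
    "booking_date": ["2018-01-01", "2018-01-05", "2018-01-11", "2018-01-03"],
})
gap_feats = add_gap_features(toy_bookings, ["resort_id"], "resort_time")
gap_feats = add_gap_features(gap_feats, ["resort_id", "memberid"], "resort_member_time")
```

The first and last booking in each group have no previous or next neighbour, so those entries are left as NaN, which tree-based models such as LightGBM and Catboost can handle directly.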
Step-4: Groupby features
S.No. | Type | Value column | On |
---|---|---|---|
1 | COUNT | _ | resort_id |
2 | COUNT | _ | resort_id, memberid |
3 | COUNT | _ | resort_id, checkout_dateyear, checkout_datemonth |
4 | COUNT | _ | memberid, checkout_dateyear |
5 | VAR | roomnights | resort_id |
6 | MEDIAN | roomnights | resort_id, memberid |
7 | MAX | roomnights | resort_id, checkout_dateyear, checkout_datemonth |
8 | MIN | roomnights | memberid, checkout_dateyear |
9 | VAR | in_out | resort_id |
10 | MEDIAN | in_out | resort_id, memberid |
11 | MAX | in_out | resort_id, checkout_dateyear, checkout_datemonth |
12 | MIN | in_out | memberid, checkout_dateyear |
13 | VAR | total_pax | resort_id |
14 | MEDIAN | total_pax | resort_id, memberid |
15 | MAX | total_pax | resort_id, checkout_dateyear, checkout_datemonth |
16 | MIN | total_pax | memberid, checkout_dateyear |
... and so on. In similar fashion, roughly 72 combinations were tried; together they improved the RMSE from 96 to 95.3 on the LB, with nearly the same change in local CV.
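Aggregates of this kind can be generated from a small spec list with `groupby(...).transform(...)`, which broadcasts each group statistic back onto its rows. The sketch below covers only a few of the combinations; the spec format, the function name, and the generated column names are assumptions.

```python
import pandas as pd

# (aggregation, value column or None for a row count, grouping keys)
AGG_SPECS = [
    ("count", None, ["resort_id"]),
    ("count", None, ["resort_id", "memberid"]),
    ("var", "roomnights", ["resort_id"]),
    ("median", "roomnights", ["resort_id", "memberid"]),
    ("max", "roomnights", ["resort_id", "checkout_dateyear", "checkout_datemonth"]),
    ("min", "roomnights", ["memberid", "checkout_dateyear"]),
]

def add_group_features(df, specs=AGG_SPECS):
    out = df.copy()
    for agg, value_col, keys in specs:
        name = "_".join([agg, value_col or "rows", "by"] + keys)
        grouped = out.groupby(keys)
        if value_col is None:
            # Row count per group: count the (non-null) first key column.
            out[name] = grouped[keys[0]].transform("count")
        else:
            out[name] = grouped[value_col].transform(agg)
    return out

# Toy data, made up for illustration.
toy_groups = pd.DataFrame({
    "resort_id": ["A", "A", "A", "B"],
    "memberid": ["m1", "m1", "m2", "m1"],
    "roomnights": [2, 4, 6, 3],
    "checkout_dateyear": [2018, 2018, 2018, 2018],
    "checkout_datemonth": [1, 1, 2, 1],
})
group_feats = add_group_features(toy_groups)
```

Extending `AGG_SPECS` with more value columns (for example `in_out` or `total_pax`) and key sets is one way to reach a larger number of combinations.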
Modeling:
My final model is an ensemble of LightGBM, Catboost, 5-fold LightGBM, 5-fold Catboost, and a stacking of [xgb, catboost, lightGBM].
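For the 5-fold models, a common pattern is to train one model per fold and average the test predictions. The sketch below assumes that pattern; it is not confirmed as the notebook's exact procedure. To stay self-contained, it uses a least-squares fit as a stand-in for LightGBM or Catboost, and the names `kfold_average` and `least_squares_fit_predict` are hypothetical.

```python
import numpy as np

def kfold_average(X, y, X_test, fit_predict, n_splits=5, seed=0):
    """Train on each fold's training part; return averaged test predictions,
    out-of-fold (OOF) predictions, and the OOF RMSE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    for i, valid_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        val_pred, fold_test_pred = fit_predict(
            X[train_idx], y[train_idx], X[valid_idx], X_test
        )
        oof[valid_idx] = val_pred
        test_pred += fold_test_pred / n_splits
    rmse = float(np.sqrt(np.mean((oof - y) ** 2)))
    return test_pred, oof, rmse

def least_squares_fit_predict(X_tr, y_tr, X_val, X_te):
    # Stand-in model; swap in a LightGBM / Catboost fit-and-predict here.
    coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return X_val @ coef, X_te @ coef

# Toy noiseless linear data, made up for illustration.
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(50, 2))
y_toy = X_toy @ np.array([2.0, -1.0])
X_toy_test = rng.normal(size=(5, 2))
fold_test_pred, fold_oof, fold_rmse = kfold_average(
    X_toy, y_toy, X_toy_test, least_squares_fit_predict
)
```

The out-of-fold predictions give a local CV estimate, and predictions of this kind are also commonly used as input features for a stacking model.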