This repository has all the files needed to complete the Xpand IT DS Challenge.
In this challenge, you will find a realistic and imperfect dataset. Your goal is to solve the problem presented below and write a full report on your progress and findings, just as you would for a client.
Your report should cover all steps of the process (business understanding, exploratory data analysis, modelling, among others). If you want, you can read more about the DS project flow on our website: https://www.xpand-it.com/data-science-process/.
Below you will find a brief description of the dataset and a set of questions and guidelines to help you during the challenge.
Your goal is to study the ‘Dow Jones Index Data Set’, a financial dataset with information on the 30 large US-listed companies that make up the index. The dataset contains weekly historical data for those companies from the first and second quarters of 2011. You can find the dataset in the Git repository.
One of the tasks of a data scientist is to solve a problem using the data provided by a client.
In this case, the client has 100€ to invest in a single company each week. Your task is to build a system that predicts the best company to invest in each week. The stock is bought at the start of the week and sold at the end of it.
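To make the buy-and-sell rule concrete, here is a minimal sketch of the arithmetic behind a single week. It assumes the whole 100€ can be invested (fractional shares, no fees or taxes), which the challenge text does not specify. The function names, tickers and prices are made up for illustration and are not taken from the dataset.

```python
def weekly_profit(open_price: float, close_price: float, budget: float = 100.0) -> float:
    """Profit in euros from investing `budget` at the open and selling at the close."""
    if open_price <= 0:
        raise ValueError("open_price must be positive")
    return budget * (close_price - open_price) / open_price


def best_stock(week_prices: dict[str, tuple[float, float]]) -> str:
    """Ticker with the highest profit for one week of (open, close) prices."""
    return max(week_prices, key=lambda ticker: weekly_profit(*week_prices[ticker]))


# Hypothetical tickers and prices, for illustration only.
example_week = {"AAA": (50.0, 51.0), "BBB": (20.0, 19.0), "CCC": (10.0, 10.5)}
print(best_stock(example_week))
```

In hindsight the best pick is simply the stock with the largest relative price change over the week. The prediction task is to rank the companies before the week starts.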
By the end of the challenge, you should submit your results for evaluation by a member of our data science team by replying to the email you initially received with the challenge and attaching the notebook with your submission.
You should always work with the newest version of the dataset in our Git repository (https://github.com/dsu-xpand-it/DSU-Recruitment-Challenges), which you should clone or download at the beginning of your project. To help you structure your submission, the repository includes a notebook template where you should present your code, results, and conclusions.
You will be evaluated according to your approach to the problem, how you answer the questions we provide and how you go beyond just those questions to provide a full EDA and solution. The structure of your code, as well as your ability to describe the work done, will also be part of your evaluation.
To guide you during the challenge, you can follow the guidelines below. They are meant to help you and point you in the right direction. However, feel free to follow your own approach and to draw your own conclusions.
Business Analysis
Here you should conduct a brief analysis of what the Dow Jones Index is. You can enumerate the main topics to take into account based on the dataset provided as well as your understanding of the variables in the dataset.
Data Understanding
During your data understanding phase, you should focus on understanding what each variable represents, computing statistics, and producing visualizations. Some questions that may guide your work follow:
- Feature engineering: should new features be created from the existing ones?
- What will be your features and your label?
- Is the dataset ready for the prediction task? (e.g. missing values)
- How will the data be split into train and test sets?
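As one possible answer to the splitting question, the sketch below separates the data chronologically, so the model is never trained on weeks that come after the ones it is tested on. It uses plain Python so it runs without extra libraries; the same idea carries over to a DataFrame. The field names `stock` and `date`, the sample rows, and the cutoff at the start of the second quarter are all assumptions made for illustration, not requirements of the challenge.

```python
from datetime import date


def chronological_split(rows, cutoff):
    """Split rows into (train, test): weeks before `cutoff` go to train, the rest to test."""
    train = [row for row in rows if row["date"] < cutoff]
    test = [row for row in rows if row["date"] >= cutoff]
    return train, test


# Hypothetical rows; the real dataset has more columns and companies.
rows = [
    {"stock": "AAA", "date": date(2011, 1, 7)},
    {"stock": "AAA", "date": date(2011, 3, 25)},
    {"stock": "AAA", "date": date(2011, 4, 1)},
    {"stock": "AAA", "date": date(2011, 6, 24)},
]

# Assumed cutoff: train on the first quarter, test on the second.
train, test = chronological_split(rows, cutoff=date(2011, 4, 1))
print(len(train), len(test))
```

A random shuffle would mix future weeks into the training set, which tends to make results on time-ordered data look better than they would be in practice.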
Modelling
In this phase, your main goal is to develop and describe your approach to the solution of the problem. Some guidelines to help you:
- What metrics will you use to evaluate your solutions?
- What are some algorithms that can lead to good results? And why?
- Describe in detail your thought process during the development of your solution.
- Present your results.
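One way to tie the evaluation metric to the client's problem is to measure the money a set of weekly picks would have made, and compare it with the best picks possible in hindsight. The sketch below assumes weekly returns are available as fractions (0.02 means +2%). The function names, tickers and numbers are hypothetical and only illustrate the calculation.

```python
def strategy_profit(picks, weekly_returns, budget=100.0):
    """Total profit in euros when `budget` is invested in picks[week] each week.

    weekly_returns maps week -> {ticker: fractional return over that week}.
    """
    return sum(budget * weekly_returns[week][picks[week]] for week in picks)


def oracle_picks(weekly_returns):
    """Best possible pick per week, known only in hindsight (an upper bound)."""
    return {week: max(returns, key=returns.get) for week, returns in weekly_returns.items()}


# Hypothetical returns and model picks, for illustration only.
weekly_returns = {
    "week1": {"AAA": 0.02, "BBB": -0.01},
    "week2": {"AAA": -0.03, "BBB": 0.01},
}
model_picks = {"week1": "AAA", "week2": "AAA"}

model_total = strategy_profit(model_picks, weekly_returns)
oracle_total = strategy_profit(oracle_picks(weekly_returns), weekly_returns)
print(model_total, oracle_total)
```

The gap between the two totals shows how much profit the model leaves on the table. A baseline such as always picking the same stock can be scored the same way.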
Conclusions
In the conclusions, you should enumerate the conclusions you arrived at after completing the challenge.
- How good do you consider your results?
- What are some factors that would contribute to better results?
- What are some advantages and disadvantages of your solution?
- What can be done as future work to improve your results?
If you have any questions, don’t hesitate to contact us.
We wish you the best of luck during the challenge and hope to see you soon in the office,
Xpand IT Data Science Team