Capstone Project Description and Requirements, Aims & Methods, Tools used and Conclusions

(link to live version of the project)

Tools used in this Project:

PostgreSQL, MySQL, Vim, Python, Jupyter Notebook, Pandas, scikit-learn, Matplotlib


Project Aims:
    I chose to examine what relationships, if any, existed in the following areas:
    1) Overall trends in countries and medals won at both the Summer and Winter Olympic Games.
    2) Whether countries with a geography and climate favourable to a seasonal event tend to win that event.
    3) Whether any trends emerged over time in medals won and in the number of participating countries.
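Aim 3 comes down to grouping the records by Games and counting. Below is a minimal sketch using only the Python standard library. It assumes each row has Year, Season, Country and Medal columns, and that a blank Medal means no medal was won. These column names are hypothetical, and the project's actual schema and Pandas code may differ. The sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; the column names (Year, Season, Country, Medal)
# are assumptions, not the dataset's confirmed schema.
SAMPLE_CSV = """Year,Season,Country,Medal
1988,Summer,USA,Gold
1988,Summer,USSR,Silver
1988,Summer,East Germany,
1992,Summer,USA,Bronze
1992,Summer,Germany,Gold
1992,Summer,Unified Team,Gold
1992,Summer,USA,Gold
"""

def countries_per_games(rows):
    """Return {(year, season): number of distinct participating countries}."""
    seen = defaultdict(set)
    for row in rows:
        seen[(int(row["Year"]), row["Season"])].add(row["Country"])
    return {key: len(countries) for key, countries in sorted(seen.items())}

def medals_per_games(rows):
    """Return {(year, season): number of medal rows}; a blank Medal is skipped."""
    totals = defaultdict(int)
    for row in rows:
        if row["Medal"]:
            totals[(int(row["Year"]), row["Season"])] += 1
    return dict(sorted(totals.items()))

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(countries_per_games(rows))  # {(1988, 'Summer'): 3, (1992, 'Summer'): 3}
print(medals_per_games(rows))     # {(1988, 'Summer'): 2, (1992, 'Summer'): 4}
```

If the data has one row per athlete, a team event produces one medal row per team member. Counting distinct events per country would avoid that inflation.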
Data Set and Data Cleaning Process
    The dataset is publicly available and consists of two separate .csv files, one of Olympic event participants and one of medals won, covering 1900 to 2016.
    The data set was first examined with Pandas and Excel to look for general features of interest and potential problems with the data.
    Significant data cleaning and formatting was required to prepare the data for further evaluation. Examples include, but are not limited to:
  • making names consistent, accounting for historical changes (e.g. East Germany, USSR), and separating data points for better evaluation.
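The name-consistency step above can be sketched as a lookup table of variant spellings. This is a minimal Python example. The alias entries and the helper names `normalize_country` and `split_games` are hypothetical and were not taken from the project. Historical states such as East Germany stay distinct at this stage, so they can still be analysed on their own. `split_games` shows one possible reading of "separating data points": splitting a combined label such as "1992 Summer" into a year and a season.

```python
# Hypothetical alias table: the variant spellings are illustrative only.
NAME_ALIASES = {
    "United States": "USA",
    "United States of America": "USA",
    "UK": "Great Britain",
    "Soviet Union": "USSR",
    "W. Germany": "West Germany",
    "E. Germany": "East Germany",
}

def normalize_country(raw):
    """Collapse stray whitespace, then map known variants to one canonical name."""
    name = " ".join(raw.split())
    return NAME_ALIASES.get(name, name)

def split_games(games):
    """Split a combined label such as '1992 Summer' into (1992, 'Summer')."""
    year, season = games.split(maxsplit=1)
    return int(year), season

print([normalize_country(n) for n in ["  Soviet Union", "USA", "W.  Germany", "Canada"]])
# ['USSR', 'USA', 'West Germany', 'Canada']
print(split_games("1992 Summer"))  # (1992, 'Summer')
```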
Data Exploration
    The main tools used to explore the data set were SQL and Pandas. Some "on-the-fly" visualizations were created with Matplotlib, Pandas, Seaborn and Excel.
    I created the ERD for the data sets using MySQL, but ran the queries using PostgreSQL in pgAdmin.
    Some exploratory analyses were inconclusive and were therefore excluded from the final results (e.g. regression analysis using scikit-learn and Seaborn).
    These limitations were due primarily to the dataset itself. I also omitted inconsequential or trivial results (e.g. athlete ages).
    Individually, the former states of West and East Germany each won a large number of events, and this was reflected in Germany overall being a leading medal winner.
    To examine Germany's overall medal count, I combined the medals won by modern Germany with those won by former East and West Germany.
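Combining former states with their modern successor changes only how names are keyed before counting. The following minimal Python sketch uses `collections.Counter` on a made-up list with one entry per medal. The `COMBINE` mapping and the `medal_totals` helper are hypothetical illustrations. Deciding which names to merge is an analysis choice, not something the data itself determines.

```python
from collections import Counter

# Hypothetical merge map: former states counted under the modern name.
COMBINE = {"East Germany": "Germany", "West Germany": "Germany"}

def medal_totals(medal_countries, combine=None):
    """Count medals per country, optionally re-keying names via `combine`."""
    combine = combine or {}
    return Counter(combine.get(c, c) for c in medal_countries)

# Made-up data: one list entry per medal won.
medals = ["USA", "East Germany", "West Germany", "Germany", "USA", "East Germany"]

separate = medal_totals(medals)
combined = medal_totals(medals, COMBINE)
print(separate.most_common())  # Germany split across three names
print(combined.most_common())  # [('Germany', 4), ('USA', 2)]
```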
Results and Conclusions
    Overall, a small number of the same countries (teams) consistently won the majority of medals.
    The countries that consistently won the most medals were the USA, Great Britain, the former USSR and Germany.
    Notably, combining the medals won by former East and West Germany made it clear that Germany was one of the leading medal winners.
    As suspected, countries whose geography and climate naturally suit certain events (e.g. Winter sports) ranked higher in those events.
    One outlier among the countries with the most medals won was Canada (ice hockey). However, this also seemed to support the hypothesis that geography and climate influence performance in seasonal events, since Canada was a consistent leader in this event.
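One way to quantify "a small number of countries won the majority of medals" is to compute, for each season, the share of all medals held by the top N countries. This is a minimal Python sketch on made-up (season, country) pairs, one pair per medal. The function name `top_n_share` and the choice of N are hypothetical. This is not the project's actual query.

```python
from collections import Counter, defaultdict

def top_n_share(medals, n=3):
    """medals: iterable of (season, country) pairs, one per medal.
    Returns {season: (top-n country names, fraction of that season's medals they hold)}."""
    by_season = defaultdict(Counter)
    for season, country in medals:
        by_season[season][country] += 1
    result = {}
    for season, counts in by_season.items():
        top = counts.most_common(n)
        share = sum(count for _, count in top) / sum(counts.values())
        result[season] = ([name for name, _ in top], share)
    return result

# Made-up sample data for illustration only.
sample = [
    ("Winter", "Canada"), ("Winter", "Canada"), ("Winter", "Norway"),
    ("Winter", "USA"), ("Summer", "USA"), ("Summer", "USA"),
    ("Summer", "Great Britain"), ("Summer", "Kenya"),
]
for season, (top, share) in sorted(top_n_share(sample, n=2).items()):
    print(season, top, f"{share:.0%}")
```

Comparing a country's Winter and Summer results in the same way gives a rough check on the geography/climate hypothesis.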