Skip to content

Latest commit

 

History

History
95 lines (66 loc) · 5.98 KB

PROPOSAL.md

File metadata and controls

95 lines (66 loc) · 5.98 KB

Project Proposal

Finding Insights from Stack Overflow Developer Survey

Stack Overflow is a professional community for developers, conducting an annual survey. The collected data from 2011 onwards has been available for open source on the web, with the latest dataset released in 2020. Analyzing this dataset professionally using modern tools would enable us to answer real-world questions effectively. The dataset includes responses to 275 questions.

Project Goal:

  1. Perform Analysis on 3 years of Stack Overflow Dataset: Extract insights from the data.
  2. Data Analysis Goals: Answer the following questions:
    • What is the impact of higher education on the salary of surveyed developers?
    • How do education, experience, and responsibilities affect gender inequalities?
    • How does ethnicity impact participation rates?
    • Is there a difference in income between men and women?
    • How does the previous year's interest in a language affect its popularity in the current year?
  3. Data Visualization Goals:
    • Identify the most commonly used language.
    • Analyze the distribution of surveyors based on their developer roles.
    • Explore factors affecting job satisfaction.
    • Predict the growth of languages for upcoming years based on survey answers.
    • Provide insights for IT environment, hiring employees, job seekers, and building a solid résumé.

Data Source and Background

The dataset is sourced from the annual Stack Overflow developer survey, covering responses from developers in 180 countries. The data range from 2011 to 2020, with the focus being on the last 3 years. Respondents primarily come from the US, India, and EMEA regions, with a background in developer/coding experience. The dataset includes survey data gathered from 180 countries, with responses ranging from "Not at all important" to "Very important" and "Not at all satisfied" to "Very satisfied."

Data Format

The data is in CSV format, consisting of 252,199 observations and 62 variables.

Projected Work for Insights

Data Wrangling

  • Dealing with Null Values: Handle unanswered questions marked as ‘NA’ or ‘Not Applicable’ to ensure precise analysis.
  • Data Conversion/Manipulation: Convert data for analysis, considering that respondents answered the survey through radio buttons rather than yes or no patterns (Univariate analysis).

Techniques Expected to Use in the Project

  • ML Algorithms: Utilize algorithms like Random Forest, KNN, AUC for classification problems, logistic regression, and linear regression.
  • Data Visualization: Employ data visualization techniques for better understanding and presentation of insights.
  • Parameter Analysis: Analyze parameters to fine-tune models and improve accuracy.

Project Plan

Week 8: Project Base Setup

  • Source control setup on GitHub
  • Project Management using tools like MS Project
  • Complete Data Wrangling & Basic Analysis

Week 10: Baseline Model Building

  • Implement algorithms and build baseline models

Week 11: Model Evaluation

  • Run tests and evaluate the performance of models

Week 12: Finalization

  • Prepare video presentation summarizing the analysis and insights

Additional Technical Details

Linear regression(RFE techniques)

$$ y = O_1X + O_2 $$

Root Mean Squared Error Calculations

$$ rmse = \sqrt{\left(\frac{1}{n}\right)\sum_{i=1}^{n}(y_{i} - x_{i})^{2}} $$

Potential Impact and Benefits

  • Inform Decision-Making: The insights derived from the analysis of the Stack Overflow Developer Survey data can inform decision-making processes in various domains, including education, recruitment, and workforce development.
  • Address Gender and Ethnic Inequalities: By analyzing the impact of education, experience, and responsibilities on gender and ethnic inequalities, this project can contribute to raising awareness and identifying strategies to address these disparities in the tech industry.
  • Support Career Development: The findings from the analysis can provide valuable insights for developers seeking to advance their careers, make informed decisions about their education and training, and enhance their job satisfaction.
  • Contribute to Open Source Community: By contributing to this project, developers have the opportunity to collaborate with others, share their expertise, and contribute to open-source initiatives aimed at improving data analysis techniques and tools.
  • Empower Data-Driven Decision Making: The project's focus on ML algorithms, data visualization, and predictive analytics empowers stakeholders to make data-driven decisions, enabling them to stay ahead in a rapidly evolving technology landscape.

👨‍💻 Contributing

  • Contributions Welcome: We welcome contributions from the community to help enhance this project. Whether you're a seasoned developer or just starting out, there are various ways you can contribute:
    • Code Contributions: Help improve the analysis code, implement new features, or optimize existing algorithms.
    • Data Wrangling: Assist in cleaning and preprocessing the dataset to ensure accurate analysis.
    • Documentation: Enhance project documentation, including the README file, to make it more comprehensive and user-friendly.
    • Bug Fixes: Identify and fix any bugs or issues encountered during the analysis.
    • Feature Requests: Suggest new features or improvements to further enhance the project.
  • How to Contribute: To contribute, simply fork the repository, make your changes, and submit a pull request. Be sure to follow the contribution guidelines outlined in the repository.
  • Contributors Recognition: We greatly appreciate all contributions to this project and will acknowledge contributors in the README file.
  • Join the Discussion: Feel free to join the discussion on our GitHub repository to share your ideas, ask questions, or collaborate with other contributors.

Crafted by @Sanjayviswa.