draemonsi/ECE2112-Experiment4

Experiment 4: Data Wrangling and Data Visualization

Course: ECE 2112 - Advanced Computer Programming and Algorithms

Institution: University of Santo Tomas, Faculty of Engineering, Electronics Engineering Department

Project Overview

  • In this experiment, we use Python's Pandas library to perform data wrangling and create meaningful data visualizations from a given dataset. We analyze the dataset by applying various techniques such as filtering, conditional indexing, and visualization methods to extract insights about student performance in specific tracks. Finally, we use visualizations like box plots, bar graphs, pie charts, and scatter plots to present our findings.

Table of Contents

  1. Intended Learning Outcomes
  2. Problem Descriptions
  3. Installation Instructions
  4. Usage
  5. Files Included
  6. Technologies Used
  7. Data Interpretation
  8. Conclusion
  9. License
  10. Author

Intended Learning Outcomes

  1. Identify the functions and methods required for cleaning and visualizing data in Python using Pandas.
  2. Use conditional indexing to filter and subset DataFrames based on multiple conditions.
  3. Generate meaningful visualizations using Matplotlib and Seaborn to analyze data patterns.

Problem Descriptions

Problem 1: Create a DataFrame using a subset of the columns and specific filtering conditions

  • Objective: Filter students from Visayas whose Math score is less than 70.
    • Code sample:
board_results.loc[(board_results['Hometown'] == 'Visayas') & (board_results['Math'] < 70), ['Name', 'Gender', 'Track', 'Math']]
  • Objective: Create a DataFrame showing students from Luzon in the Instrumentation track who scored more than 70 in Electronics.
    • Code sample:
board_results.loc[(board_results['Electronics'] > 70) & (board_results['Track'] == 'Instrumentation') & (board_results['Hometown'] == 'Luzon'), ['Name', 'GEAS', 'Electronics']]
  • Objective: Create a DataFrame showing female students from Mindanao whose average score is greater than or equal to 55.
    • Code sample:
board_results['Average'] = (board_results['Math'] + board_results['Electronics'] + board_results['GEAS'] + board_results['Communication']) / 4
board_results.loc[(board_results['Average'] >= 55) & (board_results['Hometown'] == 'Mindanao') & (board_results['Gender'] == 'Female'), ['Name', 'Track', 'Electronics', 'Average']]
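
The filters in Problem 1 share one pattern: combine conditions into a boolean mask with `&`, then select the matching rows and the wanted columns. The following is a minimal sketch of that pattern on a small made-up DataFrame. The rows are illustrative only and are not taken from board2.csv; only the column names follow the snippets above.

```python
import pandas as pd

# Illustrative rows only (not from board2.csv); column names follow the README snippets.
board_results = pd.DataFrame({
    "Name": ["S1", "S2", "S3", "S4"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Track": ["Instrumentation", "Communication", "Microelectronics", "Instrumentation"],
    "Hometown": ["Visayas", "Visayas", "Luzon", "Luzon"],
    "Math": [65, 82, 90, 58],
    "Electronics": [72, 60, 68, 75],
})

# Each comparison needs its own parentheses because & binds tighter than == and <.
mask = (board_results["Hometown"] == "Visayas") & (board_results["Math"] < 70)

# .loc selects rows and columns in a single step, which avoids chained indexing.
vis = board_results.loc[mask, ["Name", "Gender", "Track", "Math"]]
print(vis)
```

Selecting with `.loc[mask, columns]` returns the same rows as `df[columns][mask]`, but it is the form the Pandas documentation recommends when the result may later be modified.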

Problem 2: Visualizing Data

  • Objective: Create visualizations (boxplots, bar charts, pie charts, scatter plots) to analyze student performance across different tracks, subjects, and gender distributions.
    • Code sample (Boxplot):
board_results.boxplot(column=subject, by='Track', ax=axes[idx])
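
The boxplot line above depends on a surrounding loop that defines `subject`, `axes`, and `idx`. One way that loop might look is sketched below. The helper name `boxplots_by_track` and the figure size are assumptions made for illustration, not the notebook's actual code.

```python
import matplotlib.pyplot as plt
import pandas as pd

def boxplots_by_track(df, subjects):
    """Draw one boxplot per subject, grouped by Track, and return the figure."""
    # squeeze=False keeps axes two-dimensional even when there is one subject.
    fig, axes = plt.subplots(1, len(subjects),
                             figsize=(5 * len(subjects), 4), squeeze=False)
    for idx, subject in enumerate(subjects):
        df.boxplot(column=subject, by="Track", ax=axes[0, idx])
        axes[0, idx].set_title(subject)
        axes[0, idx].set_ylabel("Score")
    fig.suptitle("")  # clear the automatic "Boxplot grouped by Track" title
    fig.tight_layout()
    return fig
```

With the dataset loaded as `board_results`, calling `boxplots_by_track(board_results, ['Math', 'Electronics', 'GEAS', 'Communication'])` followed by `plt.show()` should display one panel per subject.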


Installation Instructions

To run the provided Python code, ensure you have the following installed:

  1. Python (Version 3.6 or higher)
  2. Jupyter Notebook or any Python IDE (VS Code, PyCharm)
  3. The Pandas, Matplotlib, Seaborn, and NumPy libraries

Installation steps:

  1. Clone the repository:
    git clone https://github.com/draemonsi/ECE2112-Experiment4.git
  2. Install the dependencies (if they are not already installed):
    pip install pandas matplotlib seaborn numpy

Usage

  1. Open Jupyter Notebook or any Python environment.
  2. Load the provided dataset (board2.csv) into a Pandas DataFrame.
  3. Run the code snippets to solve each problem and create visualizations.
  4. The results will be displayed directly in your environment.
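
Step 2 can be sketched as a small loader that reads the CSV and checks that the columns used in this README are present. The helper name `load_board_results` and the `EXPECTED_COLUMNS` list are assumptions for illustration; the list is inferred from the code samples above, and the real file may contain additional columns.

```python
import pandas as pd

# Columns referenced elsewhere in this README (an assumption about the file's header).
EXPECTED_COLUMNS = ["Name", "Gender", "Track", "Hometown",
                    "Math", "Electronics", "GEAS", "Communication"]

def load_board_results(source):
    """Read the dataset and fail early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return df
```

From the repository folder, `board_results = load_board_results('board2.csv')` followed by `board_results.head()` gives a quick look at the first rows before running the filters.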

Files Included

  • PA4.ipynb: Jupyter Notebook containing the code to perform data wrangling and visualization.
  • board2.csv: CSV file containing the dataset with students’ scores and other details.

Technologies Used

  • Python (version 3.x)
  • Pandas (Python Data Analysis Library)
  • Matplotlib and Seaborn (Data visualization)
  • Jupyter Notebook for code execution and analysis

Data Interpretation

The data from the ECE board exam reveals a multifaceted view of performance variations based on academic tracks, regional demographics, and gender distribution. In the Microelectronics track, which is predominantly female [Figure-1.0] (around 70% female students), the students demonstrate strong proficiency in Math, with many scoring above 87, and a tighter distribution of scores indicating consistent performance. However, this same group tends to underperform in the Electronics section, where their median score falls below 70, highlighting a disconnect between their theoretical math skills and practical application in electronics [Figure-1.1]. In contrast, the Instrumentation track, mainly composed of male students (over 80% male), displays robust performance across exam sections, with no scores below 55 in the General Engineering and Applied Sciences (GEAS). However, this track struggles in Communication, where no students scored above 80 [Figure-1.1].

The Communication track, with an equal gender distribution [Figure-1.0], shows a broader range of Math scores, with many students scoring below 70, and a slightly higher but still variable range of scores in Electronics, where the median hovers around 70. This indicates challenges in both technical and mathematical areas [Figure-1.1].
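
The gender shares quoted for each track can be recomputed with a cross-tabulation. This is a minimal sketch, assuming the `Track` and `Gender` columns used in the code samples above; the helper name `gender_share_by_track` is hypothetical.

```python
import pandas as pd

def gender_share_by_track(df):
    """Return the percentage of each gender within every track (each row sums to 100)."""
    # normalize="index" divides each row by its total, giving within-track shares.
    return pd.crosstab(df["Track"], df["Gender"], normalize="index") * 100
```

Calling `gender_share_by_track(board_results).round(1)` gives one row per track, which can be compared against the pie charts in Figure 1.0.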

Regionally, students from the Visayas region (green points in the scatterplots) perform well in Math, often scoring above 80, but face significant struggles in the Electronics section, with most scoring below 60. This points to a regional disparity in performance, potentially driven by differences in educational resources or focus. Students from Mindanao and Luzon display a more balanced but variable performance across both Math and Electronics, with scores spanning a wide range in both subjects [Figure-1.2].
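
The regional comparison can be backed by a per-region summary table. The sketch below assumes the `Hometown`, `Math`, and `Electronics` columns from the code samples above; the helper name `regional_summary` and the choice of statistics are illustrative.

```python
import pandas as pd

def regional_summary(df):
    """Median, minimum, and maximum Math and Electronics scores per hometown region."""
    return df.groupby("Hometown")[["Math", "Electronics"]].agg(["median", "min", "max"])
```

The result has one row per region and a two-level column index, so `regional_summary(board_results).loc['Visayas', ('Electronics', 'median')]` would return the median Electronics score for Visayas.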

The scatterplots reveal that there is no strong correlation between high performance in one section and another, especially in Math and Electronics, underscoring the idea that students tend to specialize in specific areas rather than achieving uniform excellence across the board. For example, several students who scored above 80 in Math did not surpass 60 in Electronics, particularly in the Microelectronics track [Figure-1.3] [Figure-1.4].
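
The visual impression of weak correlation can be checked numerically with the Pearson correlation coefficient. This is a minimal sketch, assuming the `Math`, `Electronics`, and `Track` columns from the code samples above; the helper name `math_electronics_corr` is hypothetical.

```python
import pandas as pd

def math_electronics_corr(df):
    """Pearson correlation between Math and Electronics, overall and per track."""
    overall = df["Math"].corr(df["Electronics"])
    # Selecting the two score columns first keeps the grouping column out of each group.
    per_track = df.groupby("Track")[["Math", "Electronics"]].apply(
        lambda g: g["Math"].corr(g["Electronics"])
    )
    return overall, per_track
```

Values near 0 would support the reading of the scatterplots, while values near 1 or -1 would indicate a strong linear relationship.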

In summary, the data shows clear performance trends and disparities across tracks, regions, and gender groups. The math-strong but electronics-weak Microelectronics students and the consistent but communication-challenged Instrumentation students suggest the need for more nuanced, possibly track-specific, educational strategies. The gender demographics, such as the predominantly male Instrumentation track and the more balanced Communication track, add another layer to these findings and point toward more tailored interventions that address specific weaknesses and improve overall performance in the ECE board exam.

Figure 1.0

image

Figure 1.1

image

Figure 1.2

image

Figure 1.3

image

Figure 1.4

image

Conclusion

This experiment demonstrates the power of data wrangling and visualization techniques in uncovering insights from a dataset. By using Pandas, we were able to filter data based on specific conditions and manipulate the DataFrame to focus on relevant information. Additionally, the visualizations helped identify patterns in student performance across tracks, subjects, and gender distributions. The ability to interpret these patterns is crucial in making data-driven decisions and conclusions.


License

This project is licensed under The Unlicense. See the LICENSE file for more details.


Author

Andrei Jorelle C. Simon
GitHub Profile