Experiment 4: Data Wrangling and Data Visualization

Course: ECE 2112 - Advanced Computer Programming and Algorithms

Institution: University of Santo Tomas, Faculty of Engineering, Electronics Engineering Department

Project Overview

In this experiment, we use Python's Pandas library to perform data wrangling and create meaningful data visualizations from a given dataset. We analyze the dataset by applying various techniques such as filtering, conditional indexing, and visualization methods to extract insights about student performance in specific tracks. Finally, we use visualizations like box plots, bar graphs, pie charts, and scatter plots to present our findings.

Intended Learning Outcomes

Identify the functions and methods required for cleaning and visualizing data in Python using Pandas.
Use conditional indexing to filter and subset DataFrames based on multiple conditions.
Generate meaningful visualizations using Matplotlib and Seaborn to analyze data patterns.

Problem Descriptions

Problem 1: Create a DataFrame using a subset of the columns and specific filtering conditions

Objective: Filter students from Visayas whose Math score is less than 70.
- Code sample:

board_results[['Name', 'Gender', 'Track', 'Math']][(board_results['Hometown'] == 'Visayas') & (board_results['Math'] < 70)]

Objective: Create a DataFrame showing students from Luzon in the Instrumentation track who scored more than 70 in Electronics.
- Code sample:

board_results[['Name', 'GEAS', 'Electronics']][(board_results['Electronics'] > 70) & (board_results['Track'] == 'Instrumentation') & (board_results['Hometown'] == 'Luzon')]

Objective: Create a DataFrame showing students from Mindanao, who are female and have an average score greater than or equal to 55.
- Code sample:

board_results['Average'] = (board_results['Math'] + board_results['Electronics'] + board_results['GEAS'] + board_results['Communication']) / 4
board_results[['Name', 'Track', 'Electronics', 'Average']][(board_results['Average'] >= 55) & (board_results['Hometown'] == 'Mindanao') & (board_results['Gender'] == 'Female')]

Problem 2: Visualizing Data

Objective: Create visualizations (boxplots, bar charts, pie charts, scatter plots) to analyze student performance across different tracks, subjects, and gender distributions.
- Code sample (Boxplot):

board_results.boxplot(column=subject, by='Track', ax=axes[idx])

Installation Instructions

To run the provided Python code, ensure you have the following installed:

Python (Version 3.6 or higher)
Jupyter Notebook or any Python IDE (VS Code, PyCharm)
Pandas, Matplotlib, Seaborn, and NumPy library

Installation steps:

Clone the repository:
git clone https://github.com/draemonsi/ECE2112-Experiment4.git
Install dependencies (if Pandas is not installed):
pip install pandas matplotlib seaborn numpy

Usage

Open Jupyter Notebook or any Python environment.
Load the provided dataset (board2.csv) into a Pandas DataFrame.
Run the code snippets to solve each problem and create visualizations.
The results will be displayed directly in your environment.

Files included

PA4.ipynb: Python file containing the code to perform data wrangling and visualization.
board2.csv: CSV file containing the dataset with students’ scores and other details.

Technologies Used

Python (version 3.x)
Pandas (Python Data Analysis Library)
Matplotlib and Seaborn (Data visualization)
Jupyter Notebook for code execution and analysis

Data Interpretation

The data from the ECE board exam reveals a multifaceted view of performance variations based on academic tracks, regional demographics, and gender distribution. In the Microelectronics track, which is predominantly female [Figure-1.0] (around 70% female students), the students demonstrate strong proficiency in Math, with many scoring above 87, and a tighter distribution of scores indicating consistent performance. However, this same group tends to underperform in the Electronics section, where their median score falls below 70, highlighting a disconnect between their theoretical math skills and practical application in electronics [Figure-1.1]. In contrast, the Instrumentation track, mainly composed of male students (over 80% male), displays robust performance across exam sections, with no scores below 55 in the General Engineering and Applied Sciences (GEAS). However, this track struggles in Communication, where no students scored above 80 [Figure-1.1].

The Communication track, with an equal gender distribution [Figure-1.0], shows a broader range of Math scores, with many students scoring below 70, and a slightly higher but still variable range of scores in Electronics, where the median hovers around 70. This indicates challenges in both technical and mathematical areas [Figure-1.1].

Regionally, students from the Visayas region (green points in the scatterplots) perform well in Math, often scoring above 80, but face significant struggles in the Electronics section, with most scoring below 60. This points to a regional disparity in performance, potentially driven by differences in educational resources or focus. Students from Mindanao and Luzon display a more balanced but variable performance across both Math and Electronics, with scores spanning a wide range in both subjects [Figure-1.2].

The scatterplots reveal that there is no strong correlation between high performance in one section and another, especially in Math and Electronics, underscoring the idea that students tend to specialize in specific areas rather than achieving uniform excellence across the board. For example, several students who scored above 80 in Math did not surpass 60 in Electronics, particularly in the Microelectronics track [Figure-1.3] [Figure-1.4].

In summary, the data underscores significant performance trends and disparities across tracks, regions, and gender groups. The math-strong but electronics-weak Microelectronics students, and the consistent but communication-challenged Instrumentation students, suggest the need for more nuanced, possibly track-specific, educational strategies. The inclusion of gender demographics—such as the predominantly male Instrumentation track and the more balanced Communication track—adds another layer to these findings, offering a pathway for more tailored interventions to address specific weaknesses and improve overall performance in the ECE board exam.

Figure 1.0

Figure 1.1

Figure 1.2

Figure 1.3

Figure 1.4

Conclusion

This experiment demonstrates the power of data wrangling and visualization techniques in uncovering insights from a dataset. By using Pandas, we were able to filter data based on specific conditions and manipulate the DataFrame to focus on relevant information. Additionally, the visualizations helped identify patterns in student performance across tracks, subjects, and gender distributions. The ability to interpret these patterns is crucial in making data-driven decisions and conclusions.

License

This project is licensed under The Unlicense. Please see LICENSE file for more details.

Author

Andrei Jorelle C. Simon
GitHub Profile

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
LICENSE.txt		LICENSE.txt
PA4.ipynb		PA4.ipynb
README.md		README.md
board2.csv		board2.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Experiment 4: Data Wrangling and Data Visualization

Course: ECE 2112 - Advanced Computer Programming and Algorithms

Project Overview

Table of Contents

Intended Learning Outcomes

Problem Descriptions

Problem 1: Create a DataFrame using a subset of the columns and specific filtering conditions

Problem 2: Visualizing Data

Installation Instructions

Installation steps:

Usage

Files included

Technologies Used

Data Interpretation

Figure 1.0

Figure 1.1

Figure 1.2

Figure 1.3

Figure 1.4

Conclusion

License

Author

About

Languages

License

draemonsi/ECE2112-Experiment4

Folders and files

Latest commit

History

Repository files navigation

Experiment 4: Data Wrangling and Data Visualization

Course: ECE 2112 - Advanced Computer Programming and Algorithms

Project Overview

Table of Contents

Intended Learning Outcomes

Problem Descriptions

Problem 1: Create a DataFrame using a subset of the columns and specific filtering conditions

Problem 2: Visualizing Data

Installation Instructions

Installation steps:

Usage

Files included

Technologies Used

Data Interpretation

Figure 1.0

Figure 1.1

Figure 1.2

Figure 1.3

Figure 1.4

Conclusion

License

Author

About

Topics

Resources

License

Stars

Watchers

Forks

Languages