I'm a Data Scientist and a graduate student in UC Berkeley's Master of Information and Data Science (MIDS) program, projected to graduate in 2025.
Currently, I work as a Data Scientist for National Capitol Contracting, delivering strategic insights and building automation for Naval Sea Systems Command at Pearl Harbor Naval Shipyard. Before moving into data science, I was an Army Infantry Officer and later managed multiple construction projects in Hawaii as a consulting Construction Manager/Project Engineer. I am a graduate of the United States Military Academy with an undergraduate degree in Civil Engineering.
In my time at MIDS, I have had the opportunity to work on several projects. Most work is ongoing, but here are some of the completed projects that I can share with you:
SQL and NoSQL for analyzing customer sales information and making recommendations for future expansion of food delivery distribution sites
- Course: Data Engineering
- Description: Project 1: Wrangled and loaded sales data from a third-party sales channel and ran preliminary analytics on it. Used AWS and a Docker cluster running Anaconda and PostgreSQL. Project 2: Created a Neo4j graph database of the Bay Area BART system to identify future distribution locations for a food delivery service. Used Graph Path to identify the shortest path from a central supply store to distribution nodes, a centrality algorithm to determine the most influential BART station for serving existing customers, and a community detection algorithm to identify BART station communities. Identified additional BART station locations for future store expansion.
- Technology: SQL, Python, NoSQL Graph Database, Linux CLI, Docker Containers, Graph Path, Centrality, Community Detection Algorithms
- Links to the repository: [https://github.com/kevinyi901/W205_DataEngineering]
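For readers unfamiliar with the graph techniques named above, here is a minimal sketch of two of them: a fewest-hops shortest path and a simple centrality score. It is plain Python, not the Neo4j queries used in the project, and it swaps in breadth-first search and degree centrality as stand-ins. The node names, edges, and function names are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical toy network: node names and links are illustrative only,
# not real BART stations or project data.
EDGES = [
    ("Hub", "A"), ("A", "B"), ("B", "C"),
    ("Hub", "D"), ("D", "C"), ("C", "E"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search: return a fewest-hops path, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def degree_centrality(adjacency):
    """Share of the other nodes that each node is directly connected to."""
    others = len(adjacency) - 1
    return {node: len(neighbors) / others for node, neighbors in adjacency.items()}


adjacency = build_adjacency(EDGES)
path = shortest_path(adjacency, "Hub", "E")
centrality = degree_centrality(adjacency)
most_central = max(centrality, key=centrality.get)

print("Shortest path:", " -> ".join(path))
print("Most central node:", most_central)
```

In this toy network the supply node reaches `E` in three hops through `D` and `C`, and `C` has the highest degree centrality because it connects to three of the five other nodes.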
Hypothesis Testing and Multiple Variable Large Sample Linear Regression
- Course: Statistics for Data Science
- Description: Project 1: A project exploring, visualizing, and conducting hypothesis testing on whether Republican or Democratic voters have more difficulty voting. Project 2: A project evaluating whether one's occupation affects the number of hours worked weekly, using general census data.
- Technology: RStudio, T-Test, Classical Linear Model Assumption Testing, Multiple Linear Regression
- Links to the repository: [https://github.com/kevinyi901/W203_Statistics]
Exploratory Data Analysis of Colon and Lung Cancer
- Course: Introduction to Data Science Programming
- Description: A project cleaning, exploring, and visualizing 2008-2019 data from the CDC to identify racial, geographical, and gender trends in lung and colon cancer in America.
- Technology: Python, Pandas, Plotly, Matplotlib, Seaborn
- Links to the repository: [https://github.com/kevinyi901/W200]
Experimental Research Design Report
- Course: Research Design and Applications for Data Analysis
- Description: A project developing a research design report that will produce valuable and actionable insight for predicting grocery product sales using existing prediction models and social media data.
- Links to the repository: [https://github.com/kevinyi901/W201]