This repository contains the work done for Coursera's Getting and Cleaning Data project.
The run_analysis.R script contains the R code for creating a tidy data set from the raw data downloaded from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip. The data contains measurements from subjects performing various activities (walking, walking upstairs, walking downstairs, sitting, standing and laying) while wearing the computing devices.
The raw data must be available for the script to create the tidy data set. The script performs the following steps:
- The training set is read: the data contained in subject_train.txt, X_train.txt and y_train.txt are merged into a single data frame, and the columns are labeled appropriately.
- The test set is read: the data contained in subject_test.txt, X_test.txt and y_test.txt are merged into a single data frame, and the columns are labeled appropriately.
- The training and test data sets are merged together to create a single data set named tidy.
- Next, the activity labels are applied, giving a data set named labled_tidy.
- A separate data set containing the standard deviation and mean values is created and named std_mean_tidy.
- Another data set is created with the average values of the variables on a per-subject basis and named mean_df.
- For all the above data sets, the top three rows are displayed after the data set is created.
- Finally, the data sets are written to CSV files named tidy_data.csv, std_mean.csv and mean_subject.csv.
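
The steps above can be sketched in a few lines of base R. This is only an illustration, not the code in run_analysis.R: it uses a tiny synthetic data set in place of the real files, the feature names and values are made up, and several details are assumptions (that mean and standard deviation columns are identified by `mean()` and `std()` in their names, that the per-subject averages are taken over those columns, and which data set goes to which CSV file).

```r
# Synthetic stand-ins for the raw files (hypothetical values, not real measurements).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(activity_id = 1:2,
                              activity = c("WALKING", "SITTING"))

# Each set combines subject ids, activity ids and measurements,
# mirroring subject_*.txt, y_*.txt and X_*.txt.
train <- cbind(
  data.frame(subject = c(1, 1, 2)),
  data.frame(activity_id = c(1, 2, 1)),
  setNames(data.frame(matrix(c(0.1, 0.2, 0.3,
                               0.4, 0.5, 0.6,
                               0.7, 0.8, 0.9), nrow = 3)), features)
)
test <- cbind(
  data.frame(subject = c(3, 3)),
  data.frame(activity_id = c(2, 1)),
  setNames(data.frame(matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)), features)
)

# Merge the training and test sets into one data set.
tidy <- rbind(train, test)

# Apply descriptive activity labels (match() keeps the row order intact).
labled_tidy <- tidy
labled_tidy$activity <- activity_labels$activity[
  match(tidy$activity_id, activity_labels$activity_id)
]

# Keep only the mean and standard deviation measurements
# (assumption: they are identified by "mean()" or "std()" in the name).
measure_cols <- grep("mean\\(\\)|std\\(\\)", names(labled_tidy), value = TRUE)
std_mean_tidy <- labled_tidy[, c("subject", "activity", measure_cols)]

# Average each retained variable per subject.
mean_df <- aggregate(std_mean_tidy[, measure_cols],
                     by = list(subject = std_mean_tidy$subject),
                     FUN = mean)

# Show the top three rows of each data set.
print(head(tidy, 3))
print(head(labled_tidy, 3))
print(head(std_mean_tidy, 3))
print(head(mean_df, 3))

# Write the results to CSV (a temporary directory is used here;
# the mapping of data sets to file names is an assumption).
out_dir <- tempdir()
write.csv(labled_tidy, file.path(out_dir, "tidy_data.csv"), row.names = FALSE)
write.csv(std_mean_tidy, file.path(out_dir, "std_mean.csv"), row.names = FALSE)
write.csv(mean_df, file.path(out_dir, "mean_subject.csv"), row.names = FALSE)
```

Using `match()` rather than `merge()` for the labels is a deliberate choice in this sketch, since `merge()` may reorder the rows.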
To run the script:
- Download the data set from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
- Extract the downloaded data set in the directory containing the run_analysis.R script.
- After extraction, you should have a directory named UCI HAR Dataset in the same directory as the script.
- Execute the script from either the R console or RStudio.
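
The download and extraction can also be done from R itself. The following is a minimal sketch, assuming the working directory is the one containing run_analysis.R; the helper `ensure_raw_data()` and the local zip file name are hypothetical and not part of the repository.

```r
data_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

# Hypothetical helper: download and extract the raw data only if the
# "UCI HAR Dataset" directory is not already present next to the script.
ensure_raw_data <- function(script_dir = ".") {
  data_dir <- file.path(script_dir, "UCI HAR Dataset")
  if (!dir.exists(data_dir)) {
    zip_path <- file.path(script_dir, "UCI_HAR_Dataset.zip")  # assumed local name
    download.file(data_url, destfile = zip_path, mode = "wb")
    unzip(zip_path, exdir = script_dir)
  }
  invisible(data_dir)
}

# Usage (commented out because it downloads the archive and runs the script):
# ensure_raw_data(".")
# source("run_analysis.R")
```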