Getting and Cleaning Data - Project

This repository contains the work done for Coursera's Getting and Cleaning Data project.

The run_analysis.R script contains the R code for creating a tidy data set from the raw data downloaded from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip. The data contains measurements recorded while subjects wearing the computing devices performed six activities: walking, walking upstairs, walking downstairs, sitting, standing and laying.

Prerequisite

The raw data must be available for the script to create the tidy data set.

Script Details

  • The train set files subject_train.txt, X_train.txt & y_train.txt are read and merged into a single data frame. The columns are labeled appropriately.
  • The test set files subject_test.txt, X_test.txt & y_test.txt are read and merged into a single data frame. The columns are labeled appropriately.
  • The train and the test data sets are merged together to create a single data set named tidy.
  • Next, the activity labels are applied, producing a data set named labled_tidy.
  • A separate data set containing only the standard deviation and mean values is created and named std_mean_tidy.
  • Another dataset is created with the average values of the variables on a per-subject basis and named mean_df
  • For all the above data sets, the top three rows are displayed after the data set is created.
  • Finally, the data sets are written to CSV files named tidy_data.csv, std_mean.csv and mean_subject.csv.
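
The steps above can be sketched on a small in-memory example. This is not the code in run_analysis.R, only an illustration of the same sequence using base R. The helper make_set, the toy values and the column names are made up for the example, the activity codes 1 to 6 are assumed to follow the order listed in the introduction, and the CSV is written to a temporary directory rather than next to the script.

```r
# Toy stand-in for one merged set (subject_*.txt, y_*.txt and X_*.txt);
# make_set and all values here are hypothetical.
make_set <- function(subjects, activities, offset) {
  data.frame(
    subject         = subjects,
    activity        = activities,
    tBodyAcc_mean_X = offset + seq_along(subjects),
    tBodyAcc_std_X  = offset / 10 + seq_along(subjects),
    tBodyAcc_max_X  = offset * 2
  )
}
train <- make_set(c(1, 1, 3), c(1, 2, 1), offset = 0)
test  <- make_set(c(2, 2),    c(1, 6),    offset = 10)

# Merge the train and test rows into a single data set.
tidy <- rbind(train, test)

# Apply descriptive activity labels (assumed code order 1..6).
activity_names <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                    "SITTING", "STANDING", "LAYING")
labled_tidy <- tidy
labled_tidy$activity <- factor(labled_tidy$activity,
                               levels = 1:6, labels = activity_names)

# Keep only the mean and standard deviation measurements.
keep <- grepl("mean|std", names(labled_tidy))
std_mean_tidy <- labled_tidy[, c("subject", "activity",
                                 names(labled_tidy)[keep])]

# Average each remaining measurement per subject.
measurements <- std_mean_tidy[, names(std_mean_tidy) != "activity"]
mean_df <- aggregate(. ~ subject, data = measurements, FUN = mean)

# Show the top three rows, then write the result to a CSV file.
print(head(mean_df, 3))
write.csv(mean_df, file.path(tempdir(), "mean_subject.csv"),
          row.names = FALSE)
```

Using aggregate with a formula keeps the example free of extra packages; a dplyr group_by and summarise pipeline would be an equally reasonable choice.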

Steps to execute

  1. Download the data set from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
  2. Extract the downloaded data set in the directory containing the run_analysis.R script.
  3. After extraction, you should have a directory named UCI HAR Dataset in the same directory as the script.
  4. Execute the script either from the R console or RStudio.
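
If you prefer to do the download and extraction from R, the steps above could look roughly like the sketch below. The function name fetch_raw_data and the local file name uci_har_dataset.zip are assumptions made for this example and are not part of run_analysis.R. The sketch assumes the working directory is the one containing the script.

```r
zip_url  <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
data_dir <- "UCI HAR Dataset"

# Hypothetical helper: download and extract the raw data only when the
# expected directory is not already present.
fetch_raw_data <- function(url = zip_url, dir = data_dir,
                           zip_file = "uci_har_dataset.zip") {
  if (!dir.exists(dir)) {
    download.file(url, destfile = zip_file, mode = "wb")  # binary mode for zip files
    unzip(zip_file)                                       # extracts into the working directory
  }
  invisible(dir.exists(dir))
}

# Usage, from the directory containing run_analysis.R:
# fetch_raw_data()
# source("run_analysis.R")
```

The check on the directory avoids downloading the archive again on repeated runs.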
