As most of us know, in any data science project most of the total time is spent in the pre-model-building (pre-processing) stage. This stage includes data extraction, data cleaning, data formatting, new feature extraction, and data transformations such as normalization, distance matrices, frequency or response rate encoding, converting categorical variables to numbers, binning, dummy-fying, clustering, etc.
In this post we are going to focus mainly on data transformation using response rate calculation, and on how it helps improve the model.
In survey research, the response rate, also known as the completion rate or return rate, is the number of people who answered the survey divided by the number of people in the sample. It is usually expressed as a percentage; as a plain fraction it is a value between 0 and 1.
I have created some sample data for explanation:
Col1 Col2 Response
a w 1
b x 1
b z 0
a w 0
a w 1
b z 0
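To follow along, the sample table can be loaded into a pandas DataFrame. This is a minimal sketch assuming pandas is installed; the column names simply mirror the table above.

```python
import pandas as pd

# Sample data from the table above: two categorical predictors and a binary target
df = pd.DataFrame({
    "Col1": ["a", "b", "b", "a", "a", "b"],
    "Col2": ["w", "x", "z", "w", "w", "z"],
    "Response": [1, 1, 0, 0, 1, 0],
})
print(df)
```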
From the above data, we now calculate the response rate for each level of each individual variable against the target (Response).
Let's take Col1: it has two levels, a and b. The responses against these levels are as below:
Levels 0 1 Grand Total
a 1 2 3
b 2 1 3
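The same count table can be produced with `pd.crosstab`. A minimal sketch assuming pandas; `margins=True` adds the row and column totals, and `margins_name` is set here only so the label matches "Grand Total" above.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["a", "b", "b", "a", "a", "b"],
    "Col2": ["w", "x", "z", "w", "w", "z"],
    "Response": [1, 1, 0, 0, 1, 0],
})

# Count each Response class (0/1) per level of Col1, with totals in the margins
counts = pd.crosstab(df["Col1"], df["Response"], margins=True, margins_name="Grand Total")
print(counts)
```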
To calculate the response rate, we divide the count of a particular response class by the total number of responses for a particular level. For example, level "a" of Col1 has 3 responses in total, of which 1 is "0" and 2 are "1". So the response rate for class "0" of level "a" is 1/3, i.e. 0.333333333, and the response rate for class "1" of level "a" is 2/3, i.e. 0.666666667.
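Written as a formula, with notation introduced here only for illustration ($\mathrm{RR}(v, c)$ is the response rate of class $c$ within level $v$, and $n(\cdot)$ counts matching rows), the calculation for Col1 is:

```latex
\mathrm{RR}(v, c) = \frac{n(\mathrm{Col1} = v,\ \mathrm{Response} = c)}{n(\mathrm{Col1} = v)},
\qquad \sum_{c \in \{0, 1\}} \mathrm{RR}(v, c) = 1

\mathrm{RR}(a, 0) = \frac{1}{3} \approx 0.333, \qquad \mathrm{RR}(a, 1) = \frac{2}{3} \approx 0.667
```

Because every row of a level falls into exactly one class, the rates for a level always add up to 1, which is why each row of the table below sums to 1.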
Levels "0" Response over Total "1" Response over Total
a 0.333333333 0.666666667
b 0.666666667 0.333333333
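In pandas, `pd.crosstab(..., normalize="index")` divides each row of counts by its row total, which gives exactly these per-level response rates. A minimal sketch assuming pandas; `level_response_rates` is a hypothetical helper name used only for illustration, applied to both predictors.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["a", "b", "b", "a", "a", "b"],
    "Col2": ["w", "x", "z", "w", "w", "z"],
    "Response": [1, 1, 0, 0, 1, 0],
})

def level_response_rates(frame, col, target="Response"):
    # Hypothetical helper: share of each target class within each level of `col`.
    # normalize="index" divides every row by its total, so each row sums to 1.
    return pd.crosstab(frame[col], frame[target], normalize="index")

for col in ["Col1", "Col2"]:
    print(level_response_rates(df, col).round(3))
```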
Now we are going to calculate the response rate for each group of the data. First we identify the groups by combining the predictors (individual variables), and then we calculate either the True or the False response against the total responses in each group.
Grouping the above data by combining Col1 and Col2 gives:
Combination Response
aw 1
bx 1
bz 0
aw 0
aw 1
bz 0
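Building the group key amounts to concatenating the predictor values row by row. A minimal sketch assuming pandas; the `predictors` list is an illustrative assumption. With single-character levels plain concatenation is unambiguous, but with longer levels a separator such as `"_".join` avoids collisions like "ab" + "c" versus "a" + "bc".

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["a", "b", "b", "a", "a", "b"],
    "Col2": ["w", "x", "z", "w", "w", "z"],
    "Response": [1, 1, 0, 0, 1, 0],
})

predictors = ["Col1", "Col2"]
# Join the predictor values of each row into one group key, e.g. "a" + "w" -> "aw"
df["Combination"] = df[predictors].astype(str).apply("".join, axis=1)
print(df[["Combination", "Response"]])
```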
Response rate calculation: for easier understanding, let's treat response value 1 as True and 0 as False:
Groups 1(True) 0(False) Total Response True Response Rate False Response Rate
aw 2 1 3 0.666666667 0.333333333
bx 1 0 1 1 0
bz 0 2 2 0 1
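The group table above can be reproduced with a `groupby`. A minimal sketch assuming pandas; the output column names are our own. Since the target is 0/1, the True response rate is simply the group mean of Response, and the False rate is its complement.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["a", "b", "b", "a", "a", "b"],
    "Col2": ["w", "x", "z", "w", "w", "z"],
    "Response": [1, 1, 0, 0, 1, 0],
})
df["Combination"] = df["Col1"] + df["Col2"]

# Count True (1) responses and all responses per group
summary = df.groupby("Combination")["Response"].agg(true_count="sum", total="count")
summary["false_count"] = summary["total"] - summary["true_count"]
summary["true_rate"] = summary["true_count"] / summary["total"]
summary["false_rate"] = 1 - summary["true_rate"]

# Reorder to match the table: 1(True), 0(False), Total, True rate, False rate
summary = summary[["true_count", "false_count", "total", "true_rate", "false_rate"]]
print(summary)
```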
Once we have the response rates, we can use the newly created response-rate columns directly as features for model building, in place of the raw categorical values, and this is where the magic happens!
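To use the rates as features, each row gets the rate of its group. One caveat worth knowing: computing the rates on the same rows the model is evaluated on leaks the target into the features, so a common practice is to compute them on the training split only and map them onto new data. The sketch below assumes pandas; `fit_group_rates` and `add_rate_columns` are hypothetical helper names, and falling back to the overall training rate for unseen groups is our own assumption, not the author's method.

```python
import pandas as pd

def group_key(frame, predictors):
    # Concatenate predictor values row-wise into one group key, e.g. "aw"
    return frame[predictors].astype(str).apply("".join, axis=1)

def fit_group_rates(train, predictors, target="Response"):
    # True response rate per group (mean of a 0/1 target), plus the overall rate
    rates = train[target].groupby(group_key(train, predictors)).mean()
    return rates, train[target].mean()

def add_rate_columns(frame, predictors, rates, fallback):
    # Map each row's group to its rate; groups unseen in training get the fallback
    out = frame.copy()
    out["TrueResponseRate"] = group_key(frame, predictors).map(rates).fillna(fallback)
    out["FalseResponseRate"] = 1 - out["TrueResponseRate"]
    return out

train = pd.DataFrame({
    "Col1": ["a", "b", "b", "a", "a", "b"],
    "Col2": ["w", "x", "z", "w", "w", "z"],
    "Response": [1, 1, 0, 0, 1, 0],
})
predictors = ["Col1", "Col2"]
rates, fallback = fit_group_rates(train, predictors)

# "aw" was seen in training; "bw" was not, so it receives the overall rate
new = pd.DataFrame({"Col1": ["a", "b"], "Col2": ["w", "w"]})
encoded = add_rate_columns(new, predictors, rates, fallback)
print(encoded)
```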
Let's apply this to a meaningful data set and see how it works. The data set uploaded with the code has 237 observations, and the use case is to predict whether the input is male or female.
Data Structure
- year (age of a person in Years)
- age (age of a person in Months)
- height (Height of a person in Inches)
- weight (Weight of a person in pounds)
Target/Class Variable
- target (M/F)
Let's first see what results the model gives when we only select the important variables, without performing any transformations.
<incomplete...>