A flashcard-style collection of interview questions for Data Science
Give me an example where MLE is equivalent to MAP
ANS
See details here
How to decide between L1 and L2 Loss Function?
ANS
See details here
How to deal with Skewed Data?
ANS
See details here
What is Bayesian Linear Regression?
ANS
The aim of Bayesian Linear Regression is not to find the single “best” value of the model parameters, but rather to determine the posterior distribution for the model parameters.
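As a minimal sketch of what "determining the posterior" means, consider the simplest case: a single slope `w` in `y = w * x + noise`, with known noise variance and a zero-mean Gaussian prior on `w`. In this conjugate setting the posterior is Gaussian and has a closed form. The function name `posterior_slope`, the prior variance, and the simulated data are illustrative assumptions, not part of the original answer.

```python
import random

def posterior_slope(xs, ys, noise_var=1.0, prior_var=10.0):
    """Posterior N(mean, var) of w in y = w*x + noise, given the prior w ~ N(0, prior_var)."""
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision

random.seed(0)
true_w = 2.0  # assumed ground truth used only to simulate data
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [true_w * x + random.gauss(0, 1.0) for x in xs]

# The result is a distribution over w, not a single point estimate.
mean_small, var_small = posterior_slope(xs[:10], ys[:10])
mean_all, var_all = posterior_slope(xs, ys)
print(f"10 points:  mean={mean_small:.2f}, variance={var_small:.3f}")
print(f"200 points: mean={mean_all:.2f}, variance={var_all:.3f}")
```

The posterior variance shrinks as more data is observed, which is the quantified uncertainty that a single "best" estimate does not provide.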
See details here
How to interpret the Regression Coefficients for Curvilinear Relationships and Interactive Terms?
ANS
See details here
What are the assumptions required for linear regression?
ANS
- There is a linear relationship between the dependent variable and the regressors, meaning the functional form of the model actually fits the data.
- The errors (residuals) are normally distributed and independent of each other.
- There is minimal multicollinearity between the explanatory variables.
- Homoscedasticity: the variance of the errors around the regression line is the same for all values of the predictor variables.
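Some of these assumptions can be checked informally from the residuals of a fitted model. The sketch below fits a simple least-squares line to simulated data and compares the residual variance for low and high values of the predictor. The helper `fit_line`, the simulated data, and the median split are illustrative choices; this is a rough check, not a formal test.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

random.seed(42)
xs = [random.uniform(0, 10) for _ in range(500)]
ys = [1.0 + 2.0 * x + random.gauss(0, 1.0) for x in xs]  # simulated, constant noise

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Homoscedasticity check: residual variance should be similar for low and high x.
median_x = statistics.median(xs)
low = [r for x, r in zip(xs, residuals) if x <= median_x]
high = [r for x, r in zip(xs, residuals) if x > median_x]
variance_ratio = statistics.variance(high) / statistics.variance(low)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"variance ratio (high x / low x) = {variance_ratio:.2f}")
```

A ratio far from 1 would suggest heteroscedasticity; plotting residuals against fitted values is the more common visual version of the same check.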
See details here
What are limitations in Linear Regression models?
ANS
- Linear regression models are sensitive to outliers.
- Overfitting: it is easy to overfit the model so that the regression begins to model the random error (noise) in the data rather than just the relationship between the variables. This most commonly arises when there are too many parameters relative to the number of samples.
- Linear regression is meant to describe linear relationships between variables, so if the relationship is nonlinear the model will fit poorly. You can sometimes compensate by transforming some of the variables with a log, square root, or similar transformation.
- The data may not fit the model because the model's assumptions are violated.
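To illustrate the transformation point, here is a small sketch with hypothetical, noiseless data that grows exponentially. A straight line fits `y` poorly, but `log(y)` is exactly linear in `x`, so ordinary least squares on the transformed target recovers the underlying parameters. The helper `fit_line` and the constants 3.0 and 0.5 are assumptions made for the example.

```python
import math
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [0.5 * i for i in range(1, 21)]
ys = [3.0 * math.exp(0.5 * x) for x in xs]  # y = 3 * exp(0.5 * x), nonlinear in x

# log(y) = log(3) + 0.5 * x is linear in x, so fit the line on the log scale.
slope, intercept = fit_line(xs, [math.log(y) for y in ys])
print(f"recovered rate = {slope:.3f}, recovered scale = {math.exp(intercept):.3f}")
```

With real, noisy data the recovery is only approximate, and the log transform also changes the assumed error structure (additive on the log scale, multiplicative on the original scale).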
See details here
Is Logistic Regression a linear model? Why?
ANS
See details here
How to interpret the weights in Logistic Regression?
ANS
- For the intercept $\beta_0$: when all numerical features are zero and all categorical features are at their reference category, the estimated odds (probability of the event divided by the probability of no event) are $\exp(\beta_0)$.
- For numerical features: increasing the value of feature $x_j$ by one unit, with all other features held fixed, changes the estimated odds by a factor of $\exp(\beta_j)$.
- For binary categorical features: one of the two values of the feature is the reference category. Changing the feature $x_j$ from the reference category to the other category changes the estimated odds by a factor of $\exp(\beta_j)$.
- For categorical features with more than two categories: one solution is one-hot encoding, meaning that each category has its own column. You only need L-1 columns for a categorical feature with L categories; otherwise the model is over-parameterized. The L-th category is then the reference category. Any other encoding that can be used in linear regression also works. The interpretation for each category is then equivalent to the interpretation of binary features.
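The odds interpretation can be verified numerically. The sketch below uses hypothetical coefficients (an intercept, one numerical feature, and one binary feature) and confirms that a one-unit change in a feature multiplies the odds by the exponential of its weight. The function name `estimated_odds` and all coefficient values are made up for illustration.

```python
import math

def estimated_odds(intercept, weights, features):
    """Odds p / (1 - p) implied by a logistic regression model."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    p = 1.0 / (1.0 + math.exp(-z))
    return p / (1.0 - p)

# Hypothetical fitted coefficients: one numerical and one binary feature.
intercept, weights = -0.5, [0.7, -1.2]

base = estimated_odds(intercept, weights, [1.5, 0])
plus_one = estimated_odds(intercept, weights, [2.5, 0])  # numerical feature + 1
switched = estimated_odds(intercept, weights, [1.5, 1])  # binary feature: reference -> other

numeric_ratio = plus_one / base    # matches exp(0.7)
binary_ratio = switched / base     # matches exp(-1.2)
baseline_odds = estimated_odds(intercept, weights, [0, 0])  # matches exp(-0.5)

print(numeric_ratio, math.exp(weights[0]))
print(binary_ratio, math.exp(weights[1]))
print(baseline_odds, math.exp(intercept))
```

The ratio does not depend on the starting value of the feature, which is why the weights are read as multiplicative effects on the odds rather than additive effects on the probability.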
See details here
What is naive about Naive Bayes?
ANS
See details here
How does random forest calculate Feature Importance?
ANS
See details here
How is k-NN different from k-means clustering?
ANS
See details here
What is Simpson's Paradox?
ANS
See details here
What is the Central Limit Theorem and why is it important?
ANS
Formally, it states that if we repeatedly draw random samples of a sufficiently large size from a population, the distribution of the sample means (the sampling distribution of the mean) will be approximately normal. What’s especially important is that this holds regardless of the shape of the original population's distribution, provided it has a finite variance.
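A quick simulation makes this concrete. The sketch below draws many samples from an exponential population, which is strongly right-skewed, and looks at the distribution of their means. The sample size of 50, the 2000 repetitions, and the choice of population are arbitrary assumptions for the demonstration.

```python
import math
import random
import statistics

random.seed(0)
sample_size, n_samples = 50, 2000

# Exponential(1) population: right-skewed, mean 1, standard deviation 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.stdev(sample_means)
standard_error = 1.0 / math.sqrt(sample_size)  # sigma / sqrt(n)

# For a normal distribution, roughly 68% of values lie within one standard deviation.
within_one_se = sum(abs(m - 1.0) <= standard_error for m in sample_means) / n_samples

print(f"mean of sample means = {mean_of_means:.3f}")
print(f"std of sample means  = {std_of_means:.3f} (theory: {standard_error:.3f})")
print(f"share within one standard error = {within_one_se:.2f}")
```

Even though individual observations are far from normal, the sample means cluster around the population mean with a spread close to `sigma / sqrt(n)`.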
See details here
What is sampling? How many sampling methods do you know?
ANS
- Probability sampling:
  - Simple random sampling: Software is used to randomly select subjects from the whole population.
  - Stratified sampling: Subsets of the data set or population are created based on a common factor, and samples are randomly collected from each subgroup.
  - Cluster sampling: The larger data set is divided into subsets (clusters) based on a defined factor, then a random sample of clusters is analyzed.
  - Multistage sampling: A more complicated form of cluster sampling, this method also involves dividing the larger population into a number of clusters. Second-stage clusters are then broken out based on a secondary factor, and those clusters are then sampled and analyzed. This staging can continue as multiple subsets are identified, clustered, and analyzed.
  - Systematic sampling: A sample is created by setting an interval at which to extract data from the larger population, for example selecting every 10th row in a spreadsheet of 200 items to create a sample of 20 rows to analyze.
- Non-probability sampling:
  - Convenience sampling: Data is collected from an easily accessible and available group.
  - Consecutive sampling: Data is collected from every subject that meets the criteria until the predetermined sample size is met.
  - Purposive or judgmental sampling: The researcher selects the data to sample based on predefined criteria.
  - Quota sampling: The researcher ensures equal representation within the sample for all subgroups in the data set or population.
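Three of the probability-based methods can be sketched with the standard library alone. The population of 200 rows, the `group` field, and the 10% sampling fraction below are hypothetical and chosen to mirror the "every 10th row of 200" example.

```python
import random

random.seed(0)
# Hypothetical population: 200 rows, 150 in group A and 50 in group B.
population = [{"id": i, "group": "A" if i < 150 else "B"} for i in range(200)]

# Simple random sampling: every row has the same chance of selection.
simple = random.sample(population, 20)

# Stratified sampling: draw 10% at random from each group separately.
strata = {}
for row in population:
    strata.setdefault(row["group"], []).append(row)
stratified = []
for rows in strata.values():
    stratified.extend(random.sample(rows, len(rows) // 10))

# Systematic sampling: random starting row, then every 10th row.
step = 10
start = random.randrange(step)
systematic = population[start::step]

print(len(simple), len(stratified), len(systematic))
```

Stratified sampling guarantees that each group appears in proportion to its size here (15 from A and 5 from B), whereas simple random sampling only achieves that on average.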
See details here
What is the difference between type I vs type II error?
ANS
A type I error (false positive) occurs when the null hypothesis is true but is erroneously rejected. A type II error (false negative) occurs when the null hypothesis is false but erroneously fails to be rejected; in other words, the test misses an effect that actually exists.
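Both error rates can be estimated by simulation. The sketch below runs a two-sided z-test many times: once on data where the null hypothesis is true (any rejection is a type I error) and once where it is false (any failure to reject is a type II error). The function names, the sample size of 25, the assumed true effect of 0.5, and the known standard deviation of 1 are all illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

def rejects_null(sample, mu0=0.0, sigma=1.0, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, with known standard deviation sigma."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_mean, n=25, trials=2000):
    """Share of simulated experiments in which H0 is rejected."""
    hits = 0
    for _ in range(trials):
        sample = [random.gauss(true_mean, 1.0) for _ in range(n)]
        hits += rejects_null(sample)
    return hits / trials

random.seed(0)
type_1_rate = rejection_rate(true_mean=0.0)        # H0 true, yet rejected
type_2_rate = 1.0 - rejection_rate(true_mean=0.5)  # H0 false, yet not rejected

print(f"type I error rate  = {type_1_rate:.3f} (close to alpha = 0.05)")
print(f"type II error rate = {type_2_rate:.3f}")
```

The type I rate is controlled directly by the significance level, while the type II rate depends on the effect size and the sample size.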
See details here
How to interpret P-values in Linear Regression Analysis?
ANS
See details here
What is a statistical interaction?
ANS
See details here
What is selection bias?
ANS
See details here
What is an example of a data set with a non-Gaussian distribution?
ANS
In a Poisson or Bernoulli process, the time until the next event is not normally distributed: it is exponential for a Poisson process and geometric for a Bernoulli process. By contrast, the data usually collected from such processes is the number of events per time unit, and for large $n$ that count is approximately normal.
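The contrast between waiting times and counts can be seen in a small simulation of a Poisson process. The rate of 50 events per time unit and the 400 time units are arbitrary assumptions. For a right-skewed distribution the median sits well below the mean, while for a roughly symmetric one the two nearly coincide.

```python
import random
import statistics

random.seed(1)
rate, horizon = 50.0, 400  # assumed: 50 events per time unit, 400 time units

waits = []              # time between consecutive events
counts = [0] * horizon  # number of events in each unit-length interval
t = 0.0
while True:
    wait = random.expovariate(rate)
    t += wait
    if t >= horizon:
        break
    waits.append(wait)
    counts[int(t)] += 1

wait_ratio = statistics.median(waits) / statistics.fmean(waits)
count_ratio = statistics.median(counts) / statistics.fmean(counts)

print(f"waiting times: median / mean = {wait_ratio:.2f}")   # skewed, well below 1
print(f"event counts:  median / mean = {count_ratio:.2f}")  # roughly symmetric, near 1
```

The waiting times are a simple example of non-Gaussian data, while the per-interval counts look close to normal because the expected count per interval is large.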
See details here
What is F-test?
ANS
The test statistic of the F-test is a random variable that follows the F-distribution under the assumption that the null hypothesis is true.
The testing procedure for the F-test for regression is identical in its structure to that of other parametric tests of significance such as the t-test.
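For a regression with a single predictor, the F statistic can be computed by hand, and it equals the square of the slope's t statistic, which shows how closely the two procedures are related. The simulated data and variable names below are assumptions for illustration; turning the statistic into a p-value requires the F-distribution's survival function (for example `scipy.stats.f.sf`), which is omitted here to keep the sketch dependency-free.

```python
import math
import random
import statistics

random.seed(3)
n = 60
xs = [random.uniform(0, 5) for _ in range(n)]
ys = [0.5 + 1.5 * x + random.gauss(0, 2.0) for x in xs]  # simulated data

# Least-squares fit of y = intercept + slope * x.
mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * x for x in xs]
ss_regression = sum((f - mean_y) ** 2 for f in fitted)     # explained variation
ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation

# F = (explained variation per model df) / (unexplained variation per residual df)
df_model, df_residual = 1, n - 2
f_stat = (ss_regression / df_model) / (ss_residual / df_residual)

# t statistic for H0: slope == 0
t_stat = slope / math.sqrt((ss_residual / df_residual) / sxx)

print(f"F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
```

With more than one predictor the F-test evaluates all slopes jointly, so the equivalence with a single t-test no longer holds.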
See details here
What is the Standard Deviation?
ANS
See details here
What is the Standard Error?
ANS
See details here
Describe the procedure of Hypothesis Testing.
ANS
- Determine the null hypothesis and alternative hypothesis
- Verify the data conditions
- Assume that the null hypothesis is true, calculate the p-value
- Decide whether or not the result is statistically significant
- Report the conclusion
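The steps above can be sketched with a two-sample permutation test, chosen here because it needs only the standard library. The measurements, the 5000 permutations, and the 0.05 significance level are hypothetical.

```python
import random
import statistics

# Hypothetical measurements for two groups.
group_a = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0]
group_b = [6.0, 6.3, 5.9, 6.4, 6.1, 5.8, 6.2, 6.5]

# Step 1: H0 = both groups have the same mean; H1 = the means differ.
observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))

# Step 2: data condition for this test: observations are exchangeable under H0.

# Step 3: assume H0 is true, shuffle the group labels, and estimate the p-value
# as the share of shuffles with a difference at least as large as the observed one.
random.seed(0)
pooled = group_a + group_b
n_permutations, at_least_as_extreme = 5000, 0
for _ in range(n_permutations):
    random.shuffle(pooled)
    diff = abs(statistics.fmean(pooled[:8]) - statistics.fmean(pooled[8:]))
    at_least_as_extreme += diff >= observed
p_value = (at_least_as_extreme + 1) / (n_permutations + 1)

# Step 4: decide whether the result is statistically significant.
alpha = 0.05
significant = p_value < alpha

# Step 5: report the conclusion.
print(f"observed difference = {observed:.2f}, p-value = {p_value:.4f}, significant = {significant}")
```

A t-test or z-test follows the same five steps; only the way the p-value is computed in step 3 changes.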
See details here
How to compute Confidence Interval for population Mean?
ANS
See details here
How to compute Confidence Interval for population Median?
ANS
This section includes questions about data structures, algorithms, and coding concepts. For these questions, coding practice should be the main focus.
What are some pros and cons of your favorite statistical software?
ANS
How would you sort a large list of numbers?
ANS
See details here
What Native Data Structures Can You Name in Python?
ANS
See details here
In Python, How is Memory Managed?
ANS
See details here
What is the worst-case time complexity of quicksort?
ANS
What is the average-case time complexity of quicksort?
ANS
What is the worst-case time complexity of looking up a value in a hash table?
ANS
What is the average-case time complexity of looking up a value in a hash table?
ANS
How do you deal with collisions in a hash table?
ANS
See details here
What is the difference between a red-black tree and a binary search tree?
ANS
See details here
What are the maximum and minimum possible heights of a binary tree with n elements?
ANS
What is the largest possible height of a balanced binary search tree with n elements?
ANS
Tell me the difference between an inner join, left join/right join, and union.
ANS
See details here
What does UNION do? What is the difference between UNION and UNION ALL?
ANS
See details here
What is model checkpointing?
ANS
See details here
What are the problems with sigmoid as activation function?
ANS
See details here
What regularization techniques for neural nets do you know?
ANS
See details here
What precautions should be taken when updating pretrained weights in language models, and what are the solutions?
ANS
See details here