Data Science Resources

Statistics
Machine Learning
SQL
Product Case

Statistics

Z-test for Means

The z-test is one of the most basic and commonly used hypothesis tests.

Download PDF Here

Z-test for Proportions

The z-test is a great asset to use when exploring proportions.

Download PDF Here

One Sample t-test

The one sample t-test is one of the most frequently asked topics in statistics interviews.

Download PDF Here

Two Sample t-test

The two sample t-test is helpful when determining if two population means are equal.

Download PDF Here
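
As a rough illustration of the two sample t-test described above, the sketch below computes the test statistic and degrees of freedom by hand. It assumes Welch's unequal-variance form, which is one common variant and not necessarily the one the PDF covers; the function name welch_t and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's two sample t-test.

    Assumption: unequal-variance (Welch) form, chosen here for illustration.
    """
    # Squared standard error contributed by each sample (sample variance / n).
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)

    # Difference in sample means, scaled by the combined standard error.
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)

    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof


# Hypothetical data: two small groups whose means differ by 1.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, dof)
```

A t statistic far from zero, judged against a t distribution with the computed degrees of freedom, is evidence against the hypothesis that the two population means are equal.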

Machine Learning

Random Forest

Random Forest is one of the most useful pragmatic algorithms for fast, simple, flexible predictive modeling.

Download PDF Here

L1 and L2 Regularization

Regularization adds a penalty term to a model's loss function in order to improve how well the model generalizes.

Download PDF Here
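
To make the idea of a penalty term in the loss function concrete, here is a minimal sketch that adds an L1 or L2 penalty to a mean squared error loss. The choice of mean squared error as the base loss, the function names, and the strength parameter lam are assumptions made for illustration, not taken from the PDF.

```python
def mse(y_true, y_pred):
    """Mean squared error, used here as the unregularized base loss (an assumption)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def l1_penalty(weights, lam):
    """L1 (lasso-style) penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (ridge-style) penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(y_true, y_pred, weights, lam=0.1, kind="l2"):
    """Base loss plus the chosen penalty term on the model weights."""
    if kind == "l1":
        penalty = l1_penalty(weights, lam)
    elif kind == "l2":
        penalty = l2_penalty(weights, lam)
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return mse(y_true, y_pred) + penalty


# Hypothetical targets, predictions, and weights.
y_true, y_pred, weights = [1, 2, 3], [1, 2, 5], [1.0, -2.0]
print(regularized_loss(y_true, y_pred, weights, lam=0.5, kind="l1"))
print(regularized_loss(y_true, y_pred, weights, lam=0.5, kind="l2"))
```

Because the penalty grows with the size of the weights, minimizing this combined loss discourages large weights; the L1 form tends to push some weights to exactly zero, while the L2 form shrinks them smoothly.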

How to Handle Imbalanced Datasets

Imbalanced data is one of the most common machine learning problems you’ll come across in data science interviews.

Download PDF Here

K-means

K-Means is one of the most popular machine learning algorithms you’ll encounter in data science interviews.

Download PDF Here

How to Handle Categorical Data

Handling categorical data in machine learning projects is a very common topic in data science interviews.

Download PDF Here

Ensemble Methods: Boosting, Bagging, and Stacking

Examples of ensemble learning, the advantages of boosting and bagging, how to explain stacking, and more.

Download PDF Here

Feature Selection

How to use feature selection with over 10,000 features, how to calculate feature importance, and the pros and cons of various selection methods.

Download PDF Here

Principal Component Analysis (PCA)

What principal component analysis is, how it works, the problems you would use PCA for, and the pros and cons associated with PCA.

Download PDF Here

Gradient Boosting

What Gradient Boosting and XGBoost are, how to describe the architecture of gradient boosting, and the pros and cons associated with them.

Download PDF Here

SQL

Top N Problems

Top N is one of the most frequently asked SQL questions in interviews.

Download PDF Here

Ratio Problems

The two most common ways to compute a ratio, with two examples that demonstrate how to solve these problems.

Download PDF Here

Product Case

Cracking Product Case Problems

Frameworks to crack product case problems in data science interviews.

Download PDF Here