Machine Learning Zoomcamp Update: Tuesday, 14 September 2021
Date: 14 September 2021
Today, I completed Sessions 2.1, 2.2, 2.3, 2.4 and 2.5.
Session 2.1: Car price prediction project
In this session, Alexey Grigorev introduced this week's lessons and the project we will be working on this week.
Key takeaways:
 We will learn regression by working on a project that uses the Car Features and MSRP dataset from Kaggle.

The project plan is as follows:
 Prepare data and do EDA (Exploratory Data Analysis)
 Use linear regression to predict the price
 Understand the internals of linear regression
 Evaluate the model using RMSE (root mean squared error)
 Feature engineering
 Regularization
 Use the model
Session 2.2: Data preparation
This session explains how to perform data cleaning and prepare data for training.
Key takeaways:
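 Data should be consistent. For example, column names and string values can be lowercased, and spaces in them can be replaced with underscores, so that the same value is always written the same way.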
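Here is a minimal pandas sketch of that kind of normalisation. It assumes a small hypothetical DataFrame whose column names mimic the Kaggle dataset; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with inconsistent column names and string values
df = pd.DataFrame({
    "Make": ["BMW", "Audi"],
    "Engine Fuel Type": ["premium unleaded (required)", "Regular Unleaded"],
    "MSRP": [40000, 30000],
})

# Normalise column names: lowercase, spaces -> underscores
df.columns = df.columns.str.lower().str.replace(" ", "_")

# Apply the same normalisation to every string-valued column
for col in df.columns:
    if pd.api.types.is_string_dtype(df[col]):
        df[col] = df[col].str.lower().str.replace(" ", "_")

print(df)
```

Numeric columns such as `msrp` are left untouched. Only text is normalised.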
Session 2.3: Exploratory data analysis
This session covers exploratory data analysis (EDA).
Key takeaways:
 Before training, we should explore the data and look for any patterns in it.
 If the target distribution has a long tail, applying np.log1p (which computes log(1 + x)) compresses the large values and removes the long tail.
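The effect of np.log1p on a long tail is easy to see with a small NumPy sketch. The prices below are hypothetical, with one very expensive car creating the tail.

```python
import numpy as np

# Hypothetical long-tailed prices: one very expensive car
prices = np.array([20_000, 25_000, 30_000, 35_000, 2_000_000], dtype=float)

# log(1 + x); unlike np.log, it is defined at x == 0
log_prices = np.log1p(prices)

print(prices.max() / prices.min())          # 100.0
print(log_prices.max() / log_prices.min())  # far smaller spread after the transform
```

The "+1" is why log1p is usually preferred over log for non-negative targets: a price of 0 maps to 0 instead of negative infinity.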
Session 2.4: Setting up the validation framework
This session explains how to split the data into training, validation and test sets.
Key takeaways:
 The data should be shuffled before creating the splits.
 We should set a random seed so that the results are reproducible.
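Below is a minimal NumPy sketch of a seeded shuffle-then-split. The 60/20/20 proportions, the helper name `train_val_test_split` and the toy arrays are illustrative assumptions, not the course's exact code.

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=42):
    """Hypothetical helper: shuffle row indices with a fixed seed, then slice."""
    n = len(y)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    n_train = n - n_val - n_test

    # Fixed seed -> the same permutation (and the same split) on every run
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)

    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

# Toy data: row i of X is [2i, 2i + 1], target y[i] = i
X = np.arange(20).reshape(10, 2)
y = np.arange(10, dtype=float)
train, val, test = train_val_test_split(X, y)
print(len(train[1]), len(val[1]), len(test[1]))  # 6 2 2
```

Shuffling indices rather than the arrays themselves keeps each feature row aligned with its target.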
Session 2.5: Linear regression
This session gave an introduction to linear regression.
Key takeaways:
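 If the target was transformed with np.log1p, we can apply np.expm1 (which computes exp(x) - 1, the inverse of log1p) to the model's output to get predictions on the original price scale.
 Prediction in linear regression: $g(x_i) = w_0 + \sum_{j=1}^{n} w_j x_{ij}$, where $x_{ij}$ is the value of feature j for observation i, $w_0$ is the bias term, $w_j$ are the feature weights, and n is the number of features.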
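The prediction formula and the expm1 step can be sketched directly in NumPy. The bias, weights and feature values below are hypothetical numbers chosen for illustration, and the helper name `linear_regression_predict` is my own.

```python
import numpy as np

def linear_regression_predict(xi, w0, w):
    """Hypothetical helper: g(x_i) = w0 + sum_j w_j * x_ij for one observation."""
    pred = w0
    for j in range(len(w)):
        pred += w[j] * xi[j]
    return pred

# Hypothetical bias, weights and features (e.g. engine hp, cylinders, popularity)
w0 = 7.17
w = [0.01, 0.04, 0.002]
xi = [453, 11, 86]

log_pred = linear_regression_predict(xi, w0, w)  # prediction on the log1p scale
price = np.expm1(log_pred)                       # back to the original price scale

# The loop is equivalent to a dot product: w0 + xi . w
print(log_pred, w0 + np.dot(xi, w))
```

The explicit loop mirrors the sum in the formula term by term. In practice the dot-product form is used, and it extends to a whole matrix of observations as `w0 + X @ w`.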
Estimated Time Taken: 1 hour 10 minutes