Machine Learning Zoomcamp Update: Thursday, 21 October 2021

Date: 21 October 2021

Today, I started working on the midterm project.

First, I searched for a dataset on Kaggle and the UCI repository.

I found a dataset of placement records for MBA students. To sanity-check it, I trained a simple logistic regression model to predict 'status' (whether the student gets placed or not), and I got 100% accuracy. I was sure that was not possible and that I had made a mistake. After analysing the dataset, I found a feature, 'salary', which is NaN when 'status' is 'Not Placed' and has a value otherwise, so it effectively gives away the answer. That was the source of my data leakage. :)
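A leak of this kind can be checked for in a few lines of pandas. The sketch below is only an illustration, not the code I ran: the rows are made up, 'mba_p' is a hypothetical extra feature, and missingness_leaks is a helper name invented for this example. It flags any column that is filled in exactly when the target has one value and empty otherwise.

```python
import pandas as pd

# Toy rows made up for illustration; not taken from the real dataset.
df = pd.DataFrame({
    "status": ["Placed", "Not Placed", "Placed", "Not Placed", "Placed"],
    "salary": [270000.0, None, 250000.0, None, 300000.0],
    "mba_p": [58.8, 59.4, 66.3, 55.5, 62.1],  # hypothetical numeric feature
})

def missingness_leaks(df, target, positive_label):
    """Return columns whose NaN pattern exactly mirrors the target."""
    is_positive = df[target] == positive_label
    leaky = []
    for col in df.columns.drop(target):
        has_value = df[col].notna()
        # A column leaks if it has a value exactly when the target is positive,
        # or exactly when the target is not positive.
        if (has_value == is_positive).all() or (has_value != is_positive).all():
            leaky.append(col)
    return leaky

leaky_columns = missingness_leaks(df, "status", "Placed")
print(leaky_columns)
```

On these toy rows only 'salary' is flagged, because it has a value for every 'Placed' row and is NaN for every 'Not Placed' row.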

Next, I decided to train two models: a classifier to predict 'status' and a regressor to predict 'salary'. Alexey told me that I should focus on classification first and do the regression part only if I have time. I will follow his advice and try to complete the classification model soon.
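A rough sketch of how the two tasks could be set up with scikit-learn is below. It is an assumption about the shape of the work, not the finished project: the rows are synthetic, and 'degree_p' and 'mba_p' are hypothetical feature names. The key points are that 'salary' is dropped from the classification features, and that the regression target only exists for placed students.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic toy rows for illustration only; values are not from the real dataset.
df = pd.DataFrame({
    "degree_p": [58.0, 77.5, 64.0, 52.0, 73.3, 67.3, 79.0, 50.5],
    "mba_p": [58.8, 66.3, 57.8, 59.4, 55.5, 51.6, 53.3, 62.2],
    "salary": [270000.0, 200000.0, 250000.0, None,
               425000.0, None, 252000.0, None],
    "status": ["Placed", "Placed", "Placed", "Not Placed",
               "Placed", "Not Placed", "Placed", "Not Placed"],
})

# Classification: drop the leaky 'salary' column before fitting.
X_clf = df.drop(columns=["status", "salary"])
y_clf = (df["status"] == "Placed").astype(int)
clf = LogisticRegression(max_iter=1000).fit(X_clf, y_clf)
predictions = clf.predict(X_clf)

# Regression (the optional second step): only placed students have a salary,
# so the features and target are taken from that subset alone.
placed = df[df["status"] == "Placed"]
X_reg = placed.drop(columns=["status", "salary"])
y_reg = placed["salary"]
```

With 'salary' removed, the classifier has to rely on the remaining features, so its accuracy should be a more honest estimate.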

Estimated Time Taken: 1 hour 30 minutes