Phase 3 | 28 Days | Intermediate
Phase 3: Core Machine Learning
Train, evaluate, and explain classical ML models on real tabular datasets, building pipelines that survive contact with messy, real-world data.
- Build end-to-end pipelines from raw CSV to deployed predictions.
- Choose models based on data geometry, interpretability needs, and error costs.
- Use SHAP and proper evaluation metrics to make predictions explainable.
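As a rough illustration of the first goal, here is a minimal sketch of a pipeline that goes from raw CSV text to predictions. The column names, the inline CSV, and the choice of logistic regression are all assumptions made for the example, not part of the curriculum. It shows one way to keep imputation, scaling, and encoding inside the fitted object so that messy inputs at prediction time (a missing value, an unseen category) do not break it.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data: note the missing monthly_fee in the fourth row.
RAW_CSV = """tenure_months,monthly_fee,plan,churned
1,70.0,basic,1
3,65.5,basic,1
24,30.0,premium,0
36,,premium,0
2,80.0,basic,1
48,25.0,family,0
5,75.0,basic,1
30,35.0,family,0
"""

df = pd.read_csv(io.StringIO(RAW_CSV))
X, y = df.drop(columns="churned"), df["churned"]

numeric = ["tenure_months", "monthly_fee"]
categorical = ["plan"]

# Numeric columns: fill gaps with the median, then standardize.
# Categorical columns: one-hot encode, ignoring categories unseen in training.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)

# New rows with the same kinds of mess: a missing fee and a new plan name.
new_customers = pd.DataFrame({
    "tenure_months": [2, 40],
    "monthly_fee": [78.0, None],
    "plan": ["basic", "student"],  # "student" never appeared in training
})
churn_probability = model.predict_proba(new_customers)[:, 1]
print(churn_probability)
```

Because preprocessing lives inside the `Pipeline`, the same fitted object can be cross-validated, tuned, or persisted as a single unit.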
Must Know
- Supervised vs Unsupervised vs RL
- Linear Regression: OLS, MSE, R²
- Logistic Regression: sigmoid, log-loss
- Decision Trees: Gini, entropy, depth
- Random Forests: bagging, feature importance
- XGBoost + LightGBM: gradient boosting
- K-Means Clustering: centroids, elbow method
- Train/Val/Test Split + Cross-Validation
- Precision, Recall, F1, ROC-AUC
- Feature Scaling: StandardScaler, MinMaxScaler
- Encoding Categorical Features
- Feature Engineering + Selection
- Regularization: L1/L2, ElasticNet
- Hyperparameter Tuning: GridSearchCV
- sklearn Pipelines
- SHAP for Model Interpretability
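To make the evaluation metrics in this list concrete, the sketch below computes precision, recall, F1, and ROC-AUC by hand on a tiny, made-up set of labels and scores. The data and the 0.5 threshold are assumptions for illustration. In practice `sklearn.metrics` provides `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score`; the point here is only to show what those numbers mean.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
# prints: precision=0.667 recall=0.667 f1=0.667 auc=0.933
```

Note that precision, recall, and F1 depend on the chosen threshold, while ROC-AUC is computed from the raw scores and is threshold-free.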
Good to Know
- SVM + Kernel Trick
- Naive Bayes
- Hierarchical Clustering
- Handling Imbalanced Data: SMOTE
- Collaborative Filtering: recommenders
- Joblib / Pickle: model persistence
Resources
scikit-learn User Guide
The authoritative reference for every classical ML algorithm.
scikit-learn.org
Hands-On ML (Géron)
The most practical ML book, covering theory and sklearn end-to-end.
oreilly.com
Projects
House Price Prediction
Tabular regression pipeline with feature engineering and cross-validated tuning.
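One possible starting point for this project is sketched below. The data is synthetic (the area and age columns and the price formula are invented for the example), and Ridge regression is an assumed model choice. It shows an engineered feature, a scaling step inside a pipeline, and `GridSearchCV` picking the regularization strength by 5-fold cross-validation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real housing table (all values are made up).
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(40, 200, n)   # floor area in square metres
age = rng.uniform(0, 60, n)      # building age in years
price = 50_000 + 2_000 * area - 500 * age + rng.normal(0, 10_000, n)

# Feature engineering: add log(area) next to the raw columns.
X = np.column_stack([area, age, np.log(area)])

pipeline = Pipeline([
    ("scale", StandardScaler()),  # fitted inside each CV fold, so no leakage
    ("ridge", Ridge()),
])

search = GridSearchCV(
    pipeline,
    param_grid={"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X, price)

best_alpha = search.best_params_["ridge__alpha"]
cv_rmse = -search.best_score_  # the scorer is negated, so flip the sign
print(f"best alpha: {best_alpha}, cross-validated RMSE: {cv_rmse:,.0f}")
```

Tuning the whole pipeline, rather than a pre-scaled matrix, keeps the scaler from seeing validation folds during fitting.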
Customer Churn Classifier
Predict churn risk with SHAP-driven feature explanations for retention.
Movie Recommender
Collaborative filtering recommender evaluated on ranking quality.
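A minimal sketch of what this project could look like at toy scale follows. The ratings, user names, and the held-out "relevant" set are all hypothetical, and item-based cosine similarity with precision@k is just one reasonable pairing of method and ranking metric. Missing ratings are treated as zeros, which is a simplification.

```python
from math import sqrt

# Hypothetical user -> {movie: rating} data on a 1-5 scale.
ratings = {
    "ana": {"Alien": 5, "Blade Runner": 4, "Heat": 1},
    "ben": {"Alien": 4, "Blade Runner": 5, "Arrival": 5},
    "cara": {"Heat": 5, "Casino": 4, "Alien": 1},
    "dev": {"Heat": 4, "Casino": 5},
    "eli": {"Alien": 5, "Arrival": 4},
}


def item_vectors(ratings):
    """Invert the data into movie -> {user: rating}."""
    items = {}
    for user, rated in ratings.items():
        for movie, score in rated.items():
            items.setdefault(movie, {})[user] = score
    return items


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, ratings, k=2):
    """Rank unseen movies by similarity to the user's rated movies, weighted by rating."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {
        candidate: sum(cosine(vec, items[m]) * r for m, r in seen.items())
        for candidate, vec in items.items()
        if candidate not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are in the relevant set."""
    return sum(1 for m in recommended[:k] if m in relevant) / k


top = recommend("eli", ratings, k=2)
print(top)  # ['Blade Runner', 'Heat']
# Assume a held-out rating shows eli liked "Blade Runner".
print(precision_at_k(top, {"Blade Runner"}, k=2))  # 0.5
```

On a real dataset the relevant set would come from ratings held out before fitting, and the metric would be averaged over many users.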