Machine Learning Pipeline & Metrics
1. Training vs. Testing
Training Phase
- Goal: Minimize training error while avoiding overfitting.
- Model learns patterns from the data.
- Use training loss to guide optimization.
Testing (or Validation) Phase
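- Goal: Estimate generalization error.
- Evaluate on data the model did not see during training.
- Validation data is used to tune and compare models; the test set is reserved for the final check.
- Metrics computed here, not the training loss, indicate how the model will perform on new data.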
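The split between the two phases can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a prescribed workflow: the synthetic data, the 80/20 split, and the helper names `fit_line` and `mean_squared_error` are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between targets and predictions."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (assumption): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]

# Hold out 20% of the points; the model never sees them during fitting.
indices = list(range(len(xs)))
rng.shuffle(indices)
split = int(0.8 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]

train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

# Training phase: fit on the training split only.
slope, intercept = fit_line(train_x, train_y)

# Testing phase: compare error on seen vs. unseen data.
train_mse = mean_squared_error(train_x, train_y, slope, intercept)
test_mse = mean_squared_error(test_x, test_y, slope, intercept)
print(f"train MSE: {train_mse:.3f}")
print(f"test MSE:  {test_mse:.3f}")
```

A test error much larger than the training error is the usual sign of overfitting.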
2. Types of Regression
Linear Regression
- Predicts continuous values.
- Example: Predicting house prices.
- Multiple Linear Regression → the same model with more than one predictor (e.g., size and number of bedrooms).
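As a rough sketch of multiple linear regression, the snippet below fits house prices from two predictors by solving the normal equations (XᵀX)w = Xᵀy. The house data, the "true" coefficients, and the helper names `solve` and `fit_linear` are made up for illustration; in practice a numerical library would normally do this step.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear(rows, targets):
    """Least squares via the normal equations; the first weight is the intercept."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical houses: (size in square metres, number of bedrooms).
houses = [(50, 1), (70, 2), (90, 2), (110, 3), (130, 4), (150, 3)]
# Hypothetical prices generated without noise: 50 + 3 * size + 10 * bedrooms.
prices = [50.0 + 3.0 * size + 10.0 * beds for size, beds in houses]

weights = fit_linear(houses, prices)
intercept, w_size, w_beds = weights

# Predict the price of an unseen 100 m^2, 3-bedroom house.
predicted = intercept + w_size * 100 + w_beds * 3
print([round(w, 4) for w in weights], round(predicted, 4))
```

Because the prices were generated without noise, the fit recovers the generating coefficients up to floating-point error; real data would leave residual error.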
Logistic Regression
- Used for classification, despite the name.
- Predicts the probability of each class; a threshold (commonly 0.5) turns the probability into a label.
- Example: Spam vs. Not Spam.
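The idea can be sketched with a one-feature logistic regression trained by plain gradient descent on the log loss. Everything specific here is an assumption for illustration: the feature (a count of suspicious words per message), the toy labels, the learning rate, and the helper names.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Gradient descent on the average log loss for p = sigmoid(weight * x + bias)."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y  # derivative of log loss w.r.t. z
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias

# Hypothetical data: suspicious-word count per message, 1 = spam, 0 = not spam.
counts = [0, 1, 2, 3, 4, 5, 6, 7]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

weight, bias = fit_logistic(counts, labels)

def spam_probability(count):
    return sigmoid(weight * count + bias)

def classify(count, threshold=0.5):
    """Turn the predicted probability into a class label."""
    return "Spam" if spam_probability(count) >= threshold else "Not Spam"

for count in (0, 7):
    print(count, round(spam_probability(count), 3), classify(count))
```

The model outputs a probability; the class label only appears once a threshold is applied, which is why the same model can trade precision against recall by moving that threshold.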
3. Common Metrics
For Regression (Continuous Target)
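- R² → proportion of variance in the target explained by the model; higher is better (1 is a perfect fit).
- MSE / RMSE / RSS → measures of squared prediction error; lower is better.
- p-values → indicate whether individual coefficients are statistically significant; they do not measure predictive accuracy.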
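The error-based metrics above can be computed directly from their definitions. The sketch below uses four made-up target/prediction pairs; the function name `regression_metrics` is hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RSS, MSE, RMSE and R² for paired targets and predictions."""
    n = len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    mse = rss / n                                            # mean squared error
    rmse = math.sqrt(mse)                                    # same units as the target
    mean_t = sum(y_true) / n
    tss = sum((t - mean_t) ** 2 for t in y_true)             # total sum of squares
    r2 = 1.0 - rss / tss
    return {"RSS": rss, "MSE": mse, "RMSE": rmse, "R2": r2}

# Made-up values for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# RSS: 2.0000
# MSE: 0.5000
# RMSE: 0.7071
# R2: 0.9000
```

RMSE is often the easiest to interpret because it is expressed in the same units as the target.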
For Classification (Including Logistic Regression)
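- Accuracy → fraction of correct predictions; higher is better, but it can mislead when classes are imbalanced.
- Precision → fraction of predicted positives that are correct. Recall → fraction of actual positives that are found. F1-score → harmonic mean of the two.
- ROC-AUC → area under the ROC curve; 0.5 corresponds to random guessing, 1.0 to a perfect ranking.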
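A small worked example may make these definitions concrete. The labels and scores below are invented, the 0.5 threshold is an assumption, and the helper names are hypothetical. ROC-AUC is computed here through its ranking interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as half.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up labels and model scores for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # assumes at least one predicted positive
recall = tp / (tp + fn)     # assumes at least one actual positive
f1 = 2 * precision * recall / (precision + recall)
auc = roc_auc(y_true, scores)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
# accuracy=0.750 precision=0.667 recall=0.667 f1=0.667 auc=0.867
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas ROC-AUC is computed from the raw scores and summarizes ranking quality across all thresholds.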