Machine Learning Pipeline & Metrics

1. Training vs. Testing

Training Phase

  • Goal: Minimize training error while avoiding overfitting.
  • Model learns patterns from the data.
  • Use training loss to guide optimization.

Testing (or Validation) Phase

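  • Goal: Estimate generalization error.
  • Evaluate on data the model did not see during training.
  • Validation data guides tuning and model selection; a separate test set, used once at the end, gives the final estimate.
  • Metrics computed here, not training metrics, indicate how the model will perform on new data.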
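
The two phases above can be sketched with a simple holdout split. This is a minimal illustration in plain Python, not a prescribed workflow: the toy data, the 80/20 split, and the helper names `train_test_split` and `mse` are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and hold out a fraction as unseen test data."""
    rng = random.Random(seed)
    shuffled = rows[:]              # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

def mse(rows, predict):
    """Mean squared error of predict(x) against y over (x, y) rows."""
    return sum((y - predict(x)) ** 2 for x, y in rows) / len(rows)

# Hypothetical toy data: y = 2x + 1 plus Gaussian noise.
noise = random.Random(42)
data = [(x, 2 * x + 1 + noise.gauss(0, 1)) for x in range(50)]

train, test = train_test_split(data)

# Deliberately weak "model": always predict the mean of the training targets.
mean_y = sum(y for _, y in train) / len(train)

def baseline(x):
    return mean_y

train_error = mse(train, baseline)   # error on data the model was fit to
test_error = mse(test, baseline)     # estimate of generalization error
```

Comparing `train_error` with `test_error` is the basic overfitting check: a training error far below the test error suggests the model has memorized rather than generalized.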

2. Types of Regression

Linear Regression

  • Predicts continuous values.
  • Example: Predicting house prices.
  • Multiple Linear Regression → multiple predictors.
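
As a rough sketch of the single-predictor case, the least-squares slope and intercept have a closed form. The house sizes and prices below are made-up numbers chosen so the fit is exact, and `fit_simple_linear` is a hypothetical helper name.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for one predictor: y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size (m^2) and price (in thousands).
sizes = [50, 70, 90, 110, 130]
prices = [150, 210, 270, 330, 390]

slope, intercept = fit_simple_linear(sizes, prices)
predicted_price = slope * 100 + intercept   # prediction for a 100 m^2 house
```

Multiple linear regression follows the same idea with one coefficient per predictor; in practice it is usually solved with a linear algebra library rather than by hand.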

Logistic Regression

  • Classification task, despite the name.
  • Predicts class probabilities, which are thresholded (e.g., at 0.5) to assign a category.
  • Example: Spam vs. Not Spam.
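
To make the "probabilities, then categories" idea concrete, here is a small sketch of one-feature logistic regression trained with plain gradient descent. The feature (a count of suspicious words), the labels, the learning rate, and the epoch count are all assumptions for illustration.

```python
import math

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(w * x + b)

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Minimize mean log loss with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [predict_proba(x, w, b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: suspicious-word count per email, label 1 = spam.
counts = [0, 1, 2, 3, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(counts, labels)
p_spam = predict_proba(8, w, b)      # probability for a likely-spam email
p_ham = predict_proba(1, w, b)       # probability for a likely-legitimate email
is_spam = p_spam >= 0.5              # thresholding turns a probability into a class
```

The 0.5 threshold is a convention, not a requirement; moving it trades precision against recall.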

3. Common Metrics

For Regression (Continuous Target)

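  • R² → Higher is better (1 is a perfect fit).
  • MSE / RMSE / RSS → Lower is better.
  • p-values → Not a measure of fit; they test whether an individual coefficient differs significantly from zero (smaller means stronger evidence).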
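
The error-based metrics above are closely related, which a short sketch makes visible. The values in `y_true` and `y_pred` are invented for the example, and `regression_metrics` is a hypothetical helper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute RSS, MSE, RMSE and R² from true and predicted values."""
    n = len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    mse = rss / n                                             # mean squared error
    rmse = math.sqrt(mse)                                     # same units as the target
    mean_y = sum(y_true) / n
    tss = sum((t - mean_y) ** 2 for t in y_true)              # total sum of squares
    r2 = 1 - rss / tss                                        # share of variance explained
    return {"rss": rss, "mse": mse, "rmse": rmse, "r2": r2}

# Made-up targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(y_true, y_pred)
```

Because MSE is RSS divided by the sample count and RMSE is its square root, the three rank models identically on the same dataset; R² additionally compares the model against always predicting the mean.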

For Classification (Including Logistic Regression)

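  • Accuracy → Higher is better, but it can mislead when classes are imbalanced.
  • Precision, Recall, F1-score → Higher is better; F1 is the harmonic mean of precision and recall.
  • ROC-AUC → Higher is better; 0.5 corresponds to random ranking, 1.0 to perfect ranking.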
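
The classification metrics above can be sketched from confusion-matrix counts, with ROC-AUC computed as the fraction of positive/negative pairs that the scores rank correctly. The labels, scores, and 0.5 threshold are assumptions for the example, and both helper functions are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((1.0 if p > n else (0.5 if p == n else 0.0))
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and model scores; 1 = spam.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Precision, recall, and F1 depend on the chosen threshold, while ROC-AUC is computed from the raw scores and summarizes ranking quality across all thresholds.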