ML for Early Heart Failure Detection

Role: ML Engineer | Tools: Python (NumPy, Pandas, Matplotlib), scikit-learn (for metrics/plots only) | Data: Kaggle (Heart Failure Prediction)

Quick Links: GitHub | Report (PDF)

Executive Summary

Collaborated with a partner to build and compare four supervised learning models (k-Nearest Neighbors, Logistic Regression, Decision Tree, and Random Forest) to predict heart failure risk from routine patient attributes. I implemented kNN and Logistic Regression from scratch, while my partner implemented the tree-based models. We translated our findings into actionable recommendations for early-risk identification and care prioritization.

Business Problem

Physicians, payers, and patients need reliable, interpretable signals to identify high-risk patients early. We asked:

  1. Which model predicts risk most accurately?

  2. Which features most strongly indicate risk?

  3. How much better are ML models than a simple heuristic baseline?

Data

  • Observations: 918 patients

  • Features (11): age, sex, chest pain type, resting BP, cholesterol, fasting blood sugar, resting ECG, max heart rate, exercise-induced angina, ST depression (“oldpeak”), and ST slope (slope of the peak exercise ST segment).

  • Target: heart failure event (binary)
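
For context, here is a minimal sketch of how this dataset could be loaded and prepared with Pandas and NumPy. The file name (heart.csv) and column names follow the public Kaggle CSV and are assumptions here; the project's exact preprocessing may differ.

```python
import numpy as np
import pandas as pd

# Assumed file/column names from the Kaggle "Heart Failure Prediction" CSV.
df = pd.read_csv("heart.csv")

# One-hot encode the categorical attributes; the binary target stays as-is.
X = pd.get_dummies(
    df.drop(columns=["HeartDisease"]),
    columns=["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"],
    drop_first=True,
).astype(float)
y = df["HeartDisease"].to_numpy()

# Standardize features so Euclidean distances (kNN) and gradient descent behave well.
X = (X - X.mean()) / X.std()

# Plain NumPy 70/30 train/test split (scikit-learn reserved for metrics/plots).
rng = np.random.default_rng(42)
idx = rng.permutation(len(X))
cut = int(0.7 * len(X))
X_train, X_test = X.to_numpy()[idx[:cut]], X.to_numpy()[idx[cut:]]
y_train, y_test = y[idx[:cut]], y[idx[cut:]]
```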

Approach

  • My contributions:

    • Implemented from scratch: kNN (Euclidean distance; k ∈ {3, 5, 7}) and Logistic Regression (sigmoid, gradient descent, ridge regularization with C ∈ {0.01, 0.1, 1}); see the sketch after this list

    • Heuristic baseline creation and evaluation

    • Feature weight inspection for interpretability

  • Partner’s contributions: Decision Tree and Random Forest implementation for comparison

  • Evaluation metrics: Accuracy, Precision/Recall/F1, ROC AUC
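
The snippet below is an illustrative sketch of the two from-scratch models, not the project's exact code: kNN with Euclidean distance and majority voting, and logistic regression trained by batch gradient descent with a ridge penalty (the C convention mirrors scikit-learn's, where smaller C means stronger regularization).

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """k-Nearest Neighbors: Euclidean distance, majority vote over the k closest points."""
    preds = []
    for x in X_test:
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # distance to every training point
        nearest = y_train[np.argsort(dists)[:k]]            # labels of the k nearest neighbors
        preds.append(int(nearest.mean() >= 0.5))            # majority vote (ties count as positive)
    return np.array(preds)

class LogisticRegressionGD:
    """Logistic regression trained with batch gradient descent and a ridge (L2) penalty."""

    def __init__(self, lr=0.1, n_iter=5000, C=1.0):
        self.lr, self.n_iter, self.C = lr, n_iter, C  # smaller C => stronger regularization

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        n, d = X.shape
        self.w, self.b = np.zeros(d), 0.0
        for _ in range(self.n_iter):
            p = self._sigmoid(X @ self.w + self.b)
            # Gradient of the mean negative log-likelihood plus the ridge term.
            grad_w = X.T @ (p - y) / n + self.w / (self.C * n)
            grad_b = (p - y).mean()
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def predict_proba(self, X):
        return self._sigmoid(X @ self.w + self.b)

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X) >= threshold).astype(int)
```

With this sketch, the tuning described above amounts to looping knn_predict(..., k=k) over k ∈ {3, 5, 7} and LogisticRegressionGD(C=c).fit(...) over C ∈ {0.01, 0.1, 1} and comparing held-out metrics.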

Results

  • Best model: kNN (70/30 train/test split): Accuracy 89.86%, Precision 93%, Recall 90%, F1 91%

  • Logistic Regression: Accuracy 88.04%, Precision 92%, Recall 88%, F1 90%, ROC AUC 0.945

  • Heuristic baseline: Accuracy 56.97%, roughly 33 percentage points below the best model

  • Tree models: Random Forest outperformed Decision Tree as expected, but trailed kNN overall.
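
Continuing from the sketches above, this is roughly how the reported metrics could be computed, keeping scikit-learn to metrics and plots only. The baseline rule shown here (flag patients at or above the median training age) is a hypothetical stand-in, since the exact heuristic is not documented in this summary.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import (RocCurveDisplay, accuracy_score, f1_score,
                             precision_score, recall_score, roc_auc_score)

def report(name, y_true, y_pred, y_score=None):
    """Print the evaluation metrics used in the project for one model."""
    print(f"{name}: acc={accuracy_score(y_true, y_pred):.4f} "
          f"prec={precision_score(y_true, y_pred):.2f} "
          f"rec={recall_score(y_true, y_pred):.2f} "
          f"f1={f1_score(y_true, y_pred):.2f}", end="")
    print(f" roc_auc={roc_auc_score(y_true, y_score):.3f}" if y_score is not None else "")

# Models implemented from scratch (see the Approach sketch above).
knn_pred = knn_predict(X_train, y_train, X_test, k=5)
logreg = LogisticRegressionGD(C=0.1).fit(X_train, y_train)

report("kNN (k=5)", y_test, knn_pred)
report("Logistic Regression", y_test, logreg.predict(X_test), logreg.predict_proba(X_test))

# Hypothetical heuristic baseline: flag everyone at or above the median training age.
age_col = list(X.columns).index("Age")
baseline_pred = (X_test[:, age_col] >= np.median(X_train[:, age_col])).astype(int)
report("Heuristic baseline", y_test, baseline_pred)

# ROC curve for the logistic model (scikit-learn/Matplotlib used for metrics/plots only).
RocCurveDisplay.from_predictions(y_test, logreg.predict_proba(X_test))
plt.show()
```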

Insights & Recommendations

  • Top signals: ST slope, chest pain type, sex, and exercise-induced angina were the strongest predictors (see the weight-inspection sketch after this list).

  • Clinical takeaway: Use the model to flag high-risk patients for additional screening and resource prioritization.

  • Operational: Start with an interpretable model (Logistic Regression) to build clinician trust, then augment with kNN for higher raw accuracy.
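
As a sketch of the feature-weight inspection behind these signals (continuing from the logistic regression fit above): with standardized inputs, the magnitude of each fitted weight gives a rough ranking of signal strength, and the one-hot columns map back to categories such as ST slope and chest pain type.

```python
import numpy as np
import pandas as pd

# Rank features by absolute weight; standardized inputs make magnitudes comparable.
weights = pd.Series(logreg.w, index=X.columns)
print(weights.sort_values(key=np.abs, ascending=False).head(10))
```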

Impact

  • Demonstrated that ML materially outperforms simple rules on this problem (≈33 accuracy points above the heuristic baseline).

  • Provided a framework to balance accuracy vs. interpretability—key for clinical adoption and stakeholder buy-in.