Heart Disease Classification

This project developed a logistic regression model to identify key predictors of a binary outcome. The dataset was first cleaned by removing rows with missing values (NAs). Numerically coded variables were then recoded as categorical variables suitable for modeling; gender, for example, was coded as 1 for female and 0 for male. Logistic regression was applied, and three variable selection methods (stepwise, forward, and backward) were used to determine the most significant predictors. The stepwise approach combines forward and backward steps, adding or dropping one variable at a time to balance model simplicity against predictive power. Once the final model was selected, predicted probabilities were converted into binary outcomes using a 0.5 threshold. A confusion matrix comparing predicted and actual classifications was then used to evaluate model accuracy, sensitivity, specificity, and precision. Overall, the analysis identified the influential variables and demonstrates a systematic approach to data preprocessing, model selection, and performance assessment. (This project is inspired by a DataCamp project.)
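The evaluation step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's own code: the function names (`classify`, `confusion_counts`, `metrics`) and the sample data are hypothetical, and treating a probability strictly greater than 0.5 as a positive prediction is an assumption, since the summary does not say how ties at exactly 0.5 are handled.

```python
from typing import Dict, List


def classify(probabilities: List[float], threshold: float = 0.5) -> List[int]:
    """Convert predicted probabilities to 0/1 labels.

    Assumption: a probability strictly above the threshold counts as positive.
    """
    return [1 if p > threshold else 0 for p in probabilities]


def confusion_counts(actual: List[int], predicted: List[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive the four summary metrics from the confusion-matrix counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "precision": _ratio(tp, tp + fp),    # positive predictive value
    }


# Hypothetical data for illustration only; not taken from the project.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]

predicted = classify(probabilities)
counts = confusion_counts(actual, predicted)
print(counts)
print(metrics(counts))
```

With this made-up sample there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics come out to 0.75. On real data the metrics usually differ from one another, which is why the summary reports them separately rather than relying on accuracy alone.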

📄 View Full PDF