New machine learning model achieves breakthrough in heart disease prediction with over 95% accuracy

In a recent study published in Scientific Reports, researchers developed a machine learning-based heart disease prediction model (ML-HDPM) that combines several data sources with a range of established classification methods.

Study: Comprehensive evaluation and performance analysis of machine learning in heart disease prediction.

Background

Heart disease is a worldwide health risk that healthcare professionals must evaluate and treat with medical examinations, advanced imaging techniques, and diagnostic procedures. Promoting heart-healthy practices and early diagnosis can help minimize cardiovascular disease incidence and enhance overall health.

Current approaches such as machine learning, deep learning, and sensor-based data collection produce promising findings but have limitations such as uneven diagnostic accuracy and overfitting.

The proposed approaches use modern technology and feature selection procedures to enhance heart disease diagnosis and prognosis.

About the study

In the current study, researchers built the ML-HDPM model for accurate cardiac disease prediction.

The researchers used the Cleveland database, the Switzerland database, the Long Beach database, and the Hungary database to obtain cardiovascular data. They pre-processed the clinical data and then performed feature selection, feature extraction, cluster-based oversampling, and classification.

They used training data to fit the model on the feature set, computed importance scores, and removed the lowest-scoring features until the desired feature set was reached.
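
The elimination loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a tiny logistic regression trained by gradient descent as the stand-in model, uses the absolute value of each learned weight as the importance score, and runs on made-up toy data. The function names `fit_weights` and `recursive_feature_elimination` are hypothetical.

```python
import math

def fit_weights(rows, labels, epochs=200, lr=0.1):
    """Fit a tiny logistic regression by batch gradient descent (stand-in model)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights

def recursive_feature_elimination(rows, labels, n_keep):
    """Drop the lowest-scoring feature, refit, and repeat until n_keep remain."""
    kept = list(range(len(rows[0])))  # original column indices still in play
    while len(kept) > n_keep:
        subset = [[row[j] for j in kept] for row in rows]
        weights = fit_weights(subset, labels)
        scores = [abs(w) for w in weights]  # importance = |weight| (assumption)
        del kept[scores.index(min(scores))]
    return kept

# Toy data: column 0 tracks the label, columns 1 and 2 carry no signal.
rows = [[1.0, 0.0, 0.1], [1.0, 0.0, -0.1], [-1.0, 0.0, 0.1], [-1.0, 0.0, -0.1]]
labels = [1, 1, 0, 0]
selected = recursive_feature_elimination(rows, labels, n_keep=1)
print(selected)
```

On this toy data only column 0 survives, because the two uninformative columns never receive a non-zero weight. With real clinical data the importance scores would come from whichever model the pipeline actually fits.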

The genetic algorithm (GA) cycled through population initialization, selection, crossover, and mutation, repeating these steps until the termination criterion was satisfied.
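
A generic genetic algorithm with these four stages might look like the sketch below. It is an illustration under stated assumptions rather than the study's configuration: candidate solutions are bit masks over features, selection is a binary tournament, crossover uses a single cut point, the best mask is carried forward unchanged (elitism), and the fitness function `toy_fitness` with its `USEFUL` index set is entirely hypothetical. In practice the fitness would be something like a classifier's validation score.

```python
import random

def run_ga(fitness, n_genes, pop_size=20, max_generations=50,
           mutation_rate=0.05, target=None, seed=0):
    rng = random.Random(seed)
    # 1. Population initialization: random bit masks (1 = feature selected).
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_generations):
        # Termination criterion: stop early once the target fitness is reached.
        if target is not None and fitness(best) >= target:
            break
        next_population = [best[:]]  # elitism: keep a copy of the best mask
        while len(next_population) < pop_size:
            # 2. Selection: binary tournament, run once per parent.
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            # 3. Crossover: splice the parents at a single cut point.
            cut = rng.randint(1, n_genes - 1)
            child = parent_a[:cut] + parent_b[cut:]
            # 4. Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best

USEFUL = {0, 3, 5}  # hypothetical indices of informative features

def toy_fitness(mask):
    """Reward useful features, penalise extras (stand-in for validation accuracy)."""
    hits = sum(mask[i] for i in USEFUL)
    extras = sum(mask) - hits
    return hits - 0.5 * extras

best_mask = run_ga(toy_fitness, n_genes=8, target=3)
print(best_mask, toy_fitness(best_mask))
```

Because the best mask is always copied into the next generation, the best fitness can never get worse from one generation to the next.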

The researchers undersampled the raw data samples carrying the majority label, clustered the samples carrying the minority label, merged the two into a training set, and then applied the synthetic minority over-sampling technique (SMOTE) before generating the model output.
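
The balancing idea can be illustrated with a simplified sketch. It makes several assumptions that are not taken from the study: the clustering step is omitted, the majority class is randomly undersampled to the midpoint of the two class sizes, and synthetic minority points are created SMOTE-style by interpolating between a minority sample and one of its nearest minority neighbours. The helper names `smote` and `balance` and the toy data are hypothetical.

```python
import random

def smote(minority, n_new, k=2, rng=None):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

def balance(majority, minority, seed=0):
    """Undersample the majority class and oversample the minority class."""
    rng = random.Random(seed)
    target = (len(majority) + len(minority)) // 2  # meet in the middle (assumption)
    kept_majority = rng.sample(majority, min(target, len(majority)))
    new_points = smote(minority, max(0, target - len(minority)), rng=rng)
    return kept_majority, minority + new_points

# Toy data: 8 majority samples and 4 minority samples with two features each.
majority = [[float(i), 0.0] for i in range(8)]
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
kept_majority, balanced_minority = balance(majority, minority)
print(len(kept_majority), len(balanced_minority))
```

Each synthetic point lies on a line segment between two real minority samples, so the oversampled class stays inside the region the original minority data already occupies.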

The model selects relevant features using the recursive feature elimination method (RFEM) and the genetic algorithm (GA), which improves the model’s resilience. Techniques such as the under-sampling clustering oversampling technique (USCOM) correct data imbalances.

The classification task uses multiple-layer deep convolutional neural networks (MLDCNN) and the adaptive elephant herd optimization method (AEHOM).

Techniques used in the model included principal component analysis (PCA) for dimensionality reduction, along with support vector machine (SVM), linear discriminant analysis (LDA), decision tree (DT), random forest (RF), and naïve Bayes (NB) classifiers.

The model combines supervised infinite feature selection with an upgraded weighted random forest algorithm. The ML-HDPM pre-processing step assures data integrity and model efficacy. Extensive feature selection uncovers important properties for predictive modeling.

A scaling technique gives the features a comparable influence on the model, while SMOTE corrects for class imbalance. The genetic algorithm applies natural selection principles, generating several candidate solutions in each generation.
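
The article does not say which scaling technique was used. One common choice, assumed here purely for illustration, is z-score standardization, which rescales every feature to zero mean and unit standard deviation. The function name `standardize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Two made-up features on very different scales.
raw = [[120.0, 1.0], [140.0, 3.0], [160.0, 5.0]]
scaled = standardize(raw)
print(scaled)
```

After scaling, both columns span the same range, so neither dominates a distance-based or gradient-based model simply because of its units.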

The strategy’s performance was assessed through simulation and compared with existing models. The training, testing, and validation datasets comprised 80%, 10%, and 10% of the data, respectively.
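
An 80/10/10 partition of this kind can be sketched as below. The sketch assumes the largest share goes to training, as is conventional, and that records are shuffled before splitting. The function name `split_80_10_10` and the placeholder records are hypothetical.

```python
import random

def split_80_10_10(records, seed=0):
    """Shuffle a copy of the records and cut it into 80% / 10% / 10% parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    first_cut = n * 8 // 10   # end of the 80% part
    second_cut = n * 9 // 10  # end of the first 10% part
    return (shuffled[:first_cut],
            shuffled[first_cut:second_cut],
            shuffled[second_cut:])

records = list(range(100))  # placeholder patient record IDs
train, test, validation = split_80_10_10(records)
print(len(train), len(test), len(validation))
```

Fixing the seed makes the partition reproducible, which matters when results from different models are compared on the same held-out data.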

Results

ML-HDPM performed well across the key evaluation metrics. On the training data, the model predicted cardiovascular disease with 96% accuracy and 95% precision.

The system’s sensitivity (recall) was 96%, while an F-score of 92% reflected its balanced performance. The ML-HDPM specificity of 90% is also noteworthy.
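
For readers less familiar with these metrics, the sketch below shows how each one is derived from the four cells of a binary confusion matrix. The counts are invented for illustration and are not the study's data. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity or true-positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),  # true-negative rate
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # false-positive rate = 1 - specificity
    }

# Invented counts: 50 patients with disease, 50 without.
metrics = classification_metrics(tp=48, fp=5, tn=45, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the two classes separately, while the F-score is the harmonic mean of precision and recall. The false-positive rate is simply one minus the specificity.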

ML-HDPM provides accurate and reliable results. It combines techniques such as feature selection, data balancing, deep learning, and the adaptive elephant herd optimization method (AEHOM). These strategies allow the model to reliably forecast cardiac disease, which improves clinical decisions and patient outcomes.

ML-HDPM outperforms other algorithms in training (95%) and testing (88%). The success is due to the combination of complex feature extraction, data imbalance corrections, and machine learning.

Feature selection algorithms identify the attributes most strongly associated with cardiovascular health, allowing the model to detect subtle patterns indicative of cardiovascular disease.

Efficient data balancing techniques help ensure that the model trains on representative datasets, while deep learning with the MLDCNN approach and AEHOM optimization improves model accuracy.

ML-HDPM, a deep learning model, has lower false-positive rates (FPR) in training (8.2%) and testing (15%) than other approaches, owing to its feature selection, data balancing, and improved machine learning components.

The model had high true-positive rates (TPR) in the training (96%) and testing (91%) datasets due to feature identification, data balance, and deep-learning improvements. The approach improves the model’s capacity to identify true positives.

Conclusion

The study presents a novel ML-HDPM approach that incorporates feature selection, data balancing, and machine learning to improve cardiovascular disease prediction.

The F-scores, which balance precision and recall, together with the high accuracy and precision rates and low false-positive rates in the training and testing datasets, highlight the promising potential of the model in cardiovascular diagnostic applications.

The findings indicate that the ML-HDPM model can increase the precision and speed of identifying cardiovascular diseases, thus improving the standard of care.

However, further research is needed to refine model optimization, improve data quality, and evaluate the model's use by healthcare professionals in real-world settings.