FAYZA, MUHAMAD ZACKY (2025) KOMPARASI MODEL IMBALANCE DATA BERDASARKAN FITUR TERPILIH MENGGUNAKAN METODE SMOTE-ENN DAN SMOTE-RBO. S1 thesis, Universitas Mercu Buana Jakarta.
Text (Cover page): 01 COVER.pdf (686kB)
Text (Chapter I): 02 BAB 1.pdf, restricted to registered users only (116kB)
Text (Chapter II): 03 BAB 2.pdf, restricted to registered users only (394kB)
Text (Chapter III): 04 BAB 3.pdf, restricted to registered users only (241kB)
Text (Chapter IV): 05 BAB 4.pdf, restricted to registered users only (2MB)
Text (Chapter V): 06 BAB 5.pdf, restricted to registered users only (42kB)
Text (References): 07 DAFTAR PUSTAKA.pdf, restricted to registered users only (169kB)
Text (Appendices): 08 LAMPIRAN.pdf, restricted to registered users only (640kB)
Abstract
This study evaluates the effectiveness of various machine learning algorithms and data balancing techniques in predicting Human Immunodeficiency Virus (HIV)/Acquired Immunodeficiency Syndrome (AIDS). A major challenge in medical data classification is class imbalance: non-reactive HIV cases far outnumber reactive ones, which biases models toward the majority class and reduces predictive accuracy for the minority class. To address this issue, two hybrid data balancing approaches are implemented: Synthetic Minority Over-sampling Technique–Edited Nearest Neighbor (SMOTE-ENN) and Synthetic Minority Over-sampling Technique–Radial-Based Oversampling (SMOTE-RBO). The dataset is derived from community surveys and comprises demographic, behavioral, and medical attributes relevant to HIV risk. Eight classification algorithms are applied: Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM). The models are evaluated using accuracy, precision, recall, and F1-score. Results show that SMOTE-ENN consistently outperforms SMOTE-RBO, particularly in terms of F1-score. The DT and RF models perform best, achieving the highest accuracy and F1-scores, while the deep learning models (CNN and LSTM) also give competitive results. The study highlights the importance of selecting appropriate data balancing strategies and classification algorithms when developing accurate and reliable predictive models for early detection of HIV/AIDS, thereby supporting more effective public health interventions.
Keywords: Machine Learning, HIV/AIDS Classification, Imbalanced Data, SMOTE-ENN, SMOTE-RBO, Decision Tree, Random Forest.
This research aims to evaluate the effectiveness of various machine learning algorithms and data balancing techniques in the prediction of Human Immunodeficiency Virus (HIV)/Acquired Immunodeficiency Syndrome (AIDS). The main problem in medical data classification is class imbalance, where non-reactive HIV cases are far more dominant than reactive ones, which affects the model's prediction accuracy for the minority class. To overcome this, two hybrid data balancing approaches are applied: Synthetic Minority Over-sampling Technique–Edited Nearest Neighbor (SMOTE-ENN) and Synthetic Minority Over-sampling Technique–Radial-Based Oversampling (SMOTE-RBO). The dataset comes from community surveys with relevant demographic, behavioral, and medical attributes. Eight classification algorithms are applied, namely Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM). Evaluation uses the accuracy, precision, recall, and F1-score metrics. The results show that SMOTE-ENN gives a more consistent performance improvement than SMOTE-RBO, particularly on the F1-score metric. The DT and RF models show the best performance, with the highest accuracy and F1-score. Meanwhile, deep learning models such as CNN and LSTM also show competitive results. This research emphasizes the importance of choosing the right data balancing technique and classification algorithm in building an accurate and reliable HIV/AIDS prediction system, in order to support more effective prevention efforts in public health.
Keywords: Machine Learning, HIV/AIDS Classification, Imbalanced Data, SMOTE-ENN, SMOTE-RBO, Decision Tree, Random Forest.
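To make the SMOTE-ENN idea named in the abstract concrete, the following is a minimal pure-Python sketch, not the thesis implementation. It oversamples the minority class by interpolating between minority neighbours (SMOTE), then removes samples whose label disagrees with the majority vote of their nearest neighbours (one common form of Edited Nearest Neighbours). The function names, the choice `k=3`, the class labels, and the toy coordinates are all assumptions made for illustration; none of them come from the thesis or its dataset.

```python
import math
import random


def nearest(point, pool, k):
    """Indices of the k points in `pool` closest to `point` (Euclidean distance)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating towards minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]  # exclude the base point itself
        neighbour = others[rng.choice(nearest(base, others, k))]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def enn(X, y, k):
    """Keep only samples whose label matches the majority of their k neighbours."""
    keep_X, keep_y = [], []
    for i, (point, label) in enumerate(zip(X, y)):
        others_X = X[:i] + X[i + 1:]
        others_y = y[:i] + y[i + 1:]
        votes = [others_y[j] for j in nearest(point, others_X, k)]
        if votes.count(label) * 2 > len(votes):
            keep_X.append(point)
            keep_y.append(label)
    return keep_X, keep_y


def smote_enn(X, y, minority_label, k=3, seed=0):
    """Oversample the minority class to parity, then clean both classes with ENN."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)  # majority count minus minority count
    synthetic = smote(minority, n_new, k, rng)
    return enn(X + synthetic, y + [minority_label] * len(synthetic), k)


# Invented toy data: 12 clustered majority points, 1 majority point lying
# inside the minority region (noise), and 4 minority points.
majority = [(a * 0.5, b * 0.5) for a in range(4) for b in range(3)] + [(5.2, 5.1)]
minority = [(5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5)]
X = majority + minority
y = ["non-reactive"] * len(majority) + ["reactive"] * len(minority)

X_res, y_res = smote_enn(X, y, minority_label="reactive")
print("before:", y.count("non-reactive"), "non-reactive,", y.count("reactive"), "reactive")
print("after: ", y_res.count("non-reactive"), "non-reactive,", y_res.count("reactive"), "reactive")
```

On this toy data the oversampling step brings the minority class up to the size of the majority class, and the cleaning step discards the single majority point that sits among minority samples. In practice a maintained library implementation would normally be preferred over a hand-written one, and resampling should be applied to the training split only so that the evaluation data stays untouched.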