SEMBIRING, RIAS AKMAL (2026) KOMPARASI MODEL IMBALANCED DATA BERDASARKAN FITUR TERPILIH MENGGUNAKAN METODE RN-SMOTE DAN DBSCAN-SMOTE. S1 thesis, Universitas Mercu Buana Jakarta.
Abstract
This study compares the effectiveness of two data balancing methods, RN-SMOTE and DBSCAN-SMOTE, in addressing class imbalance in HIV status prediction based on selected features. The dataset covers sociodemographic, biological, and behavioral attributes, and the target variable rujukan_hasil_hiv has two classes (reactive and non-reactive). After preprocessing and data balancing, the study evaluates eight classification algorithms: Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naive Bayes (NB), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM). Model performance is assessed using accuracy, precision, recall, and F1-score to give a comprehensive picture of classification performance under an imbalanced class distribution. The results indicate that RN-SMOTE generally yields more consistent performance improvements than DBSCAN-SMOTE across most algorithms, particularly for ensemble- and tree-based models. In the combined test of attributes drawn from related studies, the combination of RN-SMOTE and Random Forest achieved the best performance, with an accuracy of 0.7673 and an F1-score of 0.7657, making it the most recommended combination for developing an HIV status prediction model. These findings suggest that RN-SMOTE's more targeted generation of synthetic samples near the decision boundary helps the model classify both classes more evenly. Accordingly, this study provides a recommended predictive model that may serve as a basis for developing decision-support systems to strengthen HIV/AIDS surveillance and intervention programs in Indonesia.
Keywords: HIV/AIDS, Imbalanced Data, RN-SMOTE, DBSCAN-SMOTE, Classification, Random Forest
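The four metrics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function name `binary_metrics` and the confusion-matrix counts below are hypothetical, chosen only to show why accuracy alone can mislead on imbalanced data. The abstract does not state whether the reported F1-score is computed per class or averaged, so this sketch scores the positive (reactive) class only.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 for the positive class.

    tp/fp/fn/tn are confusion-matrix counts. Ratios with a zero
    denominator are reported as 0.0 (a common convention, assumed here).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 10 reactive (positive) and 90 non-reactive cases.
# A model that always predicts "non-reactive" finds no positive case.
always_negative = binary_metrics(tp=0, fp=0, fn=10, tn=90)

# A hypothetical model that finds 8 of the 10 positives
# at the cost of 12 false alarms.
balanced_model = binary_metrics(tp=8, fp=12, fn=2, tn=78)

for name, scores in [("always negative", always_negative),
                     ("balanced model", balanced_model)]:
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

With these hypothetical counts, the always-negative model has the higher accuracy but a recall and F1 of zero, while the second model trades some accuracy for a recall of 0.8. This is the kind of difference that precision, recall, and F1 are meant to expose.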
| Item Type: | Thesis (S1) |
|---|---|
| NIM/NIDN Creators: | 41523010227 |
| Uncontrolled Keywords: | HIV/AIDS, Imbalanced Data, RN-SMOTE, DBSCAN-SMOTE, Classification, Random Forest |
| Subjects: | 000 Computer Science, Information and General Works > 000. Computer Science, Information and General Works > 004 Data Processing, Computer Science, Informatics Engineering; 000 Computer Science, Information and General Works > 000. Computer Science, Information and General Works > 006 Special Computer Methods > 006.3 Artificial Intelligence > 006.31 Machine Learning |
| Divisions: | Faculty of Computer Science > Informatics |
| Depositing User: | khalimah |
| Date Deposited: | 28 Mar 2026 03:58 |
| Last Modified: | 28 Mar 2026 03:58 |
| URI: | http://repository.mercubuana.ac.id/id/eprint/101745 |