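KOMPARASI MODEL IMBALANCE DATA BERDASARKAN FITUR TERPILIH MENGGUNAKAN METODE SMOTE-ENN DAN SMOTE-RBO (Comparison of Imbalanced Data Models Based on Selected Features Using the SMOTE-ENN and SMOTE-RBO Methods)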

FAYZA, MUHAMAD ZACKY (2025) KOMPARASI MODEL IMBALANCE DATA BERDASARKAN FITUR TERPILIH MENGGUNAKAN METODE SMOTE-ENN DAN SMOTE-RBO. S1 thesis, Universitas Mercu Buana Jakarta.

Abstract

This study evaluates how well several machine learning algorithms and data balancing techniques predict Human Immunodeficiency Virus (HIV)/Acquired Immunodeficiency Syndrome (AIDS) status. A major challenge in medical data classification is class imbalance: non-reactive HIV cases far outnumber reactive ones, which biases models toward the majority class and reduces predictive accuracy for the minority class. To address this, two hybrid data balancing approaches are implemented: Synthetic Minority Over-sampling Technique–Edited Nearest Neighbor (SMOTE-ENN) and Synthetic Minority Over-sampling Technique–Radial-Based Oversampling (SMOTE-RBO). The dataset is derived from community surveys and comprises demographic, behavioral, and medical attributes relevant to HIV risk. Eight classification algorithms are applied: Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM). The models are evaluated using accuracy, precision, recall, and F1-score. The results show that SMOTE-ENN improves performance more consistently than SMOTE-RBO, particularly in terms of F1-score. The DT and RF models perform best, achieving the highest accuracy and F1-scores, while the deep learning models (CNN and LSTM) also produce competitive results. The study highlights the importance of choosing appropriate data balancing techniques and classification algorithms when building accurate and reliable predictive models for early detection of HIV/AIDS, in support of more effective public health prevention efforts.

Keywords: Machine Learning, HIV/AIDS Classification, Imbalanced Data, SMOTE-ENN, SMOTE-RBO, Decision Tree, Random Forest
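
As a rough illustration of the SMOTE-ENN idea named in the abstract, the following is a minimal pure-Python sketch, not the thesis's implementation. It assumes a binary problem with numeric features; the function names (`nearest`, `smote`, `enn`, `smote_enn`), the toy data, and the choice of k = 3 are hypothetical. The cleaning step uses a simple majority-vote variant of Edited Nearest Neighbor, which may differ from the rule used in the thesis. SMOTE-RBO is not shown. In practice a library such as imbalanced-learn, which provides a `SMOTEENN` class, would normally be used instead.

```python
import math
import random


def nearest(point, pool, k):
    """Return the k (point, label) items of pool closest to point (Euclidean)."""
    return sorted(pool, key=lambda item: math.dist(point, item[0]))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [(p, None) for j, p in enumerate(minority) if j != i]
        neighbour, _unused = rng.choice(nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def enn(samples, labels, k=3):
    """Keep only samples whose label matches the majority of their
    k nearest neighbours (majority-vote variant, an assumption)."""
    pairs = list(zip(samples, labels))
    kept = []
    for i, (x, y) in enumerate(pairs):
        rest = pairs[:i] + pairs[i + 1:]
        votes = sum(1 for _, label in nearest(x, rest, k) if label == y)
        if 2 * votes > min(k, len(rest)):
            kept.append((x, y))
    return [x for x, _ in kept], [y for _, y in kept]


def smote_enn(samples, labels, minority_label, k=3, seed=0):
    """Oversample the minority class up to the majority size, then clean with ENN."""
    minority = [x for x, y in zip(samples, labels) if y == minority_label]
    n_new = max(0, len(samples) - 2 * len(minority))  # majority minus minority
    synthetic = smote(minority, n_new, k, seed)
    return enn(list(samples) + synthetic, list(labels) + [minority_label] * n_new, k)


# Toy data (hypothetical): label 0 = majority, label 1 = minority.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),  # majority cluster
     (5.5, 5.5),                                              # majority point inside the minority region
     (5, 5), (5, 6), (6, 5)]                                  # minority cluster
y = [0] * 7 + [1] * 3

X_res, y_res = smote_enn(X, y, minority_label=1)
print("before:", y.count(0), "vs", y.count(1))
print("after: ", y_res.count(0), "vs", y_res.count(1))
```

On this toy data the oversampling step adds four interpolated minority points, and the cleaning step then discards the single majority point that sits inside the minority region, since none of its nearest neighbours share its label. Resampling of this kind is normally applied to the training split only, so that the evaluation metrics are computed on untouched data.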

Item Type: Thesis (S1)
Call Number CD: FIK/INFO. 25 125
NIM/NIDN Creators: 41521010081
Uncontrolled Keywords: Machine Learning, HIV/AIDS Classification, Imbalanced Data, SMOTE-ENN, SMOTE-RBO, Decision Tree, Random Forest
Subjects: 000 Computer Science, Information and General Works > 000 Computer Science, Information and General Works > 004 Data Processing, Computer Science
000 Computer Science, Information and General Works > 000 Computer Science, Information and General Works > 006 Special Computer Methods > 006.3 Artificial Intelligence > 006.31 Machine Learning
500 Natural Science and Mathematics > 510 Mathematics > 518 Numerical Analysis > 518.1 Algorithms
Divisions: Fakultas Ilmu Komputer > Informatika
Depositing User: khalimah
Date Deposited: 06 Aug 2025 07:08
Last Modified: 06 Aug 2025 07:08
URI: http://repository.mercubuana.ac.id/id/eprint/96610
