GHIFARY, RIFQI AL (2025) KLASIFIKASI SENTIMEN TWEET TENTANG SICEPAT DENGAN STRATEGI PENYEIMBANGAN DATA. S1 thesis, Universitas Mercu Buana Jakarta.
Abstract
This study develops an automatic sentiment classification system for Indonesian-language tweets about the SiCepat delivery service. A total of 15,000 tweets were collected with Tweet-Harvest, then preprocessed, stemmed, and labeled automatically using a weighted lexicon. The main problem encountered was class imbalance: positive tweets far outnumbered negative ones. To address it, three balancing strategies were applied: SMOTE, undersampling, and a combination of the two. Three machine learning algorithms were tested: Support Vector Machine (SVM), Naive Bayes (NB), and Logistic Regression (LR). The models were evaluated using accuracy, precision, recall, F1-score, and the confusion matrix. SVM with SMOTE achieved the best performance (90.5% accuracy, 0.907 F1-score), followed by Logistic Regression with the combined balancing approach (89.2% accuracy). Naive Bayes tended to be biased toward the majority class. Overall, combining data balancing with SVM proved the most effective approach and is recommended for sentiment analysis in the logistics industry.
Keywords: Sentiment Analysis, Machine Learning, Class Imbalance, SMOTE, SVM, Logistic Regression, Naive Bayes, Twitter, Tweet-Harvest, Social Media, SiCepat.
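The balancing strategies and evaluation metrics named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis's implementation: the function names (`random_undersample`, `smote_like`, `binary_metrics`) are hypothetical, the labels are assumed to be binary, and the feature vectors are toy numbers rather than real text features. The oversampling function is a simplified version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours).

```python
import math
import random


def random_undersample(X, y, majority, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumes binary labels and that `majority` really is the larger class.
    """
    rng = random.Random(seed)
    maj_idx = [i for i, label in enumerate(y) if label == majority]
    min_idx = [i for i, label in enumerate(y) if label != majority]
    keep = sorted(rng.sample(maj_idx, len(min_idx)) + min_idx)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like(X_min, n_new, k=2, seed=0):
    """Create n_new synthetic minority points (simplified SMOTE idea).

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_min)
        neighbours = sorted(
            (p for p in X_min if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
    }
```

Under the same assumptions, a combined strategy could first add some synthetic minority points with `smote_like` and then call `random_undersample` on the enlarged data set. Balancing should be applied to the training split only, so that the test set keeps the original class distribution.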