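WARDOYO, SHABRIO CAHYO (2025) ANALYSIS OF HYPERPARAMETER OPTIMIZATION OF MULTINOMIALNB AND LOGISTIC REGRESSION IN MULTI-FEATURE TEXT-BASED FILM GENRE CLASSIFICATION (original Indonesian title: ANALISIS OPTIMASI HYPERPARAMETER MULTINOMIALNB DAN LOGISTIC REGRESSION DALAM KLASIFIKASI GENRE FILM BERBASIS TEKS MULTI FITUR). Undergraduate (S1) thesis, Universitas Mercu Buana Jakarta.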
Files
Cover page (01 COVER.pdf, 414 kB), publicly available
Chapter I (02 BAB 1.pdf, 118 kB), restricted to registered users
Chapter II (03 BAB 2.pdf, 166 kB), restricted to registered users
Chapter III (04 BAB 3.pdf, 219 kB), restricted to registered users
Chapter IV (05 BAB 4.pdf, 330 kB), restricted to registered users
Chapter V (06 BAB 5.pdf, 111 kB), restricted to registered users
References (07 DAFTAR PUSTAKA.pdf, 124 kB), restricted to registered users
Appendices (08 LAMPIRAN.pdf, 4 MB), restricted to registered users
Abstract
This research analyzes and compares the performance of two text classification algorithms, Multinomial Naive Bayes (MNB) and Logistic Regression (LR), for film genre classification using multi-feature text data, both with and without hyperparameter optimization. Film genres play a crucial role in digital content recommendation systems, but manual classification tends to be subjective and time-consuming. The dataset was scraped from the Letterboxd platform and accessed via Kaggle; it comprises film titles, descriptions, and themes. After text cleaning and normalization (tokenization, lemmatization, and stemming), the text was transformed into numerical features using TF-IDF. Two modeling scenarios were applied: the first with default parameters, and the second using GridSearchCV to find the best combination of hyperparameters. Model performance was evaluated with accuracy, precision, recall, and F1-score. The optimized LR model achieved the highest accuracy (0.847), followed by the optimized MNB model (0.837). The study concludes that hyperparameter optimization significantly improves classification performance and that LR outperforms MNB for multi-feature text-based genre classification.

Keywords: Multinomial Naive Bayes, Logistic Regression, TF-IDF, Hyperparameter Optimization, Film Genre Classification.
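To make the feature-extraction step concrete, here is a minimal dependency-free sketch of combining a film's text fields into one document and weighting it with TF-IDF. The field names `title`, `description` and `themes`, and all function names, are hypothetical. Tokenization is a bare lowercase regex split, so the lemmatization and stemming described in the abstract are deliberately omitted. The idf follows the smoothed formula scikit-learn's `TfidfVectorizer` uses by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, with each row then L2-normalized; the thesis's exact vectorizer settings are not stated.

```python
import math
import re
from collections import Counter

FIELDS = ("title", "description", "themes")  # hypothetical column names


def combine_fields(film: dict) -> str:
    """Concatenate the text fields of one film into a single document."""
    return " ".join(str(film.get(field, "")) for field in FIELDS)


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of ASCII letters (no lemmatization or stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one L2-normalized TF-IDF row per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}  # smoothed idf
    rows = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
        rows.append([v / norm for v in vec])
    return vocab, rows
```

Merging the fields before vectorizing gives one shared vocabulary; vectorizing each field separately and concatenating the vectors is an equally plausible design the abstract does not rule out.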
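The two MNB scenarios (default smoothing versus a searched value) can be illustrated with a from-scratch sketch. It uses the standard multinomial Naive Bayes estimate with additive smoothing `alpha`, the hyperparameter scikit-learn's `MultinomialNB` exposes under that name. In place of GridSearchCV it uses a simplified k-fold loop with interleaved, non-stratified folds and pooled accuracy. The function names and toy data are hypothetical, and this is not the thesis's code.

```python
import math


def fit_mnb(X, y, alpha=1.0):
    """Estimate log priors and smoothed log feature probabilities per class."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        counts = [sum(col) for col in zip(*rows)]  # per-feature totals for class c
        total = sum(counts)
        # Additive smoothing: (count + alpha) / (total + alpha * n_features)
        log_theta = [math.log((cnt + alpha) / (total + alpha * n_features)) for cnt in counts]
        model[c] = (math.log(len(rows) / len(y)), log_theta)
    return model


def predict_mnb(model, x):
    """Return the class with the highest joint log-likelihood for feature vector x."""
    def score(c):
        log_prior, log_theta = model[c]
        return log_prior + sum(v * lt for v, lt in zip(x, log_theta))
    return max(model, key=score)


def grid_search_alpha(X, y, alphas, k=3):
    """Pick the alpha with the best pooled k-fold accuracy (simplified GridSearchCV stand-in)."""
    best_alpha, best_acc = None, -1.0
    for alpha in alphas:
        correct = 0
        for fold in range(k):
            train = [i for i in range(len(y)) if i % k != fold]
            test = [i for i in range(len(y)) if i % k == fold]
            model = fit_mnb([X[i] for i in train], [y[i] for i in train], alpha)
            correct += sum(predict_mnb(model, X[i]) == y[i] for i in test)
        acc = correct / len(y)  # every sample is tested exactly once
        if acc > best_acc:  # ties keep the earlier alpha
            best_alpha, best_acc = alpha, acc
    return best_alpha, best_acc
```

`alpha` must be positive here, since `alpha = 0` would take the log of zero for unseen features.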
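For the LR side, a minimal sketch of multinomial (softmax) logistic regression trained by batch gradient descent shows what a regularization hyperparameter controls. Here `C` is treated as inverse L2 regularization strength, matching the meaning of `C` in scikit-learn's `LogisticRegression`. The optimizer is plain gradient descent, not scikit-learn's solvers, and the learning rate, epoch count and function names are hypothetical choices for illustration.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fit_softmax_lr(X, y, C=1.0, lr=0.1, epochs=300):
    """Minimize mean cross-entropy + ||W||^2 / (2 * C * n) by batch gradient descent."""
    classes = sorted(set(y))
    k, d, n = len(classes), len(X[0]), len(X)
    index = {c: i for i, c in enumerate(classes)}
    W = [[0.0] * d for _ in range(k)]
    b = [0.0] * k  # biases are not penalized
    for _ in range(epochs):
        # Start from the L2 penalty gradient; smaller C means stronger shrinkage.
        gW = [[W[c][j] / (C * n) for j in range(d)] for c in range(k)]
        gb = [0.0] * k
        for x, label in zip(X, y):
            p = softmax([sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(k)])
            for c in range(k):
                err = (p[c] - (1.0 if c == index[label] else 0.0)) / n
                gb[c] += err
                for j in range(d):
                    gW[c][j] += err * x[j]
        for c in range(k):
            b[c] -= lr * gb[c]
            for j in range(d):
                W[c][j] -= lr * gW[c][j]
    return classes, W, b


def predict_lr(model, x):
    """Return the class with the highest linear score."""
    classes, W, b = model
    scores = [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(classes))]
    return classes[scores.index(max(scores))]
```

Searching over values of `C` with a cross-validated loop, as GridSearchCV does, is the LR counterpart of tuning `alpha` for MNB.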
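The four evaluation metrics can be computed directly from true and predicted labels. This sketch assumes macro averaging (an unweighted mean over genres) for precision, recall and F1. The abstract does not state which averaging the thesis used, and a weighted average would give different numbers on imbalanced genres. The function name is hypothetical.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per label
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    precision, recall, f1 = [], [], []
    for c in labels:
        p = tp[c] / pred_count[c] if pred_count[c] else 0.0
        r = tp[c] / true_count[c] if true_count[c] else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precision) / n,
        "recall": sum(recall) / n,
        "f1": sum(f1) / n,
    }
```

For example, with true labels `["drama", "drama", "horror", "comedy"]` and predictions `["drama", "horror", "horror", "comedy"]`, accuracy is 0.75 and macro F1 is 7/9.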