LILIANDARI, ANNISA RIZKI (2021) TOPIC DISCOVERY AND CLASSIFICATION COMPARISON ON THE COMMENTS OF INDONESIAN ENTERTAINMENT YOUTUBE CHANNEL VIDEOS USING SMOTE, N-GRAM, AND LDA APPROACHES. S1 thesis, Universitas Mercu Buana Jakarta.
Text (Cover page): 01 Cover - ANNISA RIZKI LILIANDARI.pdf (1MB)
Text (Chapter I): 02 Bab 1 - ANNISA RIZKI LILIANDARI.pdf, restricted to registered users only (867kB)
Text (Chapter II): 03 Bab 2 - ANNISA RIZKI LILIANDARI.pdf, restricted to registered users only (274kB)
Text (Chapter III): 04 Bab 3 - ANNISA RIZKI LILIANDARI.pdf, restricted to registered users only (331kB)
Text (Chapter IV): 05 Bab 4 - ANNISA RIZKI LILIANDARI.pdf, restricted to registered users only (172kB)
Text (Chapter V): 06 Bab 5 - ANNISA RIZKI LILIANDARI.pdf, restricted to registered users only (981kB)
Text (Chapter VI): 07 Bab 6 - ANNISA RIZKI LILIANDARI.pdf, restricted to registered users only (31kB)
Text (Bibliography): 08 Daftar Pustaka - ANNISA RIZKI LILIANDARI.pdf, restricted to registered users only (143kB)
Text (Appendix): 09 Lampiran - ANNISA RIZKI LILIANDARI.pdf, restricted to registered users only (259kB)
Abstract
YouTube is currently the most popular social media platform, with 88% of active users having easy access to it. The volume of comments containing opinions and suggestions keeps growing, making them difficult to interpret individually. This research focuses on text classification and topic modeling of YouTube comments on Indonesian entertainment video content. Data mining classification methods were applied to compare the performance of Multinomial Naïve Bayes (MNB), K-Nearest Neighbor (K-NN), and Support Vector Machine (SVM), and to determine how various experimental settings affect the search for an accurate model for classifying comments as positive, negative, or neutral. Topic modeling was performed with Latent Dirichlet Allocation (LDA). The results show that complete preprocessing, application of the SMOTE technique, parameter setting, and N-gram features all contribute to improved accuracy. The best accuracy was obtained from a model that applied SMOTE with a split of 80% training data and 20% testing data. The SVM + SMOTE model outperformed the MNB + SMOTE and K-NN + SMOTE models, with an accuracy of 97.2% (dataset 1), 96.1% (dataset 2), and 96.3% (dataset 3). The topic modeling shows that two of the three datasets share the same topic in their content presentation.
Keywords: YouTube commentary, text classification, topic modeling, machine learning, SMOTE, N-gram.
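The thesis implementation is not available in this record, so the following is only a minimal sketch of two ideas named in the abstract: word N-gram features and SMOTE oversampling of a minority class. It uses the Python standard library only. The function names (`word_ngrams`, `vectorize`, `smote`), the parameter defaults, and the toy comments are all hypothetical and chosen for illustration. Classifier training (MNB, K-NN, SVM) and LDA topic modeling are not covered.

```python
import random
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased comment."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def vectorize(comments, n_max=2):
    """Map each comment to a count vector over the shared n-gram vocabulary."""
    counts = [Counter(word_ngrams(c, n_max)) for c in comments]
    vocab = sorted(set().union(*counts))
    return [[float(c[term]) for term in vocab] for c in counts], vocab


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: three positive comments, two negative ones.
comments = ["great video", "great funny video", "love this video",
            "bad video", "boring bad video"]
labels = ["pos", "pos", "pos", "neg", "neg"]

X, vocab = vectorize(comments)
minority = [x for x, y in zip(X, labels) if y == "neg"]
X_balanced = X + smote(minority, n_new=1)  # one extra "neg" balances 3 vs 3
y_balanced = labels + ["neg"]
```

In a real experiment, oversampling would normally be applied to the training portion only (for example, the 80% split), so that synthetic points never leak into the test data.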