The Influence of Using Up Sampling Method in Predicting Patients' Disease Types using a Combination of Natural Language Processing, Naive Bayes Algorithm, XGBoost and Support Vector Machine

PRATAMA, MHD AVICENNA WEKA RIZKY (2021) The Influence of Using Up Sampling Method in Predicting Patients' Disease Types using a Combination of Natural Language Processing, Naive Bayes Algorithm, XGBoost and Support Vector Machine. S2 thesis, Universitas Mercu Buana Jakarta.

Text (Cover page): 01 Cover.pdf (1MB)
Text (Chapter I, literature review): 02 Bab 1 Literatur Review.pdf (73kB), restricted to registered users only
Text (Chapter II, analysis and design): 03 Bab 2 Analisis dan Design.pdf (176kB), restricted to registered users only
Text (Chapter III, source code): 04 Bab 3 Source Code.pdf (219kB), restricted to registered users only
Text (Chapter IV, dataset): 05 Bab 4 Dataset.pdf (153kB), restricted to registered users only
Text (Chapter V, research stages): 06 Bab 5 Tahapan penelitian.pdf (127kB), restricted to registered users only
Text (Chapter VI, research results): 07 Bab 6 Hasil Penelitian.pdf (139kB), restricted to registered users only
Text (References): 08 Bab 7 Refrensi.pdf (182kB), restricted to registered users only
Text (Appendix): Lampiran.pdf (305kB), restricted to registered users only

Abstract

Hospitals are essential providers of health care. To make those services as good as possible, technology that supports them is urgently needed. One such technology is the optimal processing of hospital activity data: few hospitals currently use the data they hold for anything beyond an activity history, so further data management is needed. Usable activity data include patient examination data, drug data, disease data, and data related to patient handling. These data can serve as a reference for medical personnel when deciding how to provide the best service, and in particular can help doctors avoid medical errors. This study aims to show how data mining, combining Natural Language Processing, the Naïve Bayes algorithm, XGBoost, and Support Vector Machine (SVM), can be applied to classify disease types from an existing dataset, and to measure how much an unbalanced dataset differs from a balanced one in its effect on the results. XGBoost and SVM were chosen because they are computationally capable, process data efficiently, and achieve fairly high accuracy. Using the three classification algorithms, disease-type classification reached an average accuracy of 69.22% with Naïve Bayes, 92.13% with XGBoost, and 88.49% with SVM. The study also measured the effect of balancing the dataset: applying the up-sampling method to the unbalanced data had an average effectiveness of 2.72%.

Key words: Data mining, Natural Language Processing, Word2Vec, Naïve Bayes, XGBoost, Support Vector Machine, Unbalanced
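The abstract does not describe how the up-sampling step was implemented. The following is a minimal sketch, assuming "up sampling" means plain random oversampling with replacement. The function name `upsample` and the toy complaint texts and disease labels are hypothetical illustrations, not taken from the thesis. The sketch shows how a labelled text dataset could be balanced so that every disease class has as many examples as the largest class:

```python
import random
from collections import Counter, defaultdict


def upsample(texts, labels, seed=0):
    """Randomly oversample minority classes (with replacement) until every
    class has as many examples as the largest class.

    Assumption: "up sampling" is plain random oversampling; the thesis may
    have used a different variant.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    rng = random.Random(seed)  # fixed seed so the result is reproducible

    # Group examples by class label, preserving their original order.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # The majority class size is the target for every class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Keep every original example, then draw extra copies at random.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


if __name__ == "__main__":
    # Hypothetical toy complaints and disease labels, for illustration only.
    texts = ["fever and cough", "high fever", "chest pain",
             "runny nose and cough", "joint pain"]
    labels = ["flu", "flu", "heart", "flu", "arthritis"]
    balanced_texts, balanced_labels = upsample(texts, labels)
    print(Counter(labels))           # unbalanced class counts
    print(Counter(balanced_labels))  # every class now has 3 examples
```

Oversampling duplicates existing rows. As a general practice, it is applied only to the training split after the train/test split; otherwise copies of the same example can land in both splits and inflate the measured accuracy.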

Item Type: Thesis (S2)
NIM/NIDN Creators: 41517010028
Uncontrolled Keywords: Data mining, Natural Language Processing, Word2Vec, Naïve Bayes, XGBoost, Support Vector Machine, Unbalanced
Subjects: 000 Computer Science, Information and General Works > 000. Computer Science, Information and General Works
000 Computer Science, Information and General Works > 000. Computer Science, Information and General Works > 004 Data Processing, Computer Science
Divisions: Faculty of Computer Science > Informatics
Depositing User: Dede Muksin Lubis
Date Deposited: 26 Oct 2023 02:59
Last Modified: 26 Oct 2023 02:59
URI: http://repository.mercubuana.ac.id/id/eprint/83327
