PENERAPAN ALGORITMA NAIVE BAYES UNTUK DETEKSI SPAM PADA EMAIL

REIZA, MUHAMMAD (2025) PENERAPAN ALGORITMA NAIVE BAYES UNTUK DETEKSI SPAM PADA EMAIL. S1 thesis, Universitas Mercu Buana Jakarta.

Abstract

Spam email detection has become a crucial issue in the digital age. Given the high volume of email I receive every day, I often encounter unwanted or potentially dangerous messages, such as phishing attempts or malware distribution. This underscores the need for effective and reliable spam detection systems. This research therefore applies the Naive Bayes algorithm as a classification method for automatically detecting spam in email.

I began by collecting a labeled email dataset (spam/non-spam). The data then went through a series of preprocessing and feature extraction steps. I applied CountVectorizer to transform the email text into a numerical representation, a word frequency matrix, which served as input to the Naive Bayes algorithm. I trained the model to compute word probabilities and determine the category of each email. Training and testing were carried out on a proportional split of the dataset.

The evaluation results show that the Naive Bayes algorithm performed very well, with an accuracy of 0.9856 (98.56%), precision of 0.9567 (95.67%), recall of 0.9412 (94.12%), and an F1-score of 0.9489 (94.89%). Although the confusion matrix contained 5 false positives (legitimate emails misclassified as spam) and 10 false negatives (spam that slipped through), these figures indicate that the model is highly effective at identifying spam email. These findings provide useful insights for developers of spam filters, and I hope they will serve as a reference for further studies aimed at improving spam detection.

Keywords: Spam Detection, Email, Naive Bayes Algorithm, Text Classification, Machine Learning.
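The pipeline described in the abstract (word counts fed into Naive Bayes, followed by confusion-matrix metrics) can be illustrated with a minimal sketch. This is not the author's implementation: the thesis names CountVectorizer, whereas the code below is a dependency-free stand-in that counts words by hand and applies multinomial Naive Bayes with Laplace smoothing. The class name `MultinomialNaiveBayes`, the helper functions, the `"ham"` label, and the toy messages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens of two or more characters (an assumed tokenization rule).
    return re.findall(r"\b\w\w+\b", text.lower())


class MultinomialNaiveBayes:
    """Hypothetical from-scratch multinomial Naive Bayes over word counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        docs_per_class = Counter(labels)
        self.log_prior = {
            c: math.log(docs_per_class[c] / len(labels)) for c in self.classes
        }
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def predict_one(self, text):
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                numerator = self.word_counts[c][word] + self.alpha
                denominator = self.total_words[c] + self.alpha * vocab_size
                score += math.log(numerator / denominator)
            scores[c] = score
        return max(scores, key=scores.get)


def classification_metrics(tp, fp, fn, tn):
    # Standard definitions computed from confusion-matrix counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy training data, invented for this example only.
train_texts = [
    "win free money now",
    "claim your free prize",
    "meeting agenda for monday",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict_one("free prize money")

# Made-up confusion-matrix counts, not the thesis results.
example_metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```

The abstract reports only the false positive and false negative counts, so the counts passed to `classification_metrics` above are placeholders rather than a reproduction of the reported scores.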

Item Type:	Thesis (S1)
Call Number CD:	FIK/INFO. 25 088
NIM/NIDN Creators:	41520010059
Uncontrolled Keywords:	Spam Detection, Email, Naive Bayes Algorithm, Text Classification, Machine Learning.
Subjects:	000 Computer Science, Information and General Works/Ilmu Komputer, Informasi, dan Karya Umum > 000. Computer Science, Information and General Works/Ilmu Komputer, Informasi, dan Karya Umum > 004 Data Processing, Computer Science/Pemrosesan Data, Ilmu Komputer, Teknik Informatika
	000 Computer Science, Information and General Works/Ilmu Komputer, Informasi, dan Karya Umum > 000. Computer Science, Information and General Works/Ilmu Komputer, Informasi, dan Karya Umum > 006 Special Computer Methods/Metode Komputer Tertentu > 006.3 Artificial Intelligence/Kecerdasan Buatan > 006.31 Machine Learning/Pembelajaran Mesin
	000 Computer Science, Information and General Works/Ilmu Komputer, Informasi, dan Karya Umum > 020 Library and Information Sciences/Perpustakaan dan Ilmu Informasi > 025 Operations, Archives, Information Centers/Operasional Perpustakaan, Arsip dan Pusat Informasi, Pelayanan dan Pengelolaan Perpustakaan > 025.4 Subject Analysis and Control/Subjek Analisis dan Kontrol Perpustakaan > 025.46 Classification of Specific Subject/Klasifikasi Khusus
	200 Religion/Agama > 220 Bible/Al Kitab > 220.1-220.9 Standard Subdivision of Bible/Subdivisi Standar dari Al Kitab > 220.4 Original Text, Early Versions, Early Translations/Teks Asli, Versi dan Terjemahan Paling Awal
	500 Natural Science and Mathematics/Ilmu-ilmu Alam dan Matematika > 510 Mathematics/Matematika > 518 Numerical Analysis/Analisis Numerik, Analisa Numerik > 518.1 Algorithms/Algoritma
Divisions:	Fakultas Ilmu Komputer > Informatika
Depositing User:	khalimah
Date Deposited:	02 Aug 2025 01:56
Last Modified:	02 Aug 2025 01:56
URI:	http://repository.mercubuana.ac.id/id/eprint/96452