PERINGKASAN TEKS OTOMATIS ABSTRAK BAHASA INDONESIA MENGGUNAKAN MODEL PEGASUS

SUDRAJAT, MOHAMAD IMAN SOLIHIN (2024) PERINGKASAN TEKS OTOMATIS ABSTRAK BAHASA INDONESIA MENGGUNAKAN MODEL PEGASUS. S1 thesis, Universitas Mercu Buana Jakarta.

Text (Cover page): 01 COVER.pdf, 491kB
Text (Abstract): 02 ABSTRAK.pdf, 221kB
Text (Chapter I): 03 BAB 1.pdf, 220kB, restricted to registered users only
Text (Chapter II): 04 BAB 2.pdf, 379kB, restricted to registered users only
Text (Chapter III): 05 BAB 3.pdf, 323kB, restricted to registered users only
Text (Chapter IV): 06 BAB 4.pdf, 695kB, restricted to registered users only
Text (Chapter V): 07 BAB 5.pdf, 179kB, restricted to registered users only
Text (References): 08 DAFTAR PUSTAKA.pdf, 206kB, restricted to registered users only
Text (Appendix): 09 LAMPIRAN.pdf, 1MB, restricted to registered users only

Abstract

Large text documents are difficult to understand, and extracting the important information from them takes time. One way to obtain a summary quickly is abstractive automatic text summarization. This research uses the IndoSum dataset, a collection of news texts, taking 2000 samples whose documents range from 1 to 22 paragraphs in length. The model used is PEGASUS, specifically tunner007/pegasus_summarize, provided through the Hugging Face library. The experimental scenarios vary the pre-processing and the data split: scenario 1 applies neither stemming nor stopword removal with a 70:30 split, scenario 2 applies stemming without stopword removal with an 80:20 split, and scenario 3 applies both stemming and stopword removal with a 90:10 split. The results show that scenario 1 performs best, with ROUGE-1 precision 0.581760, recall 0.627699, F1-score 0.602227; ROUGE-2 precision 0.461695, recall 0.498631, F1-score 0.478117; and ROUGE-L precision 0.545045, recall 0.588313, F1-score 0.564325.

Keywords: Automatic Text Summarization, Abstractive, Bahasa Indonesia, PEGASUS, ROUGE
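The three scenarios in the abstract can be written down as a small configuration. The sketch below is illustrative only and is not taken from the thesis: the `Scenario` class, `split_sizes`, and `preprocess` are hypothetical names, the ratios are assumed to mean train:test, and the stemmer and stopword list are left as pluggable placeholders because the abstract does not name the tools used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One experimental setup (hypothetical structure, not from the thesis)."""
    name: str
    stemming: bool
    stopword_removal: bool
    train_fraction: float  # assumption: the first number of the ratio is the training share


SCENARIOS = [
    Scenario("scenario_1", stemming=False, stopword_removal=False, train_fraction=0.70),
    Scenario("scenario_2", stemming=True, stopword_removal=False, train_fraction=0.80),
    Scenario("scenario_3", stemming=True, stopword_removal=True, train_fraction=0.90),
]


def split_sizes(n_samples, train_fraction):
    """Return (train, test) sample counts for a given training share."""
    n_train = round(n_samples * train_fraction)
    return n_train, n_samples - n_train


def preprocess(text, scenario, stem=lambda word: word, stopwords=frozenset()):
    """Lowercase and whitespace-tokenize, then apply the steps the scenario enables.

    `stem` and `stopwords` are placeholders; a real run would pass an
    Indonesian stemmer and stopword list of the experimenter's choosing.
    """
    tokens = text.lower().split()
    if scenario.stopword_removal:
        tokens = [t for t in tokens if t not in stopwords]
    if scenario.stemming:
        tokens = [stem(t) for t in tokens]
    return " ".join(tokens)


if __name__ == "__main__":
    for s in SCENARIOS:
        print(s.name, split_sizes(2000, s.train_fraction))
```

Under these assumptions, the 2000 samples would divide into 1400/600, 1600/400, and 1800/200 documents for scenarios 1, 2, and 3 respectively.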
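For readers unfamiliar with the reported metrics, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L precision, recall, and F1 can be computed for a single candidate/reference pair. It assumes lowercase whitespace tokenization and is not the evaluation code used in the thesis, which most likely relied on an existing ROUGE package; all function names here are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, cand_total, ref_total):
    """Precision, recall, and F1 from an overlap count and the two totals."""
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def rouge_n(candidate, reference, n):
    """ROUGE-N: clipped n-gram overlap between candidate and reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips the counts
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: precision, recall, and F1 based on the longest common subsequence."""
    c = candidate.lower().split()
    r = reference.lower().split()
    return prf(lcs_length(c, r), len(c), len(r))


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat is on the mat"
    print("ROUGE-1", rouge_n(cand, ref, 1))
    print("ROUGE-2", rouge_n(cand, ref, 2))
    print("ROUGE-L", rouge_l(cand, ref))
```

Corpus-level figures such as those in the abstract are typically averages of per-document scores, so an averaged F1 need not equal the harmonic mean of the averaged precision and recall.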

Item Type: Thesis (S1)
Call Number CD: FIK/INFO. 24 062
Call Number: SIK/15/24/050
NIM/NIDN Creators: 41519110007
Uncontrolled Keywords: Automatic Text Summarization, Abstractive, Bahasa Indonesia, PEGASUS, ROUGE
Subjects: 000 Computer Science, Information and General Works > 020 Library and Information Sciences > 025 Operations, Archives, Information Centers > 025.4 Subject Analysis and Control > 025.41 Abstracting
600 Technology > 620 Engineering and Applied Operations > 629 Other Branches of Engineering > 629.8 Automatic Control Engineering
700 Arts > 780 Music > 780.1-780.9 Standard Subdivisions of Music > 780.2 Miscellany of Music > 780.26 Treaties on Music Scores, Recordings, Texts
Divisions: Fakultas Ilmu Komputer > Informatika
Depositing User: khalimah
Date Deposited: 14 Mar 2024 08:07
Last Modified: 16 Mar 2024 04:47
URI: http://repository.mercubuana.ac.id/id/eprint/87137
