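MAULANA, M. IQBAL (2023) ANALISIS KINERJA SPEECH ENHANCEMENT MODEL DEEPFILTERNET3 PADA KONFERENSI VIDEO (STUDI KASUS: PEMBELAJARAN DARING) [Performance Analysis of the DeepFilterNet3 Speech Enhancement Model in Video Conferencing (Case Study: Online Learning)]. S1 thesis, Universitas Mercu Buana Jakarta.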
Abstract
In the era of online learning and remote work, video conferencing has become an essential communication tool. However, audio quality is often degraded by background noise and low-quality microphones. This study examines deep learning-based speech enhancement with the DeepFilterNet3 model and analyzes its performance in video conferencing for online learning. The model uses a Complex Mask (CM) approach to enhance speech by filtering out unwanted noise, and it was trained on the Voicebank, Demand, and MIT IR Survey datasets as the clean speech, noise, and room impulse response (RIR) datasets, respectively. The best Self-Trained model was obtained at epoch 115, with a test loss of 1.05138, a MultiResSpecLoss of 1.02696, and a LocalSnrLoss of 0.02442. Overall, compared to the Self-Trained and RNNoise models, the Pre-Trained DeepFilterNet3 model shows superior performance on accuracy metrics such as PESQ, CSIG, CBAK, COVL, STOI, SiSDR, and SegSNR. The Self-Trained model nevertheless shows potential for improving voice quality in video conferencing for online learning. On speed, response time, and real-time factor (RTF) metrics, RNNoise is the fastest, with an average RTF (RTFavg) of 0.001. The two DeepFilterNet3 versions have RTFavg values of 0.081 and 0.088. Although both are slower than RNNoise, their complexity is still acceptable for certain applications.
Keywords: Speech Enhancement, DeepFilterNet3, video conferencing, online learning, accuracy metrics, speed metrics
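For readers unfamiliar with two of the metrics named in the abstract, the sketch below illustrates the standard textbook definitions of the real-time factor (processing time divided by audio duration) and the scale-invariant signal-to-distortion ratio (SI-SDR). It is not the thesis's evaluation code: the function names `real_time_factor` and `si_sdr_db` are hypothetical, and the sample values are made up for illustration.

```python
import math


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / duration of the audio processed.

    Values below 1.0 mean the model runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def si_sdr_db(reference, estimate):
    """Scale-invariant SDR in dB between a clean reference and an estimate.

    Both inputs are equal-length sequences of samples (hypothetical data).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean so a constant offset does not affect the score.
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0:
        raise ValueError("reference signal has zero energy")

    # Project the estimate onto the reference to get the target component.
    scale = sum(r * e for r, e in zip(ref, est)) / ref_energy
    target = [scale * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0:
        return math.inf  # estimate is a scaled copy of the reference
    return 10 * math.log10(target_energy / residual_energy)


# Illustrative use: 10 s of audio processed in 0.81 s gives an RTF of 0.081.
rtf = real_time_factor(0.81, 10.0)
```

Under this definition, an RTFavg of 0.081 corresponds to roughly 0.81 s of computation per 10 s of audio, while 0.001 corresponds to roughly 0.01 s, so all reported values are well below the real-time limit of 1.0.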
Item Type: | Thesis (S1) |
---|---|
Call Number CD: | FT/ELK. 23 103 |
Call Number: | ST/14/23/096 |
NIM/NIDN Creators: | 41421110048 |
Uncontrolled Keywords: | Speech Enhancement, DeepFilterNet3, video conferencing, online learning, accuracy metrics, speed metrics |
Subjects: | 600 Technology/Teknologi > 620 Engineering and Applied Operations/Ilmu Teknik dan operasi Terapan 600 Technology/Teknologi > 620 Engineering and Applied Operations/Ilmu Teknik dan operasi Terapan > 621 Applied Physics/Fisika terapan > 621.3 Electrical Engineering, Lighting, Superconductivity, Magnetic Engineering, Applied Optics, Paraphotic Technology, Electronics Communications Engineering, Computers/Teknik Elektro, Pencahayaan, Superkonduktivitas, Teknik Magnetik, Optik Terapan, Tekn |
Divisions: | Fakultas Teknik > Teknik Elektro |
Depositing User: | Annas Tsabatulloh |
Date Deposited: | 14 Sep 2023 04:13 |
Last Modified: | 14 Sep 2023 04:13 |
URI: | http://repository.mercubuana.ac.id/id/eprint/80654 |