RAMADHANI, IHSANUL ARIFIN (2026) XAI FOR WINE QUALITY CLASSIFICATION USING SHAPLEY ADDITIVE EXPLANATIONS IN INTERPRET FEATURE IMPORTANCE IN SUPERVISED LEARNING. S1 thesis, Universitas Mercu Buana Jakarta.
Abstract
This study investigates explainable supervised learning for red wine quality classification using the UCI Wine Quality (red wine) dataset (1,599 samples; 11 physicochemical features). The ordinal quality score is converted into a binary label (quality ≤ 5: low; quality ≥ 6: high). Using an 80/20 stratified split, Logistic Regression, Random Forest, and XGBoost are compared, with F1-score as the primary metric. XGBoost delivers the best held-out test performance, achieving F1 = 0.8452 at the default 0.50 threshold. A single global decision threshold is then selected from out-of-fold probabilities obtained via stratified cross-validation on the training set, which avoids tuning the threshold on the test set. The selected threshold of 0.45 yields a modest improvement on the test set (F1 = 0.8487) with slightly higher recall. SHAP explanations show that alcohol and sulphates typically raise the likelihood of a high-quality prediction, while volatile acidity and total sulphur dioxide push predictions towards low quality. These findings support accurate yet transparent quality screening.
Keywords: Explainable AI, SHAP, XGBoost, supervised learning, wine quality classification, threshold calibration, feature importance.
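The labelling rule and the threshold-selection step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the 0.05-step threshold grid, and the toy scores and probabilities are all assumptions made for illustration. In practice the out-of-fold probabilities would come from cross-validated predictions of the trained classifier on the training set only.

```python
from typing import List, Optional, Sequence, Tuple


def binarize_quality(score: int) -> int:
    """Map an ordinal quality score to a binary label (<= 5: low = 0, >= 6: high = 1)."""
    return 1 if score >= 6 else 0


def f1_at_threshold(y_true: Sequence[int], proba: Sequence[float], threshold: float) -> float:
    """F1-score of the positive class when predicting 1 for proba >= threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, proba):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_threshold(
    y_true: Sequence[int],
    oof_proba: Sequence[float],
    grid: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Scan a threshold grid and return (threshold, F1) with the highest F1.

    The default grid (0.05 to 0.95 in steps of 0.05) is an assumption.
    Ties are resolved in favour of the lowest threshold.
    """
    if grid is None:
        grid = [round(0.05 * k, 2) for k in range(1, 20)]
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        f1 = f1_at_threshold(y_true, oof_proba, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical toy data: quality scores and made-up out-of-fold probabilities.
toy_scores = [7, 6, 6, 5, 4, 5]
toy_labels = [binarize_quality(s) for s in toy_scores]
toy_oof_proba = [0.90, 0.60, 0.47, 0.46, 0.20, 0.10]

chosen_t, chosen_f1 = select_threshold(toy_labels, toy_oof_proba)
default_f1 = f1_at_threshold(toy_labels, toy_oof_proba, 0.50)
print(f"F1 at 0.50: {default_f1:.4f}; F1 at selected threshold {chosen_t}: {chosen_f1:.4f}")
```

Because the threshold is chosen only from training-set out-of-fold probabilities, the held-out test set is used once, for the final evaluation at the chosen threshold.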