Explainable Machine Learning Using VOC Profiles for Sausage Spoilage Prediction Enhanced by GAN-Augmented Data


Creative Commons License

Ince V., Bader-El-Den M., Eşmeli R., Sari O. F.

Food and Bioprocess Technology, vol.19, no.4, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 19, Issue: 4
  • Publication Date: 2026
  • DOI Number: 10.1007/s11947-026-04222-3
  • Journal Name: Food and Bioprocess Technology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
  • Keywords: Explainable artificial intelligence, Food safety, GAN, Machine learning, Sausage spoilage, VOC analysis
  • Van Yüzüncü Yıl University Affiliated: Yes

Abstract

Food spoilage prediction is a critical challenge in food safety and quality management, particularly for meat products, which exhibit complex microbiological and biochemical dynamics. This study presents an explainable machine learning framework for predicting sausage spoilage intensity from volatile organic compound (VOC) profiles and physicochemical parameters, enhanced through Generative Adversarial Network (GAN)-based data augmentation. The framework combines several machine learning models (random forest, gradient boosting, logistic regression, multi-layer perceptron, and a voting classifier) with the TVAESynthesizer generative model to address data scarcity and imbalance in experimental food datasets. SHapley Additive exPlanations (SHAP) were employed to quantify the contribution of individual VOCs and physicochemical variables to spoilage classification, thereby enhancing model transparency and biological interpretability. GAN-augmented datasets substantially improved predictive performance relative to training on the original data alone. For poultry sausages, the gradient boosting and random forest models each achieved an accuracy of 0.92; for pork sausages, both models reached an accuracy of 0.89. In addition, regenerating the synthetic data within each fold during cross-validation yielded highly stable performance: random forest and gradient boosting achieved accuracies and F1-scores above 0.90 for poultry sausages and peak accuracies of around 0.89 for pork sausages, supporting the reliability of the GAN-augmented training strategy. SHAP analysis identified sampling time and pH as the dominant predictors of spoilage for both poultry and pork sausages. Alcohol-related volatile compounds such as 1-propanol, 2-butanone, and 2-butanol drove predictions for poultry, whereas ethyl acetate, methanethiol, dimethyl sulfide, and hexanal played a major role in pork spoilage classification.
Overall, integrating generative modeling with explainable AI significantly improves both predictive accuracy and interpretability. The proposed framework offers a sustainable, data-efficient, and interpretable solution for real-time, non-destructive monitoring of meat freshness and quality. Graphical Abstract: (Figure presented.)
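
The fold-wise regeneration of synthetic data mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a per-class Gaussian sampler stands in for TVAESynthesizer, a nearest-centroid rule stands in for the random forest and gradient boosting models, and the two-feature toy data and all function names are made up for illustration. The point it shows is only the ordering of steps: the generator is refit on each fold's real training rows, and the held-out rows are never used for fitting or augmentation.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def class_columns(rows, labels, c):
    """Feature columns of the rows belonging to class c."""
    return list(zip(*[r for r, y in zip(rows, labels) if y == c]))

def fit_gaussian_sampler(rows, labels):
    """Stand-in generator (NOT a TVAE): per-class, per-feature mean and std."""
    return {c: [(statistics.mean(col), statistics.pstdev(col))
                for col in class_columns(rows, labels, c)]
            for c in set(labels)}

def sample_synthetic(params, n_per_class, rng):
    """Draw n_per_class synthetic rows for every class."""
    rows, labels = [], []
    for c, stats in params.items():
        for _ in range(n_per_class):
            rows.append([rng.gauss(m, s) for m, s in stats])
            labels.append(c)
    return rows, labels

def fit_centroids(rows, labels):
    """Stand-in classifier: one mean vector (centroid) per class."""
    return {c: [statistics.mean(col) for col in class_columns(rows, labels, c)]
            for c in set(labels)}

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

def cross_validate_with_foldwise_augmentation(rows, labels, k=5,
                                              n_per_class=50, seed=0):
    rng = random.Random(seed)
    accuracies = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        x_train = [rows[i] for i in train_idx]
        y_train = [labels[i] for i in train_idx]
        # Refit the generator on this fold's real training rows only.
        params = fit_gaussian_sampler(x_train, y_train)
        x_syn, y_syn = sample_synthetic(params, n_per_class, rng)
        # Train on real + synthetic rows; evaluate on real held-out rows only.
        model = fit_centroids(x_train + x_syn, y_train + y_syn)
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies

# Toy data with made-up values: two features, two classes, 20 rows each.
data_rng = random.Random(42)
class_a = [[data_rng.uniform(0, 48), data_rng.uniform(6.0, 6.4)] for _ in range(20)]
class_b = [[data_rng.uniform(96, 168), data_rng.uniform(5.2, 5.6)] for _ in range(20)]
rows = class_a + class_b
labels = [0] * 20 + [1] * 20

scores = cross_validate_with_foldwise_augmentation(rows, labels)
print("per-fold accuracy:", scores)
```

Fitting the generator once on the full dataset before splitting would let information from the validation rows reach the training set through the synthetic samples; refitting inside the loop avoids that, which is presumably why the abstract reports the fold-wise results separately.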