A new distributed anomaly detection approach for log IDS management based on deep learning


Koca M., AYDIN M. A., Sertbas A., Zaim A. H.

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, cilt.29, sa.5, ss.2486-2501, 2021 (SCI-Expanded) identifier identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 29 Sayı: 5
  • Basım Tarihi: 2021
  • Doi Numarası: 10.3906/elk-2102-89
  • Dergi Adı: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, TR DİZİN (ULAKBİM)
  • Sayfa Sayıları: ss.2486-2501
  • Anahtar Kelimeler: Big Data, Deep Learning, Cyber Security, Log IDS, Spark, CUDA, NETWORKS
  • Van Yüzüncü Yıl Üniversitesi Adresli: Evet

Özet

Today, with the rapid increase of data, the security of big data has become more important than ever for managers. However, traditional infrastructure systems cannot cope with increasingly big data that is created like an avalanche. In addition, as the existing database systems increase licensing costs per transaction, organizations using information technologies are shifting to free and open source solutions. For this reason, we propose an anomaly attack detection model on Apache Hadoop distributed file system (HDFS), which stands out in open source big data analytics, and Apache Spark, which stands out with its speed performance in analysis to reduce the costs of organizations. This model consists of four stages: the first of which is to store instant data on HDFS in a distributed manner. In the second stage, the log data generated in the network traffic are analyzed by taking the data on Apache Spark and including the log data created by HDFS. In the third stage, the data preprocessing stage and with the CUDA parallel programming in the TensorFlow library, we apply our deep learning (cuDNN) method to the distributed anomaly detection with the computational support of GPUs. In the last stage, the generated alarms are recorded on HDFS again. We conducted comparative experiments with the approach we propose to detect cyberattack anomalies in log data management with the classification methods used in machine learning. The results obtained in these experiments appear to provide a promising gain in performance evaluation metrics compared to the other available methods.