A traffic classification model for detecting robotic activity

Abstract


Shukralieva D.E., Filatova A.S.

Incoming article date: 24.05.2025

This article examines the growing threat of web scraping (parsing) as a form of automated cyberattack. Although scraping publicly available data is often legal, its misuse can lead to serious consequences, including server overload, data breaches, and intellectual property infringement. Recent court cases against OpenAI and ChatGPT highlight the legal uncertainty surrounding unauthorized data collection. The study presents a dual approach to combating malicious scraping. The traffic classification model is a machine-learning solution based on the Random Forest algorithm that achieves 89% accuracy in distinguishing legitimate traffic from malicious bot traffic, enabling early detection of scraping attempts. The data deception technique is a countermeasure that dynamically modifies HTML content to feed false information to scrapers while preserving the original appearance of the page, preventing data collection without affecting the user experience. The implemented system provides real-time traffic monitoring, dynamic page obfuscation, and automated response. The proposed approach demonstrates effectiveness in mitigating the risks associated with scraping and underscores the need for adaptive cybersecurity measures in an evolving digital environment.
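As an illustration of the traffic-classification component summarized above, the sketch below trains a Random Forest classifier on per-session request features. This is a minimal example, not the authors' implementation: the feature set, the synthetic data, and the train/test split are assumptions made purely for demonstration.

```python
# Minimal sketch of a Random Forest traffic classifier. The feature names,
# synthetic data, and labels are illustrative assumptions, not the study's dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 2000

# Hypothetical per-session features: [requests/min, mean inter-arrival time (s),
# HTTP error ratio, fraction of pages fetched without static assets]
human = np.column_stack([
    rng.normal(5, 2, n // 2),        # humans browse slowly and irregularly
    rng.normal(12, 4, n // 2),
    rng.normal(0.02, 0.01, n // 2),
    rng.normal(0.1, 0.05, n // 2),
])
bot = np.column_stack([
    rng.normal(60, 20, n // 2),      # scrapers issue rapid, regular requests
    rng.normal(1, 0.5, n // 2),
    rng.normal(0.15, 0.05, n // 2),
    rng.normal(0.9, 0.05, n // 2),
])
X = np.vstack([human, bot])
y = np.array([0] * (n // 2) + [1] * (n // 2))  # 0 = legitimate, 1 = bot

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The data-deception step could, under similar assumptions, look like the sketch below: when a session is classified as a bot, numeric values in the served HTML are replaced with plausible fakes while the markup stays intact. The regular-expression rule and the fake-value range are hypothetical choices for illustration, not the method described in the article.

```python
# Minimal sketch of dynamic HTML data deception: if the requester is flagged as
# a bot, replace numeric values (e.g. prices) with random fakes while keeping
# the page structure unchanged. The regex and value range are assumptions.
import random
import re

def obfuscate_for_bots(html: str, is_bot: bool) -> str:
    if not is_bot:
        return html                      # legitimate users receive the real page
    # Swap every integer in the markup for a random plausible value.
    return re.sub(r"\d+", lambda m: str(random.randint(100, 999)), html)

page = "<div class='price'>1499</div><div class='stock'>12</div>"
print(obfuscate_for_bots(page, is_bot=True))
```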

Keywords: parsing, automated attacks, data protection, bot detection, traffic classification, machine learning, attack analysis, data spoofing, web security