COLLECTION AND PREPARATION OF CRIMINAL CONTENT DATA FROM WEB SOURCES
DOI:
https://doi.org/10.54251/2616-6429.2025.01.11nu
Keywords:
web content, Scikit-learn, NLTK, TensorFlow, Python, Jupyter Notebook, BeautifulSoup (BS4), XML, HTML, machine learning
Abstract
Criminal texts, such as those that plan crimes, incite unlawful acts, or spread false information, pose a threat to security in the online environment. Detecting and classifying such texts is becoming an integral part of combating cybercrime. As the volume of publicly available information grows and illegal activity on the Internet increases, effective methods for the automatic detection and classification of criminal texts are needed.
One approach to classifying criminal texts is morphological analysis, which examines word structure, grammatical forms, and lexical and syntactic features. However, criminal texts have distinct characteristics, so existing morphological analysis methods are not always effective for classifying them. These methods therefore need to be modified and improved to achieve higher accuracy and more reliable results.