MACHINE LEARNING METHODS FOR ANALYZING TEXTS IN KAZAKH
DOI: https://doi.org/10.54251/2616-6429.2025.01.08nu

Keywords: machine learning, text analysis, natural language processing, machine learning algorithms, classification methods, Kazakh language, machine translation

Abstract
The article provides an overview of machine learning methods for analyzing texts in the Kazakh language. It covers automatic spelling correction, sentiment (tonality) analysis, machine translation, and text classification. Special attention is paid to adapting algorithms to the specific linguistic features of Kazakh, and the article discusses the prospects for developing specialized feature extraction methods to improve model accuracy and performance. One promising direction is the use of transformers for text analysis. Their attention mechanism allows them to identify the key elements of a text, which is especially important for agglutinative languages such as Kazakh. Among classical methods, the naive Bayes classifier is a probabilistic method based on Bayes' theorem. It assumes that features are conditionally independent given the category and calculates the probability that a text belongs to each category. Its advantages are simplicity and high speed. However, its accuracy may suffer when there are complex dependencies between words. Models for Kazakh must also take into account the complex morphological structure of words. In sentiment analysis, for example, neural networks based on the LSTM architecture can successfully identify hidden emotions even in complex sentences.
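To make the naive Bayes description concrete, the following Python sketch implements a multinomial naive Bayes classifier with add-one (Laplace) smoothing. As an assumption not taken from the article, it uses padded character trigrams instead of whole words as features, so that inflected forms of the same stem (кітап, кітаптар) share features. The four training sentences and their labels are hypothetical toy data, and the names `char_ngrams` and `NaiveBayes` are illustrative only.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return padded character n-grams for every word in the text.

    Assumption: subword n-grams are an illustrative feature choice,
    not the feature set described in the article.
    """
    grams = []
    for word in text.lower().split():
        padded = f"<{word}>"  # mark word boundaries so prefixes/suffixes are distinct
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


class NaiveBayes:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0, features=char_ngrams):
        self.alpha = alpha
        self.features = features

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)          # documents per class, for the prior
        self.feature_counts = defaultdict(Counter)  # feature frequencies per class
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(self.features(text))
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.feature_counts.items():
            # log P(class): prior estimated from class frequencies
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Independence assumption: add log P(feature | class) for each feature
            for f in self.features(text):
                score += math.log((counts[f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data: "this book is very good", "the film was wonderful",
# "this book is bad", "the film was poor"
train_texts = ["бұл кітап өте жақсы", "фильм керемет болды",
               "бұл кітап жаман", "фильм нашар болды"]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("кітаптар жақсы"))  # pos
print(model.predict("фильм нашар"))     # neg
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long texts. In this toy setup the plural кітаптар never occurs in training, yet its trigrams overlap with those of кітап, which illustrates why subword features may help for agglutinative languages.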
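The attention mechanism of transformers can be illustrated with scaled dot-product attention, softmax(Q·K^T / sqrt(d_k))·V, as defined for the original Transformer architecture. The pure-Python sketch below is a minimal illustration, not the model discussed in the article. The segmentation of кітаптарымызда ("in our books") into кітап + тар + ымыз + да and the two-dimensional query, key, and value vectors are hypothetical toy values, chosen so that the query aligns with the key of the stem.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    Returns (outputs, weights), where weights[i][j] is how strongly
    query i attends to token j.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of the query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors
        outputs.append([sum(wj * v[i] for wj, v in zip(w, values))
                        for i in range(len(values[0]))])
    return outputs, weights


# Hypothetical morpheme segmentation and toy 2-d vectors (not learned embeddings)
tokens = ["кітап", "тар", "ымыз", "да"]
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.5], [0.2, 0.2]]
values = keys
query = [[2.0, 0.0]]

outputs, weights = scaled_dot_product_attention(query, keys, values)
for token, w in zip(tokens, weights[0]):
    print(f"{token}: {w:.3f}")  # the stem кітап receives the largest weight
```

Because every query is compared with every key, the weights show which positions a token attends to. In a trained transformer, queries, keys, and values are learned linear projections of token embeddings rather than hand-picked vectors.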