
Development of a service for generating word forms in corpus linguistics


Sibgatullin M.R., Minyazev R.Sh., Safiulin I.I., Biktasheva A.Sh., Pashin N.P.

Article received: 26.04.2022

This work describes the development of a service that generates the various forms of a given word by analyzing words found in a dictionary. Existing approaches to the problem were studied and the most suitable one was chosen. The service searches a plain-text dictionary file in order to automate the selection of the required words from the full set. The stem of the word is located with morphology taken into account: morphological analysis strips suffixes and endings to find the base common to all grammatical forms of the word. As a result, the service can find all forms of a word from a given keyword. It also determines the part of speech of the word, which allows a different method of determining word forms to be applied in each case. Verbs, nouns, adjectives, and adverbs each have their own algorithm for identifying word forms. A distinctive feature of the service is that it not only searches for word forms in the dictionary but also generates sets of word forms based on the type of the given word. The service runs on Linux under the Apache web server and was built with free software tools. It was developed in JavaScript, HTML, and CSS, with PHP 7 as the server-side language.
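
The approach summarized in the abstract (strip suffixes and endings to find a stem, choose rules by part of speech, then either generate forms or look them up in a dictionary file) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `SUFFIXES`, `stem`, `generateForms`, and `findForms` are hypothetical, the suffix lists are toy English examples chosen for readability, and the dictionary is assumed to be plain text with one word per line.

```javascript
// Hypothetical per-part-of-speech suffix lists (toy English data,
// assumed for illustration; the real service's rules are not given).
const SUFFIXES = {
  noun: ["", "s"],
  verb: ["", "s", "ed", "ing"],
  adjective: ["", "er", "est"],
  adverb: [""],
};

// Look up the suffix list for a part of speech, rejecting unknown types.
function suffixesFor(partOfSpeech) {
  const suffixes = SUFFIXES[partOfSpeech];
  if (!suffixes) {
    throw new Error(`Unknown part of speech: ${partOfSpeech}`);
  }
  return suffixes;
}

// Approximate the stem by cutting off the longest matching suffix,
// keeping at least three characters so short words are left intact.
function stem(word, partOfSpeech) {
  const byLength = [...suffixesFor(partOfSpeech)].sort(
    (a, b) => b.length - a.length
  );
  for (const suffix of byLength) {
    if (suffix && word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, word.length - suffix.length);
    }
  }
  return word;
}

// Generate a set of candidate word forms from the stem.
function generateForms(word, partOfSpeech) {
  const base = stem(word.toLowerCase(), partOfSpeech);
  return suffixesFor(partOfSpeech).map((suffix) => base + suffix);
}

// Return the dictionary entries that match any generated form.
// dictionaryText is assumed to hold one word per line.
function findForms(word, partOfSpeech, dictionaryText) {
  const candidates = new Set(generateForms(word, partOfSpeech));
  return dictionaryText
    .split(/\r?\n/)
    .map((line) => line.trim().toLowerCase())
    .filter((entry) => entry && candidates.has(entry));
}
```

For example, `generateForms("walked", "verb")` yields `walk`, `walks`, `walked`, `walking`, and `findForms` keeps only those candidates that actually occur in the dictionary text. Plain suffix stripping of this kind over-stems some words and ignores irregular forms, which is one reason a real service would need the richer per-part-of-speech algorithms the abstract mentions.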

Keywords: search engine, document analysis, linguistics, word forms, morphology, word generation, web service