Development of information technology of term extraction from documents in natural language
Journal Title: Восточно-Европейский журнал передовых технологий - Year 2018, Vol 6, Issue 2
Abstract
Domain dictionaries are widely used at various stages of software design and operation. Developing a dictionary, and term extraction in particular, is labor-intensive and requires a highly qualified expert. The study identifies the most important characteristics of multi-word terms (MWT): the probability that a document contains terms with different numbers of words, the arrangement of nouns within an MWT, and the possible number of nouns in an MWT. The context in which terms are used is analyzed, and possible limits of terms in the text are identified. A procedure for preliminary document grouping is proposed, which avoids the “loss” of terms that occur in short documents. The dependence of term extraction errors on the size of the analyzed document is determined.
A mathematical model of term representation is proposed, based on defining a set of word chains grouped around a head-word, which is a noun. Chains are filtered by their frequency of occurrence in the text, using a comparison of normalized representations of MWT.
Mechanisms are developed for filling the domain dictionary with new records and adjusting existing ones while the input document is analyzed. A solution is proposed for adjusting the frequency of occurrence of terms based on identified inter-phrase relations. All processes and models are combined into a single information technology for constructing the domain dictionary. Term interpretation is not considered in this paper, since it requires a separate solution. A software product that substantially automates term extraction from text documents is developed.
Testing of the proposed solutions showed no “lost terms” and, as a result, a reduction of 1.5 hours in the time of term extraction from texts of 10,000 words, because the expert no longer has to analyze the original document. The research results can be used at various stages of design and operation of software products.
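The chain-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' model or software: the tag set (`"N"`, `"A"`), the rule that a chain must end on a noun head-word, the crude `normalize` function, and the `min_freq` and `max_len` parameters are all assumptions made for illustration. The sketch builds candidate word chains from part-of-speech-tagged tokens, normalizes them, and keeps only chains that occur often enough.

```python
from collections import Counter

# Hypothetical tag set: "N" = noun, "A" = adjective; any other tag ends a chain.
CHAIN_TAGS = {"N", "A"}

def normalize(word):
    """Crude normalization (assumption): lowercase and strip a plural 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w

def candidate_chains(tagged, max_len=4):
    """Yield normalized word chains that end on a noun (assumed head-word)."""
    run = []
    # A sentinel token flushes the last run of chain words.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in CHAIN_TAGS:
            run.append((normalize(word), tag))
            continue
        # A non-chain token is treated as a possible term limit:
        # emit every sub-chain of the run that ends on a noun.
        for i in range(len(run)):
            for j in range(i + 1, min(i + max_len, len(run)) + 1):
                piece = run[i:j]
                if piece[-1][1] == "N":
                    yield tuple(w for w, _ in piece)
        run = []

def extract_terms(tagged, min_freq=2, max_len=4):
    """Keep chains whose normalized form occurs at least min_freq times."""
    counts = Counter(candidate_chains(tagged, max_len))
    return {chain: n for chain, n in counts.items() if n >= min_freq}

# Hand-tagged toy input; a real pipeline would use a POS tagger.
sample = [
    ("Multi-word", "A"), ("terms", "N"), ("occur", "V"), ("often", "R"),
    ("A", "D"), ("multi-word", "A"), ("term", "N"), ("has", "V"),
    ("a", "D"), ("head", "N"), ("noun", "N"),
]
terms = extract_terms(sample)
```

Because chains are compared in normalized form, "Multi-word terms" and "multi-word term" are counted as the same candidate, while chains seen only once, such as "head noun", are filtered out.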
Authors and Affiliations
Oleksii Kungurtsev, Svetlana Zinovatnaya, Iana Potochniak, Maxim Kutasevych
Development of the simulation model of the interaction of automatic controllers in the control system of the energy complex
The developed simulation model of the control system of the power plant operating in different modes is presented. Work on the development of the control system of the power plant, the physical control of which will b...
Movable blade vertical shaft kinetic turbine visual observation
Kinetic energy is the energy produced due to the river water flow speed. This water speed energy can be effectively implemented as a rural power plant. This research has been carried out experimentally and the researc...
Geometric modeling of the unfolding of a rod structure in the form of a double spherical pendulum in weightlessness
We investigated the geometric model of the new technique for unfolding a rod structure, similar to the double spherical pendulum, in weightlessness. Displacements of elements occur due to the pulses from pyrotechnic j...
Study of quality indicators for meat raw materials and the effectiveness of a protective technological method under conditions of different content of heavy metals in a pig diet
The paper reports results of research related to studying a change in the qualitative indicators and technological properties of meat and lard of swine under conditions of different diet compositions, specifically, with t...
Research into emissions of nitrogen oxides when converting the diesel engines to alternative fuels
We conducted theoretical and experimental research into emissions of nitrogen oxides in exhaust gases of the diesel engines, re-equipped for gas. The research is important because the emissions of nitrogen oxide...