CLUSTERING OF SCIENTISTS' PUBLICATIONS, CONSIDERING FINDING SIMILARITIES IN ABSTRACTS AND TEXTS OF PUBLICATIONS BASED ON N-GRAM ANALYSIS AND IDENTIFYING POTENTIAL PROJECT GROUPS

Journal Title: Scientific Journal of Astana IT University - Year 2023, Vol 16, Issue 16

Abstract

The article describes a solution to the problem of clustering scientists' publications that takes into account similarities in the abstracts and texts of these publications, found through n-gram analysis and cross-references, and addresses the task of identifying potential project groups for research and educational projects from the clustering results. In world practice, scientific partners are selected without a comprehensive assessment of their activities. Most well-known indexes for evaluating scientists' research activity still need to take citation information into account more fully. The methods developed in the study for evaluating the scientific activities of scientists and universities, together with methods for selecting scientific partners for educational and scientific projects on a scientific basis, make it possible to organize the work of universities effectively. The article constructs a probabilistic thematic model that clusters scientists' publications by scientific field while considering the citation network, which is an important step toward identifying subject scientific spaces. Constructing the model solves the problem of the growing instability of citation graph clustering as the number of clusters decreases. The main objective of this work is to address the challenge of selecting suitable partners for collaboration in scientific and educational projects. To achieve this, a method for choosing project executors has been developed, which employs fuzzy logical inference to harmonize expert opinions regarding candidate requirements. This approach facilitates the multi-criteria selection of potential partners for scientific and educational projects. In addition to the method, several software modules have been created as part of this research.
These modules automate the collection of information on scientists' publications and citation records from international scientometric databases. The modules also include a visualization module and a user interface that aids in evaluating the scientific activities of university teaching staff. Choosing partners for grants or strategic collaborations, especially in a globalized and highly mobile scientific community, remains a pertinent issue. The approach described in this research clusters the scientific publications of potential project partners, conducts comparative citation analyses of these publications, and establishes their proximity through n-gram analysis of abstracts. These methods provide a scientific basis for informed partner selection, which is crucial for initiating and advancing research projects. Consequently, selecting partners to form research project teams is an immediate and pressing task.
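
The idea of establishing proximity between abstracts through n-gram analysis can be illustrated with a minimal sketch. It is not the authors' method: the abstract describes a probabilistic thematic model combined with the citation network, whereas this sketch swaps in plain Jaccard similarity over word bigrams and a simple threshold grouping. The function names, the tokenization, and the threshold value of 0.2 are all assumptions made for illustration.

```python
from itertools import combinations


def word_ngrams(text, n=2):
    """Return the set of word n-grams in `text` (lowercased, basic punctuation stripped)."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    words = [w for w in words if w]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def abstract_similarity(text_a, text_b, n=2):
    """Similarity of two abstracts measured on their word n-gram sets."""
    return jaccard(word_ngrams(text_a, n), word_ngrams(text_b, n))


def group_by_threshold(abstracts, threshold=0.2, n=2):
    """Group abstracts whose pairwise similarity reaches `threshold`.

    Hypothetical stand-in for a real clustering step: abstracts are linked
    when similar enough, and linked abstracts end up in the same group
    (single-link grouping via union-find).
    """
    parent = list(range(len(abstracts)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    grams = [word_ngrams(t, n) for t in abstracts]
    for i, j in combinations(range(len(abstracts)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(abstracts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    sample = [
        "clustering of citation graphs",
        "clustering of citation networks",
        "face mask detection model",
    ]
    print(abstract_similarity(sample[0], sample[1]))
    print(group_by_threshold(sample))
```

Word-level n-grams are used here because they capture shared terminology between abstracts; character n-grams or weighted schemes would be equally plausible choices, and the abstract does not say which variant the study uses.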

Authors and Affiliations

Andrii Biloshchytskyi, Olexandr Kuchansky, Aidos Mukhatayev, Svitlana Biloshchytska, Yurii Andrashko, Sapar Toxanov, Adil Faizullin

  • EP ID EP729512
  • DOI https://doi.org/10.37943/16AADE3851

How To Cite

Andrii Biloshchytskyi, Olexandr Kuchansky, Aidos Mukhatayev, Svitlana Biloshchytska, Yurii Andrashko, Sapar Toxanov, Adil Faizullin (2023). CLUSTERING OF SCIENTISTS' PUBLICATIONS, CONSIDERING FINDING SIMILARITIES IN ABSTRACTS AND TEXTS OF PUBLICATIONS BASED ON N-GRAM ANALYSIS AND IDENTIFYING POTENTIAL PROJECT GROUPS. Scientific Journal of Astana IT University, 16(16), -. https://europub.co.uk/articles/-A-729512