Classification and analysis for Focused Crawled Textual Dataset for retrieving Indian origin scientists
Journal Title: International Journal of Experimental Research and Review - Year 2023, Vol 34, Issue 5
Abstract
Text classification (also called text categorization or text tagging) is a crucial and widely used approach in Natural Language Processing (NLP) for assigning unseen documents to predefined categories. In this paper, we examine dataset construction and evaluation as components of text classification. First, we built a new text classification dataset of Indian-origin scientists, collected using focused crawling and web scraping. We then present an extensive evaluation of several supervised models on this dataset: K-Nearest Neighbor, Logistic Regression, Support Vector Machine, and Random Forest. Random Forest outperformed the other models, with markedly better performance when combined with SMOTE and k-fold cross-validation. We use the area under the ROC curve (AUC) to measure the effectiveness of the chosen models. Overall, the Random Forest classifier gave the best results, with a micro-average AUC of 90%. These results provide a good starting point for further research on text classification for Indian-origin scientists.
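The abstract does not say how the micro-average AUC was computed. One common definition pools every (document, class) pair into a single binary problem and computes one AUC over the pooled scores. The sketch below illustrates that definition in plain Python, assuming a multi-class setting; the function names, the three-class setup, and the toy scores are hypothetical and are not taken from the paper.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_average_auc(y_true, y_score, n_classes):
    """Micro-average: flatten the one-vs-rest indicator matrix and the
    score matrix, then compute a single AUC over all pairs."""
    flat_labels, flat_scores = [], []
    for true_class, row in zip(y_true, y_score):
        for k in range(n_classes):
            flat_labels.append(1 if k == true_class else 0)
            flat_scores.append(row[k])
    return auc(flat_labels, flat_scores)


# Hypothetical toy data: 4 documents, 3 classes, one score per class.
y_true = [0, 1, 2, 1]
y_score = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.5, 0.4, 0.1],
]
result = micro_average_auc(y_true, y_score, n_classes=3)
print(f"micro-average AUC: {result:.4f}")
```

Because every (document, class) pair carries equal weight, the micro-average is dominated by the frequent classes, which is one reason it is often reported alongside resampling methods such as SMOTE on imbalanced data.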
Authors and Affiliations
Shivani Gautam, Rajesh Bhatia, Shaily Jain