Classification and analysis for Focused Crawled Textual Dataset for retrieving Indian origin scientists
Journal Title: International Journal of Experimental Research and Review - Year 2023, Vol 34, Issue 5
Abstract
Text classification (also called text categorization or text tagging) is a crucial and widely used approach in Natural Language Processing (NLP) for assigning unseen documents to predefined categories. In this paper, we examine dataset construction and evaluation as components of text classification. First, we built a new dataset of Indian-origin scientists for text classification, collected using focused crawling and web scraping techniques. We then present an extensive evaluation of several supervised models on this newly constructed dataset. Our evaluations show that the Random Forest model outperforms the other supervised models. Experiments with K-Nearest Neighbor, Logistic Regression, Support Vector Machine, and Random Forest showed markedly better performance for Random Forest when combined with SMOTE and k-fold cross-validation. We use the Area Under the ROC Curve (AUC) to measure the effectiveness of the chosen models. Overall, the Random Forest classifier gave the best results, with a micro-average AUC of 90%. These results provide a good starting point for further research on text classification for Indian-origin scientists.
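The abstract reports a micro-average AUC but does not spell out how it is computed. The sketch below illustrates the usual definition, under the assumption that each class is scored one-vs-rest: every (document, class) pair is pooled into a single binary problem, and one AUC is computed over the pool. The function names `binary_auc` and `micro_average_auc` and the toy scores are hypothetical illustrations, not taken from the paper.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_average_auc(y_true, y_score, n_classes):
    """Pool all one-vs-rest (document, class) decisions, then compute one AUC.

    y_true  : list of integer class ids, one per document
    y_score : list of per-class score lists, one row per document
    """
    flat_labels, flat_scores = [], []
    for true_class, row in zip(y_true, y_score):
        for k in range(n_classes):
            flat_labels.append(1 if true_class == k else 0)
            flat_scores.append(row[k])
    return binary_auc(flat_labels, flat_scores)


# Toy example (made-up scores): two documents, two classes.
toy_true = [0, 1]
toy_score = [[0.4, 0.6], [0.3, 0.7]]
toy_auc = micro_average_auc(toy_true, toy_score, n_classes=2)
```

Because the pairs are pooled before the AUC is computed, frequent classes contribute more decisions than rare ones, which is one reason micro-averaged scores are often reported alongside resampling methods such as SMOTE.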
Authors and Affiliations
Shivani Gautam, Rajesh Bhatia, Shaily Jain
Study of rhizospheric bacterial population of Azadirachta indica (Neem) of North 24 Parganas district of West Bengal for bioprospective consideration
The rhizospheric microbial population has an immense role in agriculture and crop improvement. This article deals with the preliminary information about the rhizospheric bacterial population of Azadirachta indica growing at...
A Comprehensive Chemical Characterization of Leaves of Five Potential Medicinal Plants in Paschim Medinipur District, W. B., India
The physico-chemical and spectroscopic characterization of five selected medicinal plants viz., Acalypha indica, Senna tora, Euphorbia hirta, Physalis angulata and Ziziphus mauritiana are the essence and has been carried...
Evaluation of Work Posture and Postural Stresses of Welders: A Report
Work-related musculoskeletal disorders (WRMSD) are a very common health problem in manufacturing sectors all over India. Welding is one of the most important activities in the manufacturing sector in our country. Higher ris...
Effect of Orthosiphon stamineus Extract on HIF-1α, Endothelin-1, and VEGFR-2 Gene Expression in NRK-52E Renal Tubular Cells Subjected to Glucotoxicity
This study aimed to investigate the impact of Orthosiphon stamineus extract on gene expression in NRK-52E cells under conditions of glucotoxicity. Gene expression analysis using RT-PCR was conducted following exposure of...
Stigma receptivity in Cashew nut (Anacardium occidentale L.)
The cashew is widely and commercially cultivated throughout the nation for its nut. Cashew is a polygamo-monoecious plant with both male and bisexual flowers developing in the same inflorescence. Experimental study was con...