Smart Document Analysis Using AI-ML

Abstract

In this era of digitalization, documents are increasingly prepared, presented, and shared as soft copies. Classifying these documents has gained considerable attention in recent times, with applications in fields such as spam filtering, email routing, language identification, genre classification, sentiment analysis, and readability assessment. Classifying documents that are available online using smart techniques helps many businesses. Machine learning is an easy and efficient way to do this, and it considerably reduces human effort. For classification to work well, documents must be presented to the machine learning classifier in a format it can process. In this report, we discuss the types of features on which a document can be classified and subsequently represented. Document classification is the task of assigning the documents in a collection to categories based on the information and features they contain. It is a major learning problem at the core of many information management and retrieval tasks. Document classification plays an important role in applications that help with organizing, categorizing, searching, and concisely representing large amounts of data. We discuss the uses of document classification and the main steps involved in classifying a document or text, using a small use case to show how the collected documents are processed and analyzed. We consider two different categories of data sets for classification and analysis.
The problem is to distinguish between two kinds of documents: rhyme documents, where each rhyme is stored as a single file, and non-rhyme documents, which contain ordinary Wikipedia text, with a few Wikipedia sentences stored as a single file. The objective of the project is to develop a scalable and efficient document classifier that classifies documents accurately based on the features they contain, and to learn the basic techniques used in document classification, such as data collection, data cleaning, pre-processing, constructing an ML model, and applying an ML algorithm. A further objective is to work with machine learning concepts and gain insight into different classification algorithms through this case study.
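
The steps listed above (cleaning, pre-processing, building a model, and applying an algorithm) can be illustrated with a minimal sketch. The abstract does not say which features or which classifier the authors used, so this example assumes bag-of-words counts and a multinomial Naive Bayes classifier with add-one smoothing. The function names and the four toy training texts are hypothetical stand-ins, not the authors' data set.

```python
import math
import re
from collections import Counter, defaultdict


def clean_and_tokenize(text):
    """Cleaning and pre-processing: lowercase, keep only runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = clean_and_tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    vocab_size = len(vocabulary)
    tokens = [t for t in clean_and_tokenize(text) if t in vocabulary]
    scores = {}
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data standing in for the two document categories.
training_docs = [
    ("Twinkle twinkle little star, how I wonder what you are", "rhyme"),
    ("Jack and Jill went up the hill to fetch a pail of water", "rhyme"),
    ("The city is the capital and largest city of the country", "non-rhyme"),
    ("The river flows through the northern region of the province", "non-rhyme"),
]

model = train_naive_bayes(training_docs)
print(classify("twinkle little star", model))
print(classify("the capital of the province", model))
```

In a real setting each file in the two collections would supply one training text, and the model would be evaluated on documents held out from training.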

Authors and Affiliations

Sindhu Rashmi. H. R, Prof. Anisha. B. S, Dr. Ramakanth Kumar. P

  • EP ID EP748115
  • DOI 10.21276/ijircst.2019.7.3.6

How To Cite

Sindhu Rashmi. H. R, Prof. Anisha. B. S, Dr. Ramakanth Kumar. P (2019). Smart Document Analysis Using AI-ML. International Journal of Innovative Research in Computer Science and Technology, 7(3), -. https://europub.co.uk/articles/-A-748115