IRS for Computer Character Sequences Filtration: a new software tool and algorithm to support the IRS at tokenization process
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2013, Vol 4, Issue 2
Abstract
Given a character sequence, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation. A token is an instance of a sequence of characters in some particular document that are grouped together as a useful semantic unit for processing. A new software tool and algorithm to support the information retrieval system (IRS) in the tokenization process are presented. Our proposed tool filters out the following computer character sequences: IP addresses, Web URLs, dates, and email addresses. The tool uses pattern-matching algorithms and filtration methods. After this filtering step, the IRS can start a new tokenization process on the retrieved text, which will then be free of these sequences.
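The abstract does not give the tool's actual patterns or matching algorithm. The sketch below therefore makes several assumptions: it uses Python's `re` module, deliberately simplified regular expressions (IPv4 only, numeric date formats only, URLs starting with `http(s)://` or `www.`), and hypothetical names (`PATTERNS`, `filter_sequences`, `tokenize`) that do not come from the paper. It shows the filter-then-tokenize idea. Without the filter, a naive word tokenizer would split `192.168.1.10` into four numeric tokens and a URL into fragments such as `http`, `www`, and `com`.

```python
import re

# Hypothetical, simplified patterns; the paper's own patterns are not given.
# Dicts keep insertion order, so filters run top to bottom: emails before URLs
# (so "www." inside an email host is not cut out alone), and URLs before IPs
# (so an IP inside a URL is removed together with the URL).
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "url": re.compile(r"\b(?:https?://|www\.)[^\s<>\"]+", re.IGNORECASE),
    "ip": re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b"),
    "date": re.compile(r"\b(?:\d{4}-\d{2}-\d{2}|\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4})\b"),
}


def filter_sequences(text: str) -> tuple[str, dict[str, list[str]]]:
    """Strip each sequence type from text; return cleaned text and removed matches."""
    removed: dict[str, list[str]] = {}
    for name, pattern in PATTERNS.items():
        removed[name] = pattern.findall(text)
        # Replace with a space so the words on either side do not fuse together.
        text = pattern.sub(" ", text)
    return text, removed


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: keep runs of word characters, drop punctuation."""
    return re.findall(r"\w+", text)


sample = ("Contact admin@example.com or visit http://www.example.com/docs "
          "from 192.168.1.10 before 2013-02-15.")
clean, removed = filter_sequences(sample)
print(tokenize(clean))  # ['Contact', 'or', 'visit', 'from', 'before']
```

The removed matches are kept rather than discarded, so a system built along these lines could still index them as single whole tokens if it wanted to. Real-world formats (IPv6, dotted dates that resemble version numbers, URLs with trailing punctuation) would need more careful patterns than these.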
Authors and Affiliations
Ahmad Badawi, Qasem Al-Haija