Automating the Shaping of Metadata Extracted from a Company Website with Open Source Tools
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2014, Vol 4, Issue 1
Abstract
As part of a market analysis process, the objective was to automate the identification of the activities and skills of a collection of enterprises, namely Belgian and French open source companies. To avoid manual annotation based on visual inspection of each website's content, a tool chain was developed to collect website content and extract the important terms. Standard software libraries were identified for cleaning up HTML documents and for performing the part-of-speech tagging used in terminology extraction. This procedure is supplemented by named entity extraction and recognition. The terms extracted from the HTML pages of a company website were then merged and filtered, and a circular tag cloud was generated. This presentation facilitates the identification of important terms, which commonly correspond to the activities and technologies supported by the company. Several changes are planned for this prototype, including, in particular, extension to French-language texts, mapping of extracted terms to the vocabulary of a classification scheme, and automatic generation of dashboards to facilitate monitoring of the evolution of the industrial sector.
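The tool chain summarized in the abstract (clean the HTML, extract candidate terms, merge and filter them across the pages of one site) can be illustrated with a minimal sketch. This is not the author's implementation: it uses only the Python standard library, it replaces part-of-speech tagging with a plain stop-word filter and frequency count, and the names `TextExtractor`, `html_to_text`, `extract_terms`, `merge_and_filter`, `STOPWORDS`, and `min_count` are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: a tiny illustrative stop-word list, not the one used in the paper.
STOPWORDS = {"and", "the", "for", "with", "our", "are", "this", "that"}


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Step 1: reduce an HTML page to its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def extract_terms(text):
    """Step 2 (simplified): keep lowercase words that are not stop words.

    The paper relies on part-of-speech tagging here; this stand-in does not.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def merge_and_filter(pages, min_count=2):
    """Step 3: merge term counts over all pages, drop rare terms."""
    counts = Counter()
    for html in pages:
        counts.update(extract_terms(html_to_text(html)))
    return {term: n for term, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    site = [
        "<h1>Open source consulting</h1><p>Linux and Python training.</p>",
        "<p>Python and Linux support for open source projects.</p>",
    ]
    # The resulting term weights could then drive font sizes in a tag cloud.
    print(merge_and_filter(site))
```

The counts returned by `merge_and_filter` play the role of the term weights that a tag cloud renderer would map to font sizes. A closer reproduction would swap `extract_terms` for a part-of-speech tagger and add named entity recognition, as the abstract describes.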
Authors and Affiliations
Dr Ir VISEUR