A Semantic Approach to Person Profile Extraction from Farsi Web Documents

Journal Title: Journal of Information Systems and Telecommunication - Year 2016, Vol 4, Issue 4

Abstract

Entity profiling (EP), an important task in Web mining and information extraction (IE), is the process of extracting target entities and their related information from given text resources. From a computational viewpoint, Farsi is one of the less-studied and less-resourced languages and suffers from a lack of high-quality language processing tools, which underscores the need to develop Farsi text processing systems. As a contribution to EP research, we present a semantic approach for extracting the profiles of person entities from Farsi Web documents. Our approach comprises three major components: (i) pre-processing, (ii) semantic analysis, and (iii) attribute extraction. First, the system takes raw text as input and annotates it using existing pre-processing tools. In the semantic analysis stage, we analyze the pre-processed text syntactically and semantically and enrich the locally processed information with semantic information obtained from a distant knowledge base. We then use a semantic rule-based approach to extract the information related to the persons in question. We demonstrate the effectiveness of our approach by testing it on a small Farsi corpus. The experimental results are encouraging and show that the proposed method outperforms baseline methods.
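The three-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not name the pre-processing tools, the knowledge base, or the rules, so everything below is a hypothetical stand-in. The `KNOWLEDGE_BASE` dictionary, the two regex rules, the `Profile` class, and the English toy sentences (used in place of Farsi text for readability) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for the distant knowledge base (assumption: the
# abstract does not say which resource the authors used).
KNOWLEDGE_BASE = {
    "Tehran": "City",
    "University of Tehran": "Organization",
}

# Hypothetical semantic rules: (surface pattern, required type, attribute name).
RULES = [
    (re.compile(r"was born in (?P<value>.+)$"), "City", "birthplace"),
    (re.compile(r"works at (?:the )?(?P<value>.+)$"), "Organization", "employer"),
]


@dataclass
class Profile:
    name: str
    attributes: dict = field(default_factory=dict)


def preprocess(text):
    """Stage (i): split raw text into sentences (toy stand-in for real tools)."""
    return [s.strip() for s in re.split(r"[.!?]\s*", text) if s.strip()]


def semantic_analysis(sentences):
    """Stage (ii): attach knowledge-base types to mentions found in each sentence."""
    annotated = []
    for sentence in sentences:
        types = {m: t for m, t in KNOWLEDGE_BASE.items() if m in sentence}
        annotated.append((sentence, types))
    return annotated


def extract_attributes(person, annotated):
    """Stage (iii): apply rules, keeping a value only if its type matches the rule."""
    profile = Profile(person)
    for sentence, types in annotated:
        if person not in sentence:
            continue
        for pattern, required_type, attribute in RULES:
            match = pattern.search(sentence)
            if match and types.get(match.group("value")) == required_type:
                profile.attributes[attribute] = match.group("value")
    return profile


def build_profile(person, text):
    """Run the three stages in order for one target person."""
    return extract_attributes(person, semantic_analysis(preprocess(text)))


if __name__ == "__main__":
    sample = (
        "Sara Ahmadi was born in Tehran. "
        "Sara Ahmadi works at the University of Tehran. "
        "Ali was born in Paris."
    )
    print(build_profile("Sara Ahmadi", sample))
    print(build_profile("Ali", sample))
```

In this sketch a rule fires only when the matched value carries the type the rule requires, so a value the toy knowledge base does not know (here "Paris") yields no attribute. That is one simple way to combine surface patterns with knowledge-base information; the paper's actual rules may work differently.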

Authors and Affiliations

Hojjat Emami, Hossein Shirazi, Ahmad Abdollahzadeh Barforoush

  • EP ID EP183955
  • DOI 10.7508/jist.2016.04.004

How To Cite

Hojjat Emami, Hossein Shirazi, Ahmad Abdollahzadeh Barforoush (2016). A Semantic Approach to Person Profile Extraction from Farsi Web Documents. Journal of Information Systems and Telecommunication, 4(4), 232-243. https://europub.co.uk/articles/-A-183955