A Semantic Approach to Person Profile Extraction from Farsi Web Documents

Journal Title: Journal of Information Systems and Telecommunication - Year 2016, Vol 4, Issue 4

Abstract

Entity profiling (EP), an important task in Web mining and information extraction (IE), is the process of extracting target entities and their related information from given text resources. From a computational viewpoint, Farsi is one of the less-studied and less-resourced languages and suffers from a lack of high-quality language processing tools, which underscores the need to develop Farsi text processing systems. As a contribution to EP research, we present a semantic approach for extracting the profiles of person entities from Farsi Web documents. Our approach has three major components: (i) pre-processing, (ii) semantic analysis, and (iii) attribute extraction. First, the system takes raw text as input and annotates it using existing pre-processing tools. In the semantic analysis stage, we analyze the pre-processed text syntactically and semantically and enrich the locally processed information with semantic information obtained from a distant knowledge base. We then use a semantic rule-based approach to extract the information related to the persons in question. We show the effectiveness of our approach by testing it on a small Farsi corpus. The experimental results are encouraging and show that the proposed method outperforms baseline methods.
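The three components named in the abstract can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the authors' system: the knowledge base `TOY_KB`, the regular-expression rules in `RULES`, the function names, and the English example sentences (including the made-up person "Sara Ahmadi") are all hypothetical stand-ins chosen for illustration. A real system would rely on Farsi pre-processing tools and a real knowledge base.

```python
import re

# Hypothetical stand-in for a distant knowledge base:
# maps entity surface strings to coarse semantic types.
TOY_KB = {
    "Tehran": "City",
    "University of Tehran": "Organization",
}

# Hypothetical extraction rules: attribute name -> pattern with a 'value' group.
RULES = [
    ("birth_place", re.compile(r"was born in (?P<value>[A-Z][\w ]*?)(?:[.,]|$)")),
    ("affiliation", re.compile(r"works at (?:the )?(?P<value>[A-Z][\w ]*?)(?:[.,]|$)")),
]

# Semantic constraint: the KB type a value must have for each attribute.
EXPECTED_TYPE = {"birth_place": "City", "affiliation": "Organization"}


def preprocess(text):
    """Stage (i): split raw text into sentences (a very rough approximation)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def semantic_analysis(sentences, person):
    """Stage (ii): keep sentences mentioning the person and attach KB types."""
    annotated = []
    for sentence in sentences:
        if person in sentence:
            types = {name: t for name, t in TOY_KB.items() if name in sentence}
            annotated.append((sentence, types))
    return annotated


def extract_attributes(annotated):
    """Stage (iii): apply rules, accepting a value only if its KB type fits."""
    profile = {}
    for sentence, types in annotated:
        for attribute, pattern in RULES:
            match = pattern.search(sentence)
            if match and types.get(match.group("value")) == EXPECTED_TYPE[attribute]:
                profile[attribute] = match.group("value")
    return profile


def build_profile(text, person):
    """Run the three stages in order and return the person's profile."""
    return extract_attributes(semantic_analysis(preprocess(text), person))


if __name__ == "__main__":
    sample = (
        "Sara Ahmadi was born in Tehran. "
        "Sara Ahmadi works at the University of Tehran. "
        "The weather is nice."
    )
    print(build_profile(sample, "Sara Ahmadi"))
```

In this sketch the knowledge-base lookup acts as a semantic filter on the rule matches: a pattern match is kept only when the matched value has the type the attribute expects, which is one simple way to combine local text evidence with external semantic information.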

Authors and Affiliations

Hojjat Emami, Hossein Shirazi, Ahmad Abdollahzadeh Barforoush

  • EP ID EP183955
  • DOI 10.7508/jist.2016.04.004

How To Cite

Hojjat Emami, Hossein Shirazi, Ahmad Abdollahzadeh Barforoush (2016). A Semantic Approach to Person Profile Extraction from Farsi Web Documents. Journal of Information Systems and Telecommunication, 4(4), 232-243. https://europub.co.uk/articles/-A-183955