Aggregating textual and video data from movies

Journal Title: Romanian Journal of Human - Computer Interaction - Year 2016, Vol 9, Issue 3

Abstract

In this paper, we present an automatically annotated corpus built from movie screenplays (scripts) and subtitles. We extract the relevant textual information from both sources using regular expressions. We then synchronize screenplays with subtitles using a matching algorithm, so that each sentence in a script is bounded by two timestamps. We also developed an application that uses the corpus, both to test our approach and to show practical situations where the corpus is useful. The application performs topic detection: it searches for a specified topic in the movie text and marks that topic as non-existent, episodic, or primary for the analyzed text. The major problem we faced while working on this system was the unpredictable structure of screenplay files, which are not written in a fully standardized format that can be easily parsed and structured automatically. Some types of errors can be handled with regular expressions, while others require a machine learning approach.
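The abstract does not specify the matching algorithm, so the following is only a minimal sketch of the general idea, under several assumptions: subtitles are in SRT format, script dialogue has already been extracted into a list of lines, and string similarity from Python's `difflib` stands in for whatever matching the authors actually use. The names `parse_srt`, `normalize`, and `align`, as well as the `window` and `threshold` parameters, are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Assumed SRT cue layout: index line, "HH:MM:SS,mmm --> HH:MM:SS,mmm", text lines.
CUE_RE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})\s*\n"
    r"(.*?)(?:\n\s*\n|\Z)",
    re.DOTALL,
)


def to_seconds(h, m, s, ms):
    """Convert timestamp fields (strings) to seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, cue_text) tuples."""
    cues = []
    for match in CUE_RE.finditer(text.replace("\r\n", "\n")):
        g = match.groups()
        body = " ".join(g[9].split())  # collapse multi-line cues
        cues.append((to_seconds(*g[1:5]), to_seconds(*g[5:9]), body))
    return cues


def normalize(s):
    """Lowercase and drop punctuation so small formatting differences do not matter."""
    return " ".join(re.findall(r"[a-z0-9']+", s.lower()))


def align(script_lines, cues, window=20, threshold=0.6):
    """Greedily match each script line to the most similar upcoming cue.

    Matching is kept monotonic: once a cue is used, later script lines
    are only compared against cues that follow it. Lines with no cue above
    the similarity threshold get (None, None) as their temporal limits.
    """
    aligned = []
    cursor = 0
    for line in script_lines:
        target = normalize(line)
        best_score, best_idx = 0.0, None
        for i in range(cursor, min(cursor + window, len(cues))):
            score = SequenceMatcher(None, target, normalize(cues[i][2])).ratio()
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is not None and best_score >= threshold:
            start, end, _ = cues[best_idx]
            aligned.append((line, start, end))
            cursor = best_idx + 1
        else:
            aligned.append((line, None, None))
    return aligned
```

A greedy, windowed search is the simplest choice here. It would likely struggle with the irregular screenplay formatting the abstract mentions, which is consistent with the authors' remark that some errors need more than pattern matching.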

Authors and Affiliations

Alexandru Hulea, Traian Rebedea

How To Cite

Alexandru Hulea, Traian Rebedea (2016). Aggregating textual and video data from movies. Romanian Journal of Human - Computer Interaction, 9(3), -. https://europub.co.uk/articles/-A-28990