Aggregating textual and video data from movies

Journal Title: Romanian Journal of Human - Computer Interaction - Year 2016, Vol 9, Issue 3

Abstract

In this paper, we present an automatically annotated corpus built from movie screenplays (scripts) and subtitles. We extract the relevant textual information from both sources using regular expressions. We then synchronize screenplays with subtitles using a matching algorithm, thus bounding each script sentence between two temporal limits. We also developed an application that uses the corpus to test our approach and to illustrate practical situations in which the corpus is useful. The application performs topic detection: it searches for a specified topic in the movie text and marks it as a non-existent, episodic, or primary topic for the analyzed text. The major problem we faced while working on this system was the irregular structure of screenplay files, which are not consistently written in a standardized format that can be easily parsed and structured automatically. Some types of errors can be overcome with regular expressions, but others require a machine learning approach.
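As a rough illustration of the extraction and synchronization steps described in the abstract, the following Python sketch parses subtitle cues with a regular expression and attaches a time span to each script sentence. It is not the authors' implementation: the SRT subtitle format, the `difflib`-based similarity measure, the function names `parse_srt`, `normalize`, and `align`, and the `threshold` value of 0.6 are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

# Assumed input format: SRT cues, i.e. an index line, a line
# "HH:MM:SS,mmm --> HH:MM:SS,mmm", then one or more lines of text.
CUE_RE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)


def parse_srt(text):
    """Return a list of (start, end, text) tuples, one per subtitle cue."""
    return [
        (m.group(2), m.group(3), " ".join(m.group(4).split()))
        for m in CUE_RE.finditer(text)
    ]


def normalize(sentence):
    """Lowercase and drop punctuation so small spelling differences matter less."""
    return re.sub(r"[^a-z0-9 ]", "", sentence.lower()).strip()


def align(script_sentences, cues, threshold=0.6):
    """Pair each script sentence with the (start, end) of its most similar cue.

    Sentences whose best similarity is below `threshold` (an arbitrary,
    illustrative value) are paired with None.
    """
    aligned = []
    for sentence in script_sentences:
        best_span, best_score = None, 0.0
        for start, end, cue_text in cues:
            score = SequenceMatcher(
                None, normalize(sentence), normalize(cue_text)
            ).ratio()
            if score > best_score:
                best_span, best_score = (start, end), score
        aligned.append((sentence, best_span if best_score >= threshold else None))
    return aligned


if __name__ == "__main__":
    srt = (
        "1\n00:00:01,000 --> 00:00:03,500\nWhere are you going?\n\n"
        "2\n00:00:04,000 --> 00:00:06,000\nI'm going home.\n"
    )
    script = ["Where are you going?", "I am going home.", "The car explodes."]
    for sentence, span in align(script, parse_srt(srt)):
        print(span, sentence)
```

Script lines with no spoken counterpart, such as scene descriptions, stay unmatched in this sketch. A fuller system could bound them between the time spans of the neighbouring matched sentences, in the spirit of the temporal limits mentioned in the abstract.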

Authors and Affiliations

Alexandru Hulea, Traian Rebedea

How To Cite

Alexandru Hulea, Traian Rebedea (2016). Aggregating textual and video data from movies. Romanian Journal of Human - Computer Interaction, 9(3), -. https://europub.co.uk/articles/-A-28990