Aggregating textual and video data from movies

Journal Title: Revista Romana de Interactiune Om-Calculator - Year 2016, Vol 9, Issue 3

Abstract

In this paper, we present an automatically annotated corpus based on movie screenplays (scripts) and subtitles. We extract the relevant textual information from screenplays and subtitles using regular expressions. We then synchronize screenplays with subtitles using a matching algorithm, thus bounding each sentence of a script between two temporal limits. We also developed an application that uses the corpus, both to test our approach and to show practical situations where the corpus is useful. The application performs topic detection: it searches for a specified topic in the movie text and marks it as a non-existent, episodic, or primary topic for the analyzed text. The major problem we faced while working on this system was the unpredictable structure of screenplay files, which are not written in a fully standardized format that can be easily parsed and structured automatically. Some types of errors can be overcome with regular expressions, but others require a machine learning approach to resolve.
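The extraction and synchronization steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the matching algorithm, so the sketch assumes subtitles in SRT format, uses `difflib.SequenceMatcher` as a stand-in similarity measure, and introduces hypothetical names (`parse_srt`, `align`, and the `threshold` parameter with an arbitrary default of 0.6).

```python
import re
from difflib import SequenceMatcher

# One SRT cue: an index line, a "start --> end" line, then one or more text lines.
CUE_RE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)


def to_seconds(stamp):
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    hms, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000.0


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, cue_text) tuples."""
    cues = []
    for _, start, end, body in CUE_RE.findall(text):
        # Collapse line breaks inside a cue into single spaces.
        cues.append((to_seconds(start), to_seconds(end), " ".join(body.split())))
    return cues


def normalize(sentence):
    """Lowercase and drop punctuation so small formatting differences are ignored."""
    return re.sub(r"[^a-z0-9 ]", "", sentence.lower()).strip()


def align(script_sentences, cues, threshold=0.6):
    """Attach temporal limits to each script sentence (hypothetical matcher).

    Returns (sentence, start, end) tuples; start and end are None when no
    cue is similar enough. The threshold value is an assumption.
    """
    aligned = []
    cursor = 0  # script and subtitles share the same order, so only search forward
    for sentence in script_sentences:
        best_score, best_index = 0.0, None
        for i in range(cursor, len(cues)):
            score = SequenceMatcher(
                None, normalize(sentence), normalize(cues[i][2])
            ).ratio()
            if score > best_score:
                best_score, best_index = score, i
        if best_index is not None and best_score >= threshold:
            start, end, _ = cues[best_index]
            aligned.append((sentence, start, end))
            cursor = best_index + 1
        else:
            aligned.append((sentence, None, None))
    return aligned
```

Calling `align(sentences, parse_srt(srt_text))` yields one tuple per script sentence. The forward-only search keeps the sketch short, but a single wrong match far ahead would skip the cues in between, which is one reason a production system would likely need a more robust alignment strategy.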

Authors and Affiliations

Alexandru Hulea, Traian Rebedea

How To Cite

Alexandru Hulea, Traian Rebedea (2016). Aggregating textual and video data from movies. Revista Romana de Interactiune Om-Calculator, 9(3), 233-254. https://europub.co.uk/articles/-A-241846