Aggregating textual and video data from movies

Journal Title: Romanian Journal of Human - Computer Interaction - Year 2016, Vol 9, Issue 3

Abstract

In this paper, we present an automatically annotated corpus built from movie screenplays (scripts) and subtitles. We extract the relevant textual information from both sources using regular expressions. We then synchronize screenplays with subtitles using a matching algorithm, which bounds each sentence of a script between two temporal limits. We also developed an application that uses the corpus, both to test our approach and to illustrate practical situations in which the corpus is useful. The application performs topic detection: it searches the movie text for a specified topic and marks that topic as non-existent, episodic, or primary for the analyzed text. The main problem we faced while building this system was the unpredictable structure of screenplay files, which are not written in a fully standardized format that can be easily parsed and structured automatically. Some types of errors can be overcome with regular expressions, but others require a machine learning approach.
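The abstract does not specify the matching algorithm, so the following is only a minimal sketch of how script sentences might be bounded by subtitle timestamps. It assumes subtitles in the SubRip (SRT) layout and uses a generic string-similarity score from Python's standard `difflib` module in place of the authors' method. The names `parse_srt`, `normalize`, and `align`, and the 0.6 similarity threshold, are hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher

# One SRT cue: an index line, a "start --> end" line, then the cue text,
# terminated by a blank line or the end of the input.
SRT_CUE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?:\n\s*\n|\Z)",
    re.DOTALL,
)


def parse_srt(text):
    """Return a list of (start, end, text) tuples from SRT-formatted text."""
    cues = []
    for _, start, end, body in SRT_CUE.findall(text):
        # Collapse multi-line cue text into a single line.
        cues.append((start, end, " ".join(body.split())))
    return cues


def normalize(sentence):
    """Lowercase and drop punctuation so small spelling differences matter less."""
    return re.sub(r"[^a-z0-9 ]", "", sentence.lower()).strip()


def align(script_lines, cues, threshold=0.6):
    """Greedy, order-preserving alignment of script lines to subtitle cues.

    Each script line is compared with the cues that follow the last match.
    If the best similarity reaches the (assumed) threshold, the line takes
    that cue's start and end times; otherwise it is left without timestamps.
    """
    aligned = []
    pos = 0  # index of the first cue still available for matching
    for line in script_lines:
        best_score, best_idx = 0.0, None
        for i in range(pos, len(cues)):
            score = SequenceMatcher(
                None, normalize(line), normalize(cues[i][2])
            ).ratio()
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is not None and best_score >= threshold:
            start, end, _ = cues[best_idx]
            aligned.append((line, start, end))
            pos = best_idx + 1
        else:
            aligned.append((line, None, None))
    return aligned
```

A greedy forward search keeps the sketch short but cannot recover from an early wrong match; a dynamic-programming alignment would be more robust on screenplays that diverge from the final cut.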

Authors and Affiliations

Alexandru Hulea, Traian Rebedea

How To Cite

Alexandru Hulea, Traian Rebedea (2016). Aggregating textual and video data from movies. Romanian Journal of Human - Computer Interaction, 9(3), -. https://europub.co.uk/articles/-A-28990