Aggregating textual and video data from movies
Journal Title: Revista Romana de Interactiune Om-Calculator - Year 2016, Vol 9, Issue 3
Abstract
In this paper, we present an automatically annotated corpus based on movie screenplays (scripts) and subtitles. We extract the relevant textual information from movie screenplays and subtitles using a regular-expression approach. We then synchronize screenplays with subtitles using a matching algorithm, thus bounding each sentence of a script between two temporal limits. We also developed an application that uses the corpus to test our approach and to show practical situations in which the corpus is useful. The application performs topic detection: it searches for a specified topic in the movie text and marks it as a non-existent, episodic, or primary topic for the analyzed text. The major problem we faced while working on this system was the unexpected structure of the screenplay files, as files of this kind are not written in a fully standardized format that can be easily parsed and structured automatically. Some types of errors can be overcome with regular expressions, but others require a machine learning approach.
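The synchronization step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the matching algorithm, so the code below makes several assumptions: subtitles are in SubRip (SRT) format, the similarity measure is Python's `difflib.SequenceMatcher`, and the names `parse_srt`, `normalize`, `align` and the `threshold` value are hypothetical choices made only for illustration. It is not the authors' implementation.

```python
import re
from difflib import SequenceMatcher

# Assumed input format: SRT cues (index line, "start --> end" line,
# then text lines up to a blank line or the end of the input).
CUE_RE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?:\n\s*\n|\Z)",
    re.DOTALL,
)


def parse_srt(text):
    """Return a list of (start, end, text) tuples extracted with a regex."""
    cues = []
    for _, start, end, body in CUE_RE.findall(text):
        # Collapse line breaks inside a cue into single spaces.
        cues.append((start, end, " ".join(body.split())))
    return cues


def normalize(s):
    """Lowercase and drop punctuation so small spelling differences matter less."""
    return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()


def align(script_sentences, cues, threshold=0.6):
    """Bound each script sentence by the times of its most similar cue.

    Sentences whose best similarity is below the (hypothetical) threshold,
    such as scene headings, are mapped to None.
    """
    aligned = []
    for sentence in script_sentences:
        best, best_score = None, 0.0
        for start, end, text in cues:
            score = SequenceMatcher(
                None, normalize(sentence), normalize(text)
            ).ratio()
            if score > best_score:
                best, best_score = (start, end), score
        aligned.append((sentence, best if best_score >= threshold else None))
    return aligned
```

Calling `align(sentences, parse_srt(srt_text))` returns one `(sentence, (start, end))` pair per script sentence, with `None` in place of the time bounds when no subtitle is similar enough. The exhaustive comparison is quadratic in the worst case; a real system would likely exploit the fact that script and subtitles follow roughly the same order.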
Authors and Affiliations
Alexandru Hulea, Traian Rebedea
Recovering implicit thread structure in chat conversations
The analysis of chat conversations is a cumbersome task because of the number of different discussion threads that may occur at a certain moment. While most participants in a chat session tend to discuss one topic at a t...
Video Game Development Concepts
The paper is a case study that provides an overview of the main concepts regarding the development of video games, and of the methods used by independent game developers to put these concepts into practice. The n...
Musical Information Retrieval System. Theory and Applications
This paper analyzes some specific Music Information Retrieval problems. It also presents introductory notions and some practical applications in this domain. Many of the notions and some techniques for solvin...
Using LabView interfaces to generate waveforms and the files specific to testing electrical switching devices
The paper presents several aspects of the laboratory activity for electrical tests performed on switching devices and electrotechnical equipment. It highlights the main elements of...
Personas Method in the Context of Semantic Web
Personas is one of the most widely used methods for gathering and presenting user preferences in the context of Human-Computer Interaction. However, few studies have focused on linking information obtained via the Pe...