Key Frame Extraction for Text Based Video Retrieval Using Maximally Stable Extremal Regions

Journal Title: EAI Endorsed Transactions on e-Learning - Year 2015, Vol 2, Issue 7

Abstract

This paper presents a new approach to text-based video content retrieval. The proposed scheme consists of three main processes: key frame extraction, text localization, and keyword matching. For key frame extraction, we propose a Maximally Stable Extremal Region (MSER) based feature designed to segment the video into shots with different text content. In the text localization process, the MSERs in each key frame are clustered into text lines based on their similarity in position, size, color, and stroke width. The Tesseract OCR engine is then used to recognize the text regions. To improve the recognition results, we feed four images obtained from different pre-processing methods to the Tesseract engine. Finally, the query keyword is matched against the OCR results using an approximate string search scheme. The experiments show that, with the MSER feature, videos can be segmented into an efficient number of shots, with better precision and recall than sum-of-absolute-differences and edge-based methods.
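The abstract does not say which approximate string search scheme is used, so the following is only a minimal sketch of the keyword-matching step under one common assumption: a keyword matches a key frame if some substring of the frame's OCR text lies within a small edit distance of it (Sellers' variant of the Levenshtein recurrence). The function names, the `max_errors` threshold, and the dictionary of OCR results keyed by frame number are all hypothetical and are not taken from the paper.

```python
def min_edit_distance_in_text(keyword: str, text: str) -> int:
    """Smallest edit distance between `keyword` and any substring of `text`.

    Assumed matching rule (not from the paper): case-insensitive
    Levenshtein distance where the match may start anywhere in the text.
    """
    keyword = keyword.lower()
    text = text.lower()
    # prev[i] = cost of aligning the first i keyword characters with a
    # substring that ends just before the current text character.
    prev = list(range(len(keyword) + 1))
    best = prev[-1]  # cost against the empty substring
    for ch in text:
        cur = [0]  # a match may begin at any text position for free
        for i, kc in enumerate(keyword, start=1):
            cur.append(min(
                prev[i] + 1,               # extra character in the text
                cur[i - 1] + 1,            # keyword character missing from the text
                prev[i - 1] + (kc != ch),  # exact match or substitution
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


def search_frames(keyword, ocr_results, max_errors=1):
    """Return (frame_id, distance) pairs for frames whose OCR text
    contains `keyword` with at most `max_errors` edits, best first."""
    hits = []
    for frame_id, text in ocr_results.items():
        distance = min_edit_distance_in_text(keyword, text)
        if distance <= max_errors:
            hits.append((frame_id, distance))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical OCR output per key frame, with typical OCR confusions
    # such as "1" for "l".
    ocr_results = {
        10: "Lecture 3: Key Frarne Extraction",
        42: "Maximally Stable Extrema1 Regions",
        77: "Questions?",
    }
    print(search_frames("extremal", ocr_results, max_errors=1))
```

Allowing a small number of edits lets a query tolerate single-character OCR confusions; a larger `max_errors` would raise recall at the cost of precision, and short keywords would likely need a stricter threshold.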

Authors and Affiliations

Werachard Wattanarachothai, Karn Patanukhom

  • EP ID EP45951
  • DOI http://dx.doi.org/10.4108/icst.iniscom.2015.258410

How To Cite

Werachard Wattanarachothai, Karn Patanukhom (2015). Key Frame Extraction for Text Based Video Retrieval Using Maximally Stable Extremal Regions. EAI Endorsed Transactions on e-Learning, 2(7), -. https://europub.co.uk/articles/-A-45951