Spoken Turkish Corpus in Its Present Form: A Technical and Statistical Analysis

Journal: Mersin Üniversitesi Dil ve Edebiyat Dergisi, 2017, Vol. 14, Issue 2

Abstract

The primary goal of this article is to explain the technologies and workflows used to build the METU Spoken Turkish Corpus (STC), a project pioneered by the late Prof. Dr. Şükriye Ruhi. The Web-Based Corpus Management System, which is central to building the STC, provides a set of workflows, data formats and export options that make it easy to transcribe, check and publish corpus data. The STC project members developed the Corpus Management System in the Python programming language. Through its online interface, remote project members with different roles can collaborate on the same data. In the STC, 286,391 words of speech have been transcribed and checked; in addition, recordings totaling 79,189 words have been made ready for publication. The article presents general statistics about the recordings in the STC and discusses what remains to be done before a large-scale version of the STC can be published.
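The abstract describes recordings that move from transcription to checking to publication, handled by remote members with different roles. The Python sketch below shows one way such a role-gated status workflow could be modeled. It is hypothetical: the `Status` stages, the role names ("checker", "editor"), the transition rules and the `words_at_least` helper are assumptions for illustration, not the STC system's actual code or data model.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Hypothetical lifecycle stages of one recording, in workflow order."""
    TRANSCRIBED = 1
    CHECKED = 2
    READY_TO_PUBLISH = 3


# Hypothetical mapping: role -> set of (from, to) status transitions it may perform.
ALLOWED = {
    "checker": {(Status.TRANSCRIBED, Status.CHECKED)},
    "editor": {(Status.CHECKED, Status.READY_TO_PUBLISH)},
}


@dataclass
class Recording:
    """A transcribed recording with its word count and current stage."""
    recording_id: str
    word_count: int
    status: Status = Status.TRANSCRIBED


def advance(rec: Recording, role: str, target: Status) -> None:
    """Move a recording to `target` if `role` is permitted to make that transition."""
    if (rec.status, target) not in ALLOWED.get(role, set()):
        raise PermissionError(
            f"{role!r} cannot move {rec.recording_id} "
            f"from {rec.status.name} to {target.name}"
        )
    rec.status = target


def words_at_least(recs: list[Recording], stage: Status) -> int:
    """Total words in recordings that have reached `stage` or a later stage."""
    return sum(r.word_count for r in recs if r.status.value >= stage.value)
```

For example, with two recordings of 1,200 and 800 words, letting a checker and then an editor advance the first one makes `words_at_least(recs, Status.CHECKED)` return 1200, while an editor trying to publish the unchecked second recording raises `PermissionError`. Counting words per stage in this way is one plausible route to totals like those reported above.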

Authors and Affiliations

Güneş Acar

How To Cite

Güneş Acar (2017). Spoken Turkish Corpus in Its Present Form: A Technical and Statistical Analysis. Mersin Üniversitesi Dil ve Edebiyat Dergisi, 14(2), 1-14. https://europub.co.uk/articles/-A-273735