Spoken Turkish Corpus in Its Present Form: A Technical and Statistical Analysis
Journal Title: Mersin Üniversitesi Dil ve Edebiyat Dergisi - Year 2017, Vol 14, Issue 2
Abstract
The primary goal of this article is to explain the technologies and workflows used to build the METU Spoken Turkish Corpus (STC), which was pioneered by the late Prof. Dr. Şükriye Ruhi. The web-based Corpus Management System, which is crucial to the building of the STC, comprises a set of workflows, data formats and export options that make it easy to transcribe, check and publish corpus data. The Corpus Management System was developed by the STC project members in the Python programming language, and it enables remote project members with different roles to collaborate through an online interface. Within the STC, recordings totalling 286,391 words have been transcribed and checked; in addition, recordings totalling 79,189 words have been prepared for publication. The article presents general statistics about the recordings in the STC and discusses what needs to be done to publish a large-scale version of the STC.
Authors and Affiliations
Güneş Acar
Corpus Linguistics Studies: Intergenerational Solidarity Scale Development
Human interaction could be a focus of linguistics or sociology. When it is considered from a social perspective and the data is collected from language, the concepts reflected in language are examined. In such cases, sem...
Acoustic Correlates of Perceived Sexual Orientation
This study aimed to examine whether sexual orientation can be detected from monologue readings and narration. The main research question was whether naïve listeners could perceive the speakers’ sexual orientation accurately,...
Stance and Perception of Phonetic Variable
Perception of phonetic variables alongside social meanings has been the preliminary research question in the field of sociolinguistics in the last twenty years. The theoretical debate fostered in answering this research...
Laughter in Turkish: A Preliminary Study on Corpus Occurrences and Patterns
Laughter is one of the important components of human interaction and is usually expressed acoustically and visually (Hempelmann, 2017; Trouvain & Truong, 2017). People laugh with various emotions, such as joy, affection, am...
Multi-word Expressions in Genre Specification
Corpus analyses of lexical structures have uncovered different functions that they come to serve in textual organisation. Frequently occurring patterns of lexical items, the multi-word units, display different distributi...