Spoken Turkish Corpus in Its Present Form: A Technical and Statistical Analysis
Journal Title: Mersin Üniversitesi Dil ve Edebiyat Dergisi - Year 2017, Vol 14, Issue 2
Abstract
The primary goal of this article is to explain the technologies and workflows used to build the METU Spoken Turkish Corpus (STC), which was pioneered by the late Prof. Dr. Şükriye Ruhi. The Web-Based Corpus Management System, which is crucial to building the STC, comprises a set of workflows, data formats and export options that make it easy to transcribe, check and publish corpus data. The Corpus Management System was developed by STC project members in the Python programming language, and it enables remote project members with different roles to collaborate through an online interface. Within the STC, speech totalling 286,391 words has been transcribed and checked; in addition, recordings totalling 79,189 words have been prepared for publication. The article presents general statistics about the recordings in the STC and discusses what needs to be done to publish a large-scale version of the STC.
Authors and Affiliations
Güneş Acar
Multi-word Expressions in Genre Specification
Corpus analyses of lexical structures have uncovered different functions that they come to serve in textual organisation. Frequently occurring patterns of lexical items, the multi-word units, display different distributi...
The Teller/Receiver-Oriented Functions of Ondan Sonra As A Discourse Marker in Conversational Narratives
Discourse markers, which are widely used in everyday talk, carry out various functions in conversations. One of the conversational genres in which discourse markers are heavily used is conversational narrative. Conversation...
The Analysis of Arkadaş Türkçe Sözlük (Arkadaş Turkish Dictionary) and the Suggested Modifications for its Learner's Version
This study aims to draw attention to two of the problems that Turkish lexicography faces today. One of these problems is that there are not any Turkish to Turkish dictionaries that have been prepared for the people who a...
The Verb Bak- ‘to look’ and its discourse functions: Corpus concordances
The verb of perception bakmak ‘to look’ in Turkish, like similar verbs in other languages, has undergone semantic bleaching. The result is the frequent use of the verb of perception with a function of a discours...
Overlap and Overlap Resolution in Debate
Talk in interaction is maintained with a systematic and orderly exchange of turns with minimum silence and overlap. An overlap refers to the simultaneous attempts of the interactants to take the turn using a variety of i...