Creating the own corpus of American film scripts

Abstract

The article addresses the problem of creating one's own text corpus, using a corpus of American family film scripts as an example. The methodology and criteria for constructing a linguistic corpus are considered. The typology and main characteristics of the created corpus are determined. Special attention is given to the technological process of creating the corpus, which included: finding sources of linguistic material; entering the data as film script texts in plain text format (*.txt); annotation; part-of-speech (POS) tagging; and loading the tagged texts into a specialized linguistic information retrieval system, or corpus manager, which provides rapid multi-dimensional search and statistical processing. In this research we used the AntConc corpus manager. We focused on the analysis of the created corpus, which included: determining the total number of tokens and the total number of types in the corpus; calculating the type-token ratio (TTR) and the standardized type-token ratio (STTR); compiling a list of the most frequent word forms; identifying the hapax legomena (words that occur in the corpus only once); determining the frequency distribution of the different parts of speech; calculating the lexical density index; determining the average sentence length in the corpus; determining the formality index; and compiling a keyword list. We found that most keywords are lexically neutral, belong to the core vocabulary and relate to everyday family life. Two words from the list belong to the colloquial style. Some keywords also turn out to be technical and directorial remarks.
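
Several of the measures listed in the abstract are simple counts and ratios, so a short sketch may help make them concrete. The Python code below is an illustration only, not the procedure the author followed in AntConc: the regular-expression tokenizer, the function names and the sample text are all assumptions made for this example. STTR is computed here in the way it is commonly defined, as the mean TTR over consecutive fixed-size chunks of tokens, with any trailing partial chunk ignored.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens.

    Assumption: a token is a run of ASCII letters with an optional
    apostrophe part (e.g. "don't"). Real corpus tools tokenize differently.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ttr(tokens):
    """Type-token ratio: number of distinct word forms / number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def sttr(tokens, chunk_size=1000):
    """Standardized TTR: mean TTR over consecutive full chunks of tokens.

    Falls back to the plain TTR when the text is shorter than one chunk.
    """
    chunks = [
        tokens[i:i + chunk_size]
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not chunks:
        return ttr(tokens)
    return sum(ttr(chunk) for chunk in chunks) / len(chunks)


def hapax_legomena(tokens):
    """Word forms that occur exactly once, in alphabetical order."""
    counts = Counter(tokens)
    return sorted(word for word, n in counts.items() if n == 1)


def mean_sentence_length(text):
    """Average number of tokens per sentence.

    Assumption: sentences end at '.', '!' or '?'.
    """
    lengths = [len(tokenize(s)) for s in re.split(r"[.!?]+", text)]
    lengths = [n for n in lengths if n > 0]
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the corpus described above.
    sample = "Mom is in the kitchen. Dad is in the garden. The kids are home!"
    tokens = tokenize(sample)
    print("tokens:", len(tokens))
    print("types:", len(set(tokens)))
    print("TTR:", round(ttr(tokens), 3))
    print("STTR (chunks of 5):", round(sttr(tokens, chunk_size=5), 3))
    print("hapax legomena:", hapax_legomena(tokens))
    print("mean sentence length:", round(mean_sentence_length(sample), 2))
```

Because TTR falls as a text grows longer, STTR is usually the more suitable figure for comparing corpora of different sizes. Lexical density and the formality index are left out of the sketch because they depend on POS-tagged input.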

Authors and Affiliations

О. В. Скобнікова

How To Cite

О. В. Скобнікова (2018). Creating the own corpus of American film scripts. Науковий вісник Дрогобицького державного педагогічного університету імені Івана Франка. Серія: Філологічні науки (мовознавство), 9(), 204-207. https://europub.co.uk/articles/-A-521638