The KiezDeutsch-Korpus (KiDKo) was developed from 2008 to 2015 by project B6 (PI: Heike Wiese) of the Collaborative Research Centre "Information Structure" (SFB 632) at the University of Potsdam. KiDKo is a multi-modal digital corpus of spontaneous discourse data from informal, oral peer group situations in multi- and monoethnic speech communities.
KiDKo contains audio data from self-recordings, with aligned transcriptions (i.e., at every point in a transcript, one can access the corresponding segment of the audio file). The corpus provides part-of-speech tags as well as an orthographically normalised layer (Rehbein & Schalowski 2013). Another annotation level provides information on syntactic chunks and topological fields.
KiDKo offers a new empirical resource for research in domains such as:
- Kiezdeutsch as a multiethnic dialect of German
- youth language in urban areas
- linguistic developments in contemporary German
- informal language use
KiDKo consists of two parts:
- the main corpus with spontaneous conversations between young people from a multiethnic community (Berlin-Kreuzberg)
- a complementary corpus with spontaneous conversations between young people from a monoethnic community with comparable socio-economic indicators (Berlin-Hellersdorf)
The "Oral and Written Text Production" corpus
In addition to KiDKo, three subcorpora are underway. They combine elicited data from Kreuzberg and Hellersdorf adolescents with data from Turkish learners of German in Turkey, allowing further comparisons such as oral vs. written, spontaneous vs. elicited, and German as a first, second, or foreign language:
- "Frog Story" corpus
- "Linguistic Repertoire" corpus
- "QUIS" corpus
The "Spracheinstellungen" corpus (KiDKo/E)
The corpus KiDKo/E ("KiDKo/Einstellungen") is also associated with KiDKo. It provides data on language attitudes, perceptions, and ideologies from the public discussion on Kiezdeutsch. KiDKo/E contains emails from 2009 through 2012 and readers’ comments posted between January and April 2012 on media websites.
The "Linguistic Landscapes" corpus (KiDKo/LL)
Another associated corpus is KiDKo/LL ("KiDKo/Linguistic Landscapes"). Under the title "From the ’Hood With Love", this corpus assembles photos of written language productions in public space from the context of Kiezdeutsch, for instance love notes on walls, park benches, and playgrounds, graffiti in house entrances, and scribbled messages on toilet walls.
- Rehbein, Ines, & Schalowski, Sören (2013). STTS goes Kiez - Experiments on Annotating and Tagging Urban Youth Language. Journal for Language Technology and Computational Linguistics 28: 199-227 (Themenheft "Das STTS-Tagset für Wortartentagging - Stand und Perspektiven").
- Rehbein, Ines; Schalowski, Sören, & Wiese, Heike (2014). The KiezDeutsch Korpus (KiDKo) Release 1.0. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), May 24-31, 2014. Reykjavik, Iceland.
- Wiese, Heike; Freywald, Ulrike; Schalowski, Sören, & Mayr, Katharina (2012). Das KiezDeutsch-Korpus. Spontansprachliche Daten Jugendlicher aus urbanen Wohngebieten. Deutsche Sprache 40: 97-123.
People who are or have been involved in the process of building KiDKo and its supplementary corpora:
- Oliver Bunk
- Ulrike Freywald
- Sophie Hamm
- Banu Hueck
- Anne Junghans
- Jana Kiolbassa
- Julia Kostka
- Marlen Leisner
- Nadine Lestmann
- Katharina Mayr
- Tiner Özçelik
- Charlotte Pauli
- Gergana Popova
- Ines Rehbein
- Nadja Reinhold
- Franziska Rohland
- Sören Schalowski
- Kathleen Schumann
- Kristina Tjona Sommer
- Emiel Visser