The KiezDeutsch-Korpus

The KiezDeutsch-Korpus (KiDKo) has been developed by project B6 (PI: Heike Wiese) of the collaborative research centre Information Structure (SFB 632) at the University of Potsdam since 2008. KiDKo is a multi-modal digital corpus of spontaneous discourse data from informal, oral peer group situations in multi- and monoethnic speech communities. 

KiDKo contains audio data from self-recordings, with aligned transcriptions (i.e., at every point in a transcript, one can access the corresponding passage in the audio file). The corpus provides part-of-speech tags as well as an orthographically normalised layer. The syntactic annotation of the data (chunks and topological fields) is a work in progress.

KiDKo offers a new empirical resource for research in domains such as:

  • Kiezdeutsch as a multiethnic dialect of German
  • youth language in urban areas
  • linguistic developments in contemporary German
  • informal language use

KiDKo consists of two parts:

  • the main corpus with spontaneous conversations between young people from a multiethnic community (Berlin-Kreuzberg)
  • a complementary corpus with spontaneous conversations between young people from a monoethnic community with comparable socio-economic indicators (Berlin-Hellersdorf)

Supplementary corpora

The "Oral and Written Text Production" corpus

In addition to KiDKo, three subcorpora are underway. They combine elicited data from Kreuzberg and Hellersdorf adolescents with data from Turkish learners of German in Turkey, allowing further comparisons such as oral vs. written, spontaneous vs. elicited, and German as a first, second, or foreign language:

  1. "Frog Story" corpus
  2. "Linguistic Repertoire" corpus
  3. "QUIS" corpus

The "Spracheinstellungen" corpus (KiDKo/E)

The corpus KiDKo/E ("KiDKo/Einstellungen") is also associated with KiDKo. It provides data on language attitudes, perceptions, and ideologies from the public discussion on Kiezdeutsch. KiDKo/E contains emails from 2009 through 2012 and readers’ comments posted between January and April 2012 on media websites.

The "Linguistic Landscapes" corpus (KiDKo/LL)

Another associated corpus is KiDKo/LL ("KiDKo/Linguistic Landscapes"). Under the title "From the ’Hood With Love", this corpus assembles photos of written language in public spaces from the context of Kiezdeutsch, for instance love notes on walls, park benches, and playgrounds, graffiti in house entrances, and scribbled messages on toilet walls.

Rehbein, Ines; Schalowski, Sören, & Wiese, Heike (2014). The KiezDeutsch Korpus (KiDKo) Release 1.0. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), May 24-31, 2014. Reykjavik, Iceland.

Wiese, Heike; Freywald, Ulrike; Schalowski, Sören, & Mayr, Katharina (2012). Das KiezDeutsch-Korpus. Spontansprachliche Daten Jugendlicher aus urbanen Wohngebieten. Deutsche Sprache 40: 97-123.

People who are or have been involved in the process of building KiDKo and its supplementary corpora:

Oliver Bunk
Ulrike Freywald
Sophie Hamm
Banu Hueck
Anne Junghans
Jana Kiolbassa
Julia Kostka
Marlen Leisner
Nadine Lestmann
Katharina Mayr
Tiner Özçelik
Charlotte Pauli
Gergana Popova
Ines Rehbein
Nadja Reinhold
Franziska Rohland
Sören Schalowski
Kathleen Schumann
Tjona Sommer
Emiel Visser