The BigBrother Corpus

The BigBrother Corpus is a speech corpus with recordings from the first season of the BigBrother show, broadcast on Norwegian television by TVNorge in the first half of 2001. The participants in BigBrother speak different dialects, but most of them come from eastern Norway. They are between 23 and 36 years old.

The BigBrother Corpus contains audio and video recordings of almost all of the 100 broadcasts that were shown on television, approximately 550,000 words in total. The recordings are linked to orthographic transcriptions of what is said. The transcriptions are also morphologically tagged.

The first version of the BigBrother Corpus was created by the Text Laboratory in 2001-2002. A new project was started in the fall of 2007 and was completed in 2009 with a new search interface.

Big Brother - a unique speech corpus

In the BigBrother material, the participants work together, discuss, argue, quarrel, cry, laugh, shout, make love etc. as if they were ordinary friends or lovers. In contrast to controlled recordings that are limited to interviews and dialogue, the BigBrother material has conversations about all possible topics and in different genres. Sometimes strong feelings are involved, which may conceivably affect the language. This is, of course, not the case in ordinary speech corpora.

The corpus is available for research. Please contact the Text Laboratory if you need more information. Apply for a user name and password here.





