The BigBrother Corpus

The BigBrother Corpus is a speech corpus with recordings from the first season of the BigBrother show, broadcast on Norwegian television by TVNorge in the first half of 2001. The participants in BigBrother speak different dialects, but most of them come from eastern Norway. They are aged 23-36 years.

The BigBrother Corpus contains audio and video recordings of almost all of the 100 broadcasts that were shown on television, a total of 440 300 tokens. The recordings are linked to orthographic transcriptions of what is said. The transcriptions are also morphologically tagged.

The first version of the BigBrother Corpus was created by the Text Laboratory in 2001-2002. A new project was started in the fall of 2007 and was completed in 2009 with a new search interface.

In 2023 the corpus was transferred to the newest version of Glossa, a search and post-processing tool developed by the Text Laboratory.
Log in with Feide or CLARIN. Contact us if you need another login alternative.

Search in BigBrother

Download the transcriptions from GitHub:

Big Brother - a unique speech corpus

In the BigBrother material, the participants work together, discuss, argue, quarrel, cry, laugh, shout, make love etc. as if they were ordinary friends or lovers. In contrast to controlled recordings that are limited to interviews and dialogue, the BigBrother material has conversations about all possible topics and across different genres. Sometimes strong feelings are involved, which may conceivably have an impact on the language. This is, of course, not the case for ordinary speech corpora.