Switchboard dialogues

The Switchboard corpus

The Switchboard (SWBD-DA) corpus contains 1,155 five-minute conversations, orthographically transcribed and totaling about 1.5 million word tokens. Each utterance in the corpus is segmented into 'slash units', defined as “maximally a sentence; slash units below the sentence level corresponds to parts of the narrative which are not sentential but which the annotator interprets as complete” (Meteer and Taylor, 1995). The corpus comprises 223,606 slash units, each annotated with a communicative function tag drawn from the set of dialogue act types specified in the SWBD-DAMSL scheme (Jurafsky et al., 1997).

The Switchboard corpus is distributed by the Linguistic Data Consortium (LDC).

The Switchboard dialogues in the DialogBank are shown with their original SWBD-DAMSL annotations in tabular format and also with ISO 24617-2 annotations. (The Switchboard corpus is also available in NXT format, without in-line markup; see Calhoun et al., 2010.)

Dialogue sw00-0004
Dialogue sw01-0105
Dialogue sw02-0224
Dialogue sw04-0411
Dialogue sw03-0304 (Note on the re-annotated dialogue: the dialogue has been re-segmented. In the version annotated with ISO 24617-2 tags, the text of newly introduced segments is shown in red.)