The Switchboard corpus
The Switchboard (SWBD-DA) corpus contains 1,155 five-minute conversations, orthographically transcribed and amounting to about 1.5 million word tokens. Each utterance in the corpus is segmented into ‘slash units’, defined as “maximally a sentence; slash units below the sentence level corresponds to parts of the narrative which are not sentential but which the annotator interprets as complete” (Meteer and Taylor, 1995). The corpus comprises 223,606 slash units, each annotated with a communicative function tag drawn from the set of dialogue act types specified in the SWBD-DAMSL scheme (Jurafsky et al., 1997).
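To give a feel for the granularity of the segmentation, the counts above can be turned into rough averages. The following is only a back-of-the-envelope sketch: the token count is the approximate figure of 1.5 million quoted above, so the derived values are estimates rather than official corpus statistics, and the variable names are chosen here for illustration.

```python
# Rough averages derived from the corpus figures quoted in the text.
# The token count is approximate ("about 1.5 million"), so the results
# are estimates only.
conversations = 1_155
slash_units = 223_606
word_tokens = 1_500_000  # approximate figure

units_per_conversation = slash_units / conversations
tokens_per_unit = word_tokens / slash_units

print(f"slash units per conversation: {units_per_conversation:.1f}")
print(f"word tokens per slash unit:   {tokens_per_unit:.1f}")
```

On these figures a conversation contains a little under 200 slash units, and a slash unit averages between six and seven word tokens.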
The Switchboard corpus is distributed by the Linguistic Data Consortium (LDC).
The Switchboard dialogues in the DialogBank are shown with their original SWBD-DAMSL annotations in tabular format and also with ISO 24617-2 annotations. (The Switchboard corpus is also available in NXT format, without in-line markup; see Calhoun et al., 2010.)