OVIS dialogues

The acronym OVIS stems from “Openbaar Vervoer InformatieSysteem” (Dutch for: Public Transportation Information System). This was a speech-input and speech-output system for providing information about train travel over the telephone.

The system was used experimentally for a short period, during which the OVIS corpus was collected at the University of Groningen — see www.let.rug.nl/vannoord/Ovis

Dialogue Material

Dialogue OVIS 01

Annotation DIAML: ⭳
Annotation DiAML-TabSW - gold standard: ⭳
Annotation DiAML-MultiTab - gold standard: ⭳
Annotation Structure - according to ISO 24617-2 abstract syntax: ⭳
Tokenization: ⭳
Segmentation - specification of functional segments: ⭳

Dialogue OVIS 05

Annotation DIAML: ⭳
Annotation DiAML-TabSW - gold standard: ⭳
Annotation DiAML-MultiTab - gold standard: ⭳
Tokenization: ⭳
Segmentation - specification of functional segments: ⭳

Dialogue OVIS 09

Annotation DIAML: ⭳
Annotation DiAML-TabSW - gold standard: ⭳
Annotation DiAML-MultiTab - gold standard: ⭳
Annotation Structure - according to ISO 24617-2 abstract syntax: ⭳
Tokenization: ⭳
Segmentation - specification of functional segments: ⭳

DIAMOND dialogues

The DIAMOND dialogues form a small corpus of problem-solving dialogues in which a user, speaking through a high-quality microphone, interacts with a helpdesk to resolve problems in operating a fax machine that is new to the user.

The dialogues were transcribed and annotated by groups of students using the DIT annotation scheme. Inter-annotator agreement was calculated and reported in the literature; see Geertzen, Petukhova & Bunt (2008).

The dialogues were collected by Jeroen Geertzen, Roser Morante, Hans van Dam, Yann Girard, Ielka van der Sluis, Barbara Suijkerbuijk, Rintse van der Werf and Harry Bunt; see Geertzen et al. (2004).

Before inclusion in the DialogBank, the dialogues were re-annotated according to the ISO 24617-2 standard; the annotations are represented in the DiAML-MultiTab format. The functional segments in the MultiTab representation refer to the tokenisation of the transcription, which is also made available.
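The relation between functional segments and the tokenisation can be pictured with a minimal sketch. It assumes that each token carries an identifier and that a segment lists the identifiers of the tokens it covers; the class names, the identifier scheme ("w1", "fs1") and the utterance are invented for illustration and do not reproduce the DialogBank file formats.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Token:
    token_id: str  # hypothetical identifier scheme, e.g. "w3"
    text: str


@dataclass(frozen=True)
class FunctionalSegment:
    segment_id: str
    token_ids: Tuple[str, ...]  # referenced tokens need not be contiguous


def segment_text(segment: FunctionalSegment, tokens: Sequence[Token]) -> str:
    """Resolve a segment's token references to the words they point at."""
    by_id = {t.token_id: t.text for t in tokens}
    return " ".join(by_id[tid] for tid in segment.token_ids)


# Invented utterance, not taken from the DIAMOND corpus.
words = "the fax um does not print".split()
tokens = [Token(f"w{i}", w) for i, w in enumerate(words, start=1)]

# One segment skips the hesitation; a second segment covers only the hesitation.
fs1 = FunctionalSegment("fs1", ("w1", "w2", "w4", "w5", "w6"))
fs2 = FunctionalSegment("fs2", ("w3",))

print(segment_text(fs1, tokens))
print(segment_text(fs2, tokens))
```

Referring to tokens by identifier, rather than copying text into each segment, lets a segment be discontinuous and lets several segments share the same tokens.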

Dialogue Material

Dialogue SIEJER172

Annotation DIAML: ⭳
Annotation DiAML-TabSW - gold standard: ⭳
Annotation DiAML-MultiTab - gold standard: ⭳
Annotation Structure - according to ISO 24617-2 abstract syntax: ⭳
Tokenization: ⭳

Dialogue SIEJER218

Annotation DIAML: ⭳
Annotation DiAML-MultiTab - gold standard: ⭳
Annotation Structure - according to ISO 24617-2 abstract syntax: ⭳
Tokenization: ⭳
Segmentation: ⭳

Dialogue HUGSYS041

Annotation DIAML: ⭳
Annotation DiAML-TabSW - gold standard: ⭳
Tokenization: ⭳
Segmentation: ⭳

Example Comparison of Representation Formats

More info soon.

Below is a short annotated fragment of dialogue ‘TRAINS 2’ represented in the DiAML-XML, DiAML-MultiTab and DiAML-TabSW formats. The dialogue was collected and annotated in the TRAINS project, and re-segmented and re-annotated according to ISO 24617-2 for inclusion in the DialogBank.

The following dialogue fragment is covered:

S: Hello, can I help you?
U: Yes, I have a problem I need to transport two tankers of OJ to Avon and three boxcars to Elmire, the Bananas must arrive in Elmire by nine p.m.
S: Okay

DiAML-MultiTab

DiAML-TabSW

DiAML-XML (coming soon)

DBOX dialogues

The DBOX dialogue corpus was collected and annotated at the University of Saarland, in Saarbrücken, in the context of Eureka project 7152 “D-Box, A generic dialog box for multilingual conversational applications”. This project’s main goal is to develop and test an innovative architecture for conversational agents whose purpose is to support multilingual collaboration. The project develops interactive games based on spoken natural language human-computer dialogues, in three European languages: English, French and German. The first D-Box game scenario is a quiz game, in which a player may ask any type of question, such as “What are you famous for?”, in order to guess the name of a famous person. For this game situation, dialogues have been collected in a Wizard-of-Oz setup with a human Wizard who simulates the system’s behaviour by acting according to a pre-defined script. For further details see Petukhova et al. (2014).

Dialogue material

Five annotated DBOX dialogues are included in the DialogBank. For more information, see the website of the D-Box project: www.idiap.ch/project/d-box/

The annotations follow the ISO 24617-2 standard and make use of the possibility that this standard offers to add extra dimensions (with dimension-specific communicative functions). In particular, the dimension Contact Management has been inherited from the DIT++ annotation scheme, and an additional dimension called “Task Management” (also present in DAMSL) has been added for the annotation of utterances that discuss the rules of the quiz game. Moreover, three communicative functions are used that are not part of the ISO standard but that have been defined in DIT++, namely Dialogue Act Announcement (announcing the next dialogue act), Threat, and Pre-Closing (indicating the imminent closing of the dialogue). Finally, the extra communicative function Congratulation has been added to account for those dialogue acts in which a player is congratulated for having correctly guessed the identity of the famous person and thus having won the game.
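As a rough illustration of this extension mechanism, the sketch below models the tag set as the standard's dimensions plus the DBOX additions named above. The nine ISO dimensions are listed from the standard itself rather than from this page, and the camel-case spellings, class names and helper functions are assumptions made for the example, not the identifiers used in the DBOX annotation files.

```python
from dataclasses import dataclass

# The nine dimensions of ISO 24617-2 (spelling of the identifiers is assumed).
ISO_DIMENSIONS = frozenset({
    "task", "autoFeedback", "alloFeedback", "turnManagement",
    "timeManagement", "discourseStructuring", "ownCommunicationManagement",
    "partnerCommunicationManagement", "socialObligationsManagement",
})

# Additions described for the DBOX annotations.
DBOX_EXTRA_DIMENSIONS = frozenset({"contactManagement", "taskManagement"})
DBOX_EXTRA_FUNCTIONS = frozenset({
    "dialogueActAnnouncement", "threat", "preClosing", "congratulation",
})


@dataclass(frozen=True)
class DialogueAct:
    sender: str
    dimension: str
    communicative_function: str


def dimension_origin(act: DialogueAct) -> str:
    """Say whether an act's dimension is standard or a DBOX extension."""
    if act.dimension in ISO_DIMENSIONS:
        return "ISO 24617-2"
    if act.dimension in DBOX_EXTRA_DIMENSIONS:
        return "DBOX extension"
    raise ValueError(f"unknown dimension: {act.dimension!r}")


def uses_extra_function(act: DialogueAct) -> bool:
    """True if the communicative function is one of the four additions."""
    return act.communicative_function in DBOX_EXTRA_FUNCTIONS


# Invented example: the Wizard explains a rule of the quiz game.
rule_talk = DialogueAct("wizard", "taskManagement", "inform")
print(dimension_origin(rule_talk), uses_extra_function(rule_talk))
```

Keeping the additions in separate sets makes it easy to filter them out when annotations need to be compared against data that uses only the standard tag set.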

Dialogue 2013-11-13-14-06-35-gl-diana

zip-archive: ⭳
Annotation DIAML: ⭳
Annotation DiAML-TabSW: ⭳
Annotation DiAML-MultiTab: ⭳
Tokenization: ⭳
Segmentation: ⭳
Dialogue Transcript: ⭳

Dialogue 2013-11-13-14-20-19-gl-rihanna

zip-archive: ⭳
Annotation DIAML: ⭳
Annotation DiAML-TabSW: ⭳
Annotation DiAML-MultiTab: ⭳
Tokenization: ⭳
Segmentation: ⭳
Dialogue Transcript: ⭳

Dialogue 2013-11-13-14-22-40-gl-venus

zip-archive: ⭳
Annotation DIAML: ⭳
Annotation DiAML-TabSW: ⭳
Annotation DiAML-MultiTab: ⭳
Tokenization: ⭳
Segmentation: ⭳
Dialogue Transcript: ⭳

Dialogue 2013-11-13-15-44-18-de-eleanor

zip-archive: ⭳
Annotation DIAML: ⭳
Annotation DiAML-TabSW: ⭳
Annotation DiAML-MultiTab: ⭳
Tokenization: ⭳
Segmentation: ⭳
Dialogue Transcript: ⭳

Dialogue 2013-11-13-17-04-58-nk-washington

zip-archive: ⭳
Annotation DIAML: ⭳
Annotation DiAML-TabSW: ⭳
Annotation DiAML-MultiTab: ⭳
Tokenization: ⭳
Segmentation: ⭳
Dialogue Transcript: ⭳

DIT++

The DIT++ annotation scheme is the result of two converging lines of research:

the development of a semantic theory of dialogue acts, called Dynamic Interpretation Theory (DIT);
the study of alternative systems of dialogue acts and dialogue annotation schemes, with the aim of defining a comprehensive taxonomy of dialogue acts, useful both for the design of natural-language based dialogue systems, and for the analysis and annotation of spoken and multimodal human dialogue.

Work in the former line resulted in the definition of a multidimensional taxonomy of dialogue acts for which a dynamic update semantics was defined (see Bunt 1989; 1995; 2000; 2013; 2014). Work in the latter line resulted in the definition of the DIT++ taxonomy and annotation scheme (Bunt 2009), which incorporated ideas from a variety of annotation schemes, notably DAMSL, SWBD-DAMSL, HCRC Map Task, Gothenburg IM, TRAINS, Verbmobil, and AMI. The DIT++ scheme Release 5.0 served as the basis for defining the ISO 24617-2 standard, and conversely benefited from the establishment of the latter.

The DIT++ taxonomy, with the update semantics of its dialogue acts, was applied in a preliminary version in the multimodal PARADIME dialogue system (Keizer & Bunt, 2006; 2007) and is currently being applied in the multimodal Metalogue system.

The DIT++ annotation scheme was tested for its usability in the European project LIRICS and in PhD studies involving the manual annotation of dialogues in several European languages (see e.g. Geertzen, 2005; 2006; Petukhova, 2009; 2011). Petukhova & Bunt (2010) showed that the scheme can be applied in the automatic annotation of raw speech in human dialogue with very high accuracy.

A new version of the DIT++ scheme with some improvements and extensions was released in April 2019 (Release 5.2) and is the basis for a proposed second edition of ISO 24617-2 (November 2019), which is currently under review.

For full documentation and explanation of the communicative functions, dimensions, qualifiers, and relations among dialogue acts see the DIT++ home page.

The DialogBank

Dialogues annotated according to the ISO 24617-2 standard

Author Archives: Harry Bunt
