OVIS dialogues

The acronym OVIS stems from “Openbaar Vervoer InformatieSysteem” (Dutch for: Public Transportation Information System). This was a speech-input and speech-output system for providing information about train travel over the telephone.

The system was used experimentally for a short period, during which the OVIS corpus was collected at the University of Groningen; see www.let.rug.nl/vannoord/Ovis

Dialogue Material

Dialogue OVIS 01
  • Annotation DiAML:   ⭳  
  • Annotation DiAML-TabSW  -  gold standard:   ⭳  
  • Annotation DiAML-MultiTab  -  gold standard:   ⭳  
  • Annotation Structure  -  according to ISO 24617-2 abstract syntax:   ⭳  
  • Tokenization:   ⭳  
  • Segmentation  -  specification of functional segments:   ⭳  
Dialogue OVIS 05
Dialogue OVIS 09
  • Annotation DiAML:   ⭳  
  • Annotation DiAML-TabSW  -  gold standard:   ⭳  
  • Annotation DiAML-MultiTab  -  gold standard:   ⭳  
  • Annotation Structure  -   according to ISO 24617-2 abstract syntax:   ⭳  
  • Tokenization:   ⭳  
  • Segmentation  -  specification of functional segments:   ⭳  

Posted in Uncategorized

DIAMOND dialogues

The DIAMOND dialogues form a small corpus of problem-solving dialogues in which a user, speaking through a high-quality microphone, interacts with a helpdesk to resolve problems in operating a fax machine that is new to the user.

The dialogues were transcribed and annotated by groups of students using the DIT annotation scheme. Inter-annotator agreement was calculated and reported in the literature; see Geertzen, Petukhova & Bunt (2008).

The dialogues were collected by Jeroen Geertzen, Roser Morante, Hans van Dam, Yann Girard, Ielka van der Sluis, Barbara Suijkerbuijk, Rintse van der Werf and Harry Bunt; see Geertzen et al. (2004).

Before inclusion in the DialogBank, the dialogues were re-annotated according to the ISO 24617-2 standard; the annotations are represented in the DiAML-MultiTab format. The functional segments in the MultiTab representation refer to the tokenisation of the transcription, which is also made available.
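
The relation between functional segments and the tokenisation can be pictured with a minimal sketch. This is not the DialogBank file format: the class names, the token id scheme ("w1", "w2", ...), the segment ids and the example utterance are all invented for illustration. The only idea taken from the text above is that a functional segment points to tokens of the transcription rather than containing its own copy of the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    token_id: str  # hypothetical id scheme, e.g. "w1"
    text: str


@dataclass(frozen=True)
class FunctionalSegment:
    segment_id: str    # hypothetical id, e.g. "fs1"
    token_ids: tuple   # ids of the tokens in this segment; need not be contiguous


def tokenize(utterance, prefix="w"):
    """Split on whitespace and number the tokens from 1 (a simplifying assumption)."""
    return [Token(f"{prefix}{i}", word)
            for i, word in enumerate(utterance.split(), start=1)]


def segment_text(segment, tokens):
    """Recover the text of a functional segment from the tokenisation."""
    by_id = {t.token_id: t.text for t in tokens}
    return " ".join(by_id[tid] for tid in segment.token_ids)


# Invented utterance, not taken from the DIAMOND corpus.
tokens = tokenize("yes uh the paper is stuck")
fs1 = FunctionalSegment("fs1", ("w1",))
fs2 = FunctionalSegment("fs2", ("w3", "w4", "w5", "w6"))

print(segment_text(fs1, tokens))
print(segment_text(fs2, tokens))
```

Because segments hold token ids only, two segments may overlap or skip tokens (here the hesitation "uh" belongs to neither segment) without duplicating or altering the transcription.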

Dialogue Material

Dialogue SIEJER172
Dialogue SIEJER218
Dialogue HUGSYS041

Example Comparison of Representation Formats

More info soon.

Below is a short annotated fragment of the dialogue ‘TRAINS 2’, represented in the DiAML-XML, DiAML-MultiTab and DiAML-TabSW formats. The dialogue was collected and annotated in the TRAINS project, and was re-segmented and re-annotated according to ISO 24617-2 for inclusion in the DialogBank.

The following dialogue fragment is covered:

S: Hello, can I help you?
U: Yes, I have a problem. I need to transport two tankers of OJ to Avon and three boxcars to Elmira; the bananas must arrive in Elmira by nine p.m.
S: Okay

DiAML-MultiTab

DiAML-TabSW

DiAML-XML (coming soon)


DBOX dialogues

The DBOX dialogue corpus was collected and annotated at Saarland University in Saarbrücken, in the context of Eureka project 7152 “D-Box, A generic dialog box for multilingual conversational applications”. The project’s main goal is to develop and test an innovative architecture for conversational agents that support multilingual collaboration. The project develops interactive games based on spoken natural-language human-computer dialogues in three European languages: English, French and German. The first D-Box game scenario is a quiz game in which a player may ask any type of question, such as “What are you famous for?”, in order to guess the name of a famous person. For this scenario, dialogues were collected in a Wizard-of-Oz setup in which a human Wizard simulates the system’s behaviour by acting according to a pre-defined script. For further details see Petukhova et al. (2014).

Dialogue material

Five annotated DBOX dialogues are included in the DialogBank. For more information, see the website of the D-Box project: www.idiap.ch/project/d-box/

The annotations follow the ISO 24617-2 standard and make use of the possibility that this standard offers to add extra dimensions (with dimension-specific communicative functions). In particular, the dimension Contact Management has been inherited from the DIT++ annotation scheme, and an additional dimension called “Task Management” (also present in DAMSL) has been added for the annotation of utterances that discuss the rules of the quiz game. Moreover, three communicative functions are used that are not part of the ISO standard but that have been defined in DIT++, namely Dialogue Act Announcement (announcing the next dialogue act), Threat, and Pre-Closing (indicating the imminent closing of the dialogue). Finally, the extra communicative function Congratulation has been added to account for those dialogue acts in which a player is congratulated for having correctly guessed the identity of the famous person and thus having won the game.
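
As a rough illustration of how such an extended tag set could be checked, the sketch below lists the nine core dimensions of ISO 24617-2 together with the additions described above. The variable and function names are hypothetical, and this is not code from the D-Box project or the DialogBank; it only encodes the extensions named in the preceding paragraph.

```python
# The nine core dimensions of ISO 24617-2.
ISO_DIMENSIONS = frozenset({
    "Task",
    "Auto-Feedback",
    "Allo-Feedback",
    "Turn Management",
    "Time Management",
    "Discourse Structuring",
    "Own Communication Management",
    "Partner Communication Management",
    "Social Obligations Management",
})

# Dimensions added for the DBOX annotations, as described above.
DBOX_EXTRA_DIMENSIONS = frozenset({"Contact Management", "Task Management"})

# Communicative functions used in DBOX that are not part of the ISO standard.
DBOX_EXTRA_FUNCTIONS = frozenset({
    "Dialogue Act Announcement",  # defined in DIT++
    "Threat",                     # defined in DIT++
    "Pre-Closing",                # defined in DIT++
    "Congratulation",             # added for the quiz game
})

DBOX_DIMENSIONS = ISO_DIMENSIONS | DBOX_EXTRA_DIMENSIONS


def dimension_origin(dimension):
    """Say whether a dimension label is core ISO or a DBOX extension."""
    if dimension in ISO_DIMENSIONS:
        return "ISO 24617-2"
    if dimension in DBOX_EXTRA_DIMENSIONS:
        return "DBOX extension"
    raise ValueError(f"unknown dimension: {dimension!r}")


def is_extension_function(function):
    """True if the communicative function is one of the DBOX additions."""
    return function in DBOX_EXTRA_FUNCTIONS


print(dimension_origin("Task"))
print(dimension_origin("Task Management"))
print(is_extension_function("Congratulation"))
```

A check of this kind can help flag annotations that use a label outside the extended scheme, or separate the ISO-conformant part of an annotation from the project-specific part.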

Dialogue 2013-11-13-14-06-35-gl-diana
Dialogue 2013-11-13-14-20-19-gl-rihanna
Dialogue 2013-11-13-14-22-40-gl-venus
Dialogue 2013-11-13-15-44-18-de-eleanor
Dialogue 2013-11-13-17-04-58-nk-washington


Links

Relevant sites:


DIT++

The DIT++ annotation scheme is the result of two converging lines of research:

  1. the development of a semantic theory of dialogue acts, called Dynamic Interpretation Theory (DIT);
  2. the study of alternative systems of dialogue acts and dialogue annotation schemes, with the aim of defining a comprehensive taxonomy of dialogue acts, useful both for the design of natural-language based dialogue systems, and for the analysis and annotation of spoken and multimodal human dialogue.

Work in the former line resulted in the definition of a multidimensional taxonomy of dialogue acts for which a dynamic update semantics was defined (see Bunt 1989; 1995; 2000; 2013; 2014). Work in the latter line resulted in the definition of the DIT++ taxonomy and annotation scheme (Bunt 2009), which incorporated ideas from a variety of annotation schemes, notably DAMSL, SWBD-DAMSL, HCRC Map Task, Gothenburg IM, TRAINS, Verbmobil, and AMI. The DIT++ scheme Release 5.0 served as the basis for defining the ISO 24617-2 standard, and conversely benefited from the establishment of the latter.

A preliminary version of the DIT++ taxonomy, with the update semantics of its dialogue acts, has been applied in the PARADIME multimodal dialogue system (Keizer & Bunt, 2006; 2007), and the taxonomy is currently being applied in the multimodal Metalogue system.

The DIT++ annotation scheme was tested for its usability in the European project LIRICS and in PhD studies involving the manual annotation of dialogues in several European languages (see e.g. Geertzen, 2005; 2006; Petukhova, 2009; 2011). Petukhova & Bunt (2010) showed that the scheme can be applied in the automatic annotation of raw speech in human dialogue with very high accuracy.

A new version of the DIT++ scheme, with some improvements and extensions, was released in April 2019 (Release 5.2). It is the basis for a proposed second edition of ISO 24617-2 (November 2019), which is currently under review.

For full documentation and explanation of the communicative functions, dimensions, qualifiers, and relations among dialogue acts see the DIT++ home page.
