Dialogues in Dutch

The annotations of the dialogues listed here are all represented in the DiAML-MultiTab format. These corpora contain Dutch dialogues.

DIAMOND

  • Dialogues from the DIAMOND corpus with ISO 24617-2 annotations, and information about the corpus.

OVIS

  • Dialogues from the OVIS corpus with ISO 24617-2 annotations, and information about the corpus.

Dutch Map Task

Schiphol

Posted in Uncategorized

ISO 24617-2

ISO 24617-2 is an international standard established by the International Organization for Standardization (ISO) in 2012. The ISO 24617-2 annotation scheme was developed by a group of 10 researchers, consisting of Harry Bunt (Tilburg, project leader), Jan Alexandersson (Saarbrücken), Jean Carletta (Edinburgh), Jae-Woong Choe (Seoul), Alex Chengyu Fang (Hong Kong), Koiti Hasida (Tokyo), Volha Petukhova (Tilburg; Saarbrücken), Andrei Popescu-Belis (Martigny), Claudia Soria (Pisa), and David Traum (Marina del Rey; Playa Vista).

Annotation scheme

The ISO 24617-2 scheme supports the marking up of spoken, written and multimodal dialogue with information about dialogue acts. A dialogue act has been defined as:

“Communicative activity of a dialogue participant, interpreted as having a certain communicative function and semantic content” (ISO 24617-2).

The notion of a dialogue act, which was first introduced in Bunt (1979), may be viewed as a computational and empirically based variant of the classical speech act concept as introduced by Austin, Searle, and other language philosophers.

Dialogue acts are units in a dialogue that correspond to changes that the speaker intends to bring about in an addressee’s information as a result of the addressee understanding the speaker. A dialogue act has two main components: a communicative function and a semantic content. The communicative function specifies how the semantic content changes the information state of an addressee who understands the speaker’s communicative behaviour. Further components of a dialogue act are qualifiers for sentiment (such as happy, angry, surprised, …), certainty and conditionality; and dependence relations (such as question-answer), which are indispensable for determining the semantic content of a responsive dialogue act. Additionally, rhetorical relations between dialogue acts may be marked, indicating e.g. that one dialogue act motivates the performance of another one.

ISO 24617-2 dialogue act annotation includes the marking up of the sender (or ‘speaker’) and the addressee(s) of the dialogue act, possible additional participants (such as an audience), the segment of discourse that expresses the dialogue act (the ‘functional segment’), the communicative function, the dimension (or type of semantic content), qualifiers, dependence relations, and rhetorical relations. The official ISO definitions of the standard’s communicative functions, dimensions, qualifiers, and dependence relations, in the form of ISO ‘data categories’, are documented in the unpublished paper Data categories for dialogue acts (Bunt, 2012).
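The components listed above can be pictured as a simple record. The following Python sketch is only an illustration of that structure, not the DiAML representation defined by the standard: the class name, the field names, the helper function, and the example values (participant ids, function and dimension labels, the sample utterances) are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DialogueAct:
    """One annotated dialogue act (field names are illustrative, not DiAML names)."""
    act_id: str
    sender: str                        # the 'speaker' of the act
    addressees: List[str]
    functional_segment: str            # stretch of discourse expressing the act
    dimension: str                     # type of semantic content
    communicative_function: str
    other_participants: List[str] = field(default_factory=list)
    qualifiers: Dict[str, str] = field(default_factory=dict)
    depends_on: Optional[str] = None   # id of the act this one responds to
    rhetorical_relations: List[Tuple[str, str]] = field(default_factory=list)


def antecedent(act: DialogueAct, acts: List[DialogueAct]) -> Optional[DialogueAct]:
    """Return the act that `act` depends on, or None if it has no dependence."""
    by_id = {a.act_id: a for a in acts}
    return by_id.get(act.depends_on) if act.depends_on else None


# Hypothetical question-answer pair; labels are example values only.
question = DialogueAct(
    act_id="da1", sender="p1", addressees=["p2"],
    functional_segment="does the train leave at nine",
    dimension="task", communicative_function="propositionalQuestion",
)
answer = DialogueAct(
    act_id="da2", sender="p2", addressees=["p1"],
    functional_segment="I think so",
    dimension="task", communicative_function="answer",
    qualifiers={"certainty": "uncertain"},
    depends_on="da1",
)
```

Here the dependence link from the answer to the question is what allows the content of "I think so" to be recovered, which is the role the text above assigns to dependence relations.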

The standard does not specify the rhetorical relations that may be used, but it is recommended to use the recently proposed standard ISO 24617-8 for this purpose, or at least a set of relations that is compatible with the list of ‘core’ relations defined in this new standard – see Bunt & Prasad (2016) for this list.


Switchboard dialogues

The Switchboard corpus

The Switchboard (SWBD-DA) corpus contains 1,155 five-minute conversations, orthographically transcribed in about 1.5 million word tokens. Each utterance in the corpus is segmented into ‘slash units’, defined as “maximally a sentence; slash units below the sentence level corresponds to parts of the narrative which are not sentential but which the annotator interprets as complete” (Meteer and Taylor, 1995). The corpus comprises 223,606 slash units, each of which is annotated with a communicative function tag according to the set of dialogue act types specified in the SWBD-DAMSL scheme (Jurafsky et al. 1997).

The Switchboard corpus is distributed by the Linguistic Data Consortium (LDC).

The Switchboard dialogues in the DialogBank are shown with their original SWBD-DAMSL annotations in tabular format and also with ISO 24617-2 annotations. (The Switchboard corpus is also available in NXT format, without in-line markup; see Calhoun et al., 2010.)

Dialogue sw00-0004
Dialogue sw01-0105
Dialogue sw02-0224
Dialogue sw04-0411
Dialogue sw03-0304  -  (Note on the re-annotated dialogue: The dialogue has been re-segmented. In the re-annotated dialogue with ISO 24617-2 tags the text of newly introduced segments is represented in red.)

Map Task dialogues

The Map Task dialogues (in English) were collected and originally transcribed and annotated at the Human Communication Research Centre using the HCRC Map Task annotation scheme; see the website of the HCRC Map Task corpus.

Five dialogues from this corpus have been re-annotated using the ISO 24617-2 annotation scheme. The annotation of dialogue Q1EC5 has gold standard quality and is precisely in accordance with the ISO 24617-2 standard. The annotations of the other dialogues await a round of correction before they reach gold standard quality.

Dialogue material (more to follow):

Dialogue q1ec5
Dialogue q1ec6
Dialogue q1ec7