Guidelines

If you’re interested in annotating dialogues according to the ISO 24617-2 standard, please first consult the document “Guidelines for using ISO standard 24617-2” (version of February 2017), which contains detailed directions for how to use the ISO standard.

Create your own ISO 24617-2 annotation using the human-friendly DiAML-TabSW representation format!

The easiest way to create your own ISO 24617-2 annotated dialogues may be in the DiAML-TabSW format. This format does not require the use of any specific annotation tools (such as ANVIL or ELAN) and is easier to use than the DiAML-XML format; it is in particular more convenient for inspecting and correcting an annotation. Below we describe step by step how to create an annotation in the DiAML-TabSW format.

A fully ISO-compliant stand-off annotation in this format consists of three files: (1) the tokenization file, which specifies the dialogue as a sequence of tokens; (2) the segmentation file, which specifies the functional segments in terms of tokens; and (3) the annotation file, which contains the actual annotations.

The following dialogue fragment is used to illustrate this:

A: Jimmy, so how do you get most of your news?

B: Well, I kind of, uh, I watch the, uh, national news every day.

The tokenization file

Step 1: Create a text file (.txt extension), known as the “tokenization file”, that stores the verbal and nonverbal primary data that constitute the dialogue. Depending on your needs, this file lists all such data (words, inbreaths, coughs, and so on) in the order in which they occur in the dialogue. The tokenization file for the above dialogue fragment would look like 2013-11-13-14-06-35-gl-diana_tokenization.
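As a minimal sketch of Step 1, the Python below splits the example fragment into word and punctuation tokens and prepares one line per token. The column layout (token id, speaker, token, separated by tabs), the identifier scheme `w1, w2, ...`, the regular expression, and the function names are assumptions made for illustration, not the official DiAML-TabSW layout; compare the output with the example tokenization file before adopting it.

```python
import re

# The example dialogue fragment as (speaker, utterance) pairs.
TURNS = [
    ("A", "Jimmy, so how do you get most of your news?"),
    ("B", "Well, I kind of, uh, I watch the, uh, national news every day."),
]

def tokenize(turns):
    """Return (token_id, speaker, token) rows in order of occurrence.

    Hypothetical scheme: words and punctuation marks are separate tokens,
    numbered w1, w2, ... across the whole dialogue.
    """
    rows = []
    for speaker, text in turns:
        for token in re.findall(r"\w+|[^\w\s]", text):
            rows.append((f"w{len(rows) + 1}", speaker, token))
    return rows

def write_tokenization(rows, path):
    """Write one tab-separated token per line to a plain-text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write("\t".join(row) + "\n")

rows = tokenize(TURNS)
```

Calling `write_tokenization(rows, "my-dialogue_tokenization.txt")` would then produce the text file; the file name here is only a placeholder. Nonverbal events such as inbreaths or coughs are not covered by this sketch and would have to be added by hand.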

The segmentation file

Step 2: Create a text file (.txt extension), known as the “segmentation file”, that lists all the functional segments you have identified in the dialogue, each defined in terms of its tokens of primary data. The segmentation file for the above dialogue fragment would look like 2013-11-13-14-06-35-gl-diana_segmentation.
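A minimal sketch of Step 2 follows, showing how functional segments can be expressed as lists of token identifiers. It treats each speaker turn as a single functional segment purely for illustration; an actual segmentation is the annotator's decision and is often finer-grained. The token ids (`w1` to `w31`), the segment ids (`fs1`, `fs2`), and the tab-separated layout are assumptions, not the official format.

```python
# Hypothetical token ids for the example fragment, paired with their speaker:
# w1..w12 belong to speaker A, w13..w31 to speaker B.
TOKENS = [(f"w{i}", "A") for i in range(1, 13)] + \
         [(f"w{i}", "B") for i in range(13, 32)]

def segment_by_speaker(tokens):
    """Group consecutive tokens from the same speaker into one segment.

    Returns (segment_id, speaker, [token ids]) tuples. This grouping is a
    placeholder for the annotator's own segmentation decisions.
    """
    segments = []
    for token_id, speaker in tokens:
        if segments and segments[-1][1] == speaker:
            segments[-1][2].append(token_id)
        else:
            segments.append((f"fs{len(segments) + 1}", speaker, [token_id]))
    return segments

def format_segment(segment):
    """Render one segment as a tab-separated line: id, speaker, token ids."""
    segment_id, speaker, token_ids = segment
    return f"{segment_id}\t{speaker}\t{' '.join(token_ids)}"

segments = segment_by_speaker(TOKENS)
lines = [format_segment(segment) for segment in segments]
```

Storing explicit token lists rather than start and end positions is a deliberate choice in this sketch: it leaves room for segments that skip tokens or share tokens with another segment.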

The annotation file

Step 3: Create an Excel file (.xlsx extension) that stores the annotation in terms of dialogue acts corresponding to the functional segments. The annotation file in DiAML-TabSW format for the above dialogue fragment would look like 2013-11-13-14-06-35-gl-diana_DiAML-TabSW.
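The three example files above share a common dialogue identifier followed by a suffix. Assuming you follow the same naming pattern (an inference from the example names, not a stated requirement), the sketch below derives the three file names for a dialogue and reports which ones are still missing from a directory. The function names are hypothetical.

```python
from pathlib import Path

def tabsw_file_names(dialogue_id):
    """Return the three file names for one dialogue, keyed by role.

    The suffixes mirror the example files; the extensions are those
    named in Steps 1 to 3 (.txt, .txt, .xlsx).
    """
    return {
        "tokenization": f"{dialogue_id}_tokenization.txt",
        "segmentation": f"{dialogue_id}_segmentation.txt",
        "annotation": f"{dialogue_id}_DiAML-TabSW.xlsx",
    }

def missing_files(dialogue_id, directory="."):
    """List the roles whose file does not yet exist in the directory."""
    base = Path(directory)
    return [
        role
        for role, name in tabsw_file_names(dialogue_id).items()
        if not (base / name).exists()
    ]

names = tabsw_file_names("2013-11-13-14-06-35-gl-diana")
```

For example, `missing_files("2013-11-13-14-06-35-gl-diana", "annotations")` returns the roles that still need a file in a folder called `annotations` (a placeholder name).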

Getting started: DiAML-TabSW template

Use this template to get you started annotating in the DiAML-TabSW format! See section 3.2.3 in Annotation Representations and the Construction of the DialogBank for guidelines on how to include ISO 24617-2 concepts such as qualifiers and rhetorical relations in the TabSW format. See also the representation formats page for more information on the DiAML representation formats and conversions between them.
