An open question in research on multimodal figuration is how to mitigate the analyst's bias in identifying and interpreting metaphor and metonymy, an issue that bears directly on the generalizability of the findings. Little is known about why independent researchers arrive at different annotations. Inter-rater reliability tests help to trace the sources of variation between independent annotators and can thereby inform and refine annotation protocols.
Inspired by existing procedures for verbal, visual, and filmic metaphor identification, we formulated a set of instructions for identifying multimodal metaphor and metonymy and tested it on a corpus of 21 generic advertisements and 21 genre-specific advertisements (mobile phones). Two independent researchers annotated the advertisements in six rounds. Each round was followed by a joint discussion to resolve conflicting annotations and refine the protocol for the ensuing round.
By examining the evolution of the inter-rater reliability results, we found that (1) we reached similar levels of agreement for the identification of metaphor and metonymy, although converging on the interpretation of metonymy proved more difficult; (2) certain genre specificities made it easier to agree on the annotations for the mobile phone advertisements than for the generic advertisements; and (3) the kappa scores increased consistently, reaching substantial agreement by the sixth round.
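As a brief point of reference (the abstract does not name the specific coefficient, but for two annotators Cohen's kappa is the standard choice, and the label "substantial agreement" follows the conventional Landis and Koch benchmarks), the kappa statistic corrects observed agreement for the agreement expected by chance:

\[
\kappa = \frac{p_o - p_e}{1 - p_e}
\]

where $p_o$ is the observed proportion of agreement between the two annotators and $p_e$ is the chance agreement derived from each annotator's marginal category frequencies. As a purely illustrative example, if the annotators agree on 42 of 50 items ($p_o = 0.84$) and chance agreement is $p_e = 0.50$, then $\kappa = (0.84 - 0.50)/(1 - 0.50) = 0.68$, which falls in the 0.61 to 0.80 band conventionally labeled substantial agreement.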