
Translating text into pictographs

Published online by Cambridge University Press:  11 November 2015

VINCENT VANDEGHINSTE, INEKE SCHUURMAN, LEEN SEVENS and FRANK VAN EYNDE
Affiliation:
Centre for Computational Linguistics, University of Leuven, Blijde Inkomststraat 21 - bus 3315 B-3000, Leuven, Belgium

Abstract

We describe and evaluate a text-to-pictograph translation system that is used in an online platform for Augmentative and Alternative Communication, intended for people who are unable to read and write but who still want to communicate with the outside world. The system translates from Dutch into Sclera and Beta, two publicly available pictograph sets consisting of several thousand pictographs each. We have linked a large number of these pictographs to synsets or combinations of synsets of Cornetto, a lexical-semantic database for Dutch similar to WordNet. In the translation system, the Dutch input text undergoes shallow linguistic analysis and the synsets of the content words are looked up. The system then looks for the nearest pictographs in the lexical-semantic database and displays the message as pictographs. In our evaluation, the system showed a large improvement over the baseline system, which consisted of straightforward string matching between the input text and the filenames of the pictographs.

Our system provides a clear improvement in the communication possibilities of illiterate people. Nevertheless, there is room for further improvement.
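
The lookup step described in the abstract can be illustrated with a minimal sketch. It assumes that "nearest" means the closest synset reachable by following hypernym links, and it uses a plain breadth-first search; the abstract specifies neither the relations nor the search strategy of the actual system. All lemmas, synset identifiers, and filenames below are hypothetical toy data, not taken from Cornetto, Sclera, or Beta.

```python
from collections import deque

# Hypothetical toy data, NOT taken from Cornetto, Sclera, or Beta.
LEMMA_TO_SYNSETS = {
    "poedel": ["poodle.n"],
    "hond": ["dog.n"],
    "eten": ["eat.v"],
}
HYPERNYMS = {
    "poodle.n": ["dog.n"],
    "dog.n": ["animal.n"],
}
SYNSET_TO_PICTOGRAPH = {
    "dog.n": "hond.png",
    "animal.n": "dier.png",
    "eat.v": "eten.png",
}


def nearest_pictograph(synset, max_depth=3):
    """Breadth-first search up the hypernym links (an assumed notion of
    'nearest') for the closest synset that has a pictograph linked to it."""
    queue = deque([(synset, 0)])
    seen = {synset}
    while queue:
        current, depth = queue.popleft()
        if current in SYNSET_TO_PICTOGRAPH:
            return SYNSET_TO_PICTOGRAPH[current]
        if depth < max_depth:
            for parent in HYPERNYMS.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, depth + 1))
    return None


def translate(lemmas):
    """Map each lemma to the pictograph of its first synset that yields one."""
    pictographs = []
    for lemma in lemmas:
        for synset in LEMMA_TO_SYNSETS.get(lemma, []):
            pictograph = nearest_pictograph(synset)
            if pictograph is not None:
                pictographs.append(pictograph)
                break
    return pictographs


def baseline(tokens, filenames):
    """String-matching baseline: keep tokens equal to a pictograph filename stem."""
    stems = {name.rsplit(".", 1)[0]: name for name in filenames}
    return [stems[token] for token in tokens if token in stems]
```

On this toy data, `translate(["poedel", "eten"])` reaches a pictograph for "poedel" through its hypernym, whereas `baseline` only matches "eten", because no pictograph file is named after "poedel".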

Type
Articles
Copyright
Copyright © Cambridge University Press 2015 

