Recognition of visual scene elements from a story text in Persian natural language

Mojdeh Hashemi-Namin; Mohammad Reza Jahed-Motlagh; Adel Torkaman Rahmani

doi:10.1017/S1351324922000390

Published online by Cambridge University Press: 24 August 2022

Mojdeh Hashemi-Namin: Iran University of Science and Technology, Tehran, Iran
Mohammad Reza Jahed-Motlagh*: Iran University of Science and Technology, Tehran, Iran
Adel Torkaman Rahmani: Iran University of Science and Technology, Tehran, Iran
*Corresponding author. E-mail: [email protected]

Abstract

Text-to-scene conversion systems map natural language text to the formal representations required for visual scenes. The difficulty of this mapping is one of the most critical challenges in developing such systems. The current study presents the first system that maps Persian natural language text to a conceptual scene model. This conceptual scene model is an intermediate semantic representation between natural language and the visual scene, and it contains descriptions of the scene's visual elements. In this ongoing study, it will be used to produce meaningful animation from an input story. The mapping task was modeled as a sequential labeling problem, and a conditional random field (CRF) model was trained and tested for sequential labeling of scene model elements. To the best of the authors’ knowledge, no dataset for this task exists; thus, the required dataset was collected. The lack of required off-the-shelf natural language processing modules and a significant error rate in the available corpora were important challenges in dataset collection. Some features of the dataset were manually annotated. The results were evaluated using standard text classification metrics, and an average accuracy of 85.7% was obtained, which is satisfactory.
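To make the sequential labeling formulation concrete, the following is a minimal sketch of how a linear-chain CRF picks the best label sequence for a sentence (Viterbi decoding), followed by a token-level accuracy check. It is not the authors' implementation: the label set, the two toy features, the hand-set weights, and the English stand-in tokens are all assumptions made for illustration, since the abstract does not specify the scene model labels or the features used.

```python
from typing import Dict, List, Tuple

# Hypothetical label set; the abstract does not list the real scene-model labels.
LABELS = ["O", "OBJECT", "ACTION", "LOCATION"]

Weights = Dict[Tuple[str, str], float]


def token_features(tokens: List[str], i: int) -> List[str]:
    """Toy features for token i: the word itself and the previous word."""
    prev = tokens[i - 1] if i > 0 else "<s>"
    return ["word=" + tokens[i], "prev=" + prev]


def viterbi(tokens: List[str], emit_w: Weights, trans_w: Weights) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    Sequence score = sum over positions of the feature weights for the chosen
    label plus the transition weight from the previous label.
    """
    if not tokens:
        return []

    def emit(i: int, label: str) -> float:
        return sum(emit_w.get((f, label), 0.0) for f in token_features(tokens, i))

    best = [{y: emit(0, y) for y in LABELS}]  # best score ending in each label
    back: List[Dict[str, str]] = []           # back-pointers per position
    for i in range(1, len(tokens)):
        scores: Dict[str, float] = {}
        ptrs: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + trans_w.get((p, y), 0.0))
            scores[y] = best[-1][prev] + trans_w.get((prev, y), 0.0) + emit(i, y)
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)

    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]


def token_accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of tokens whose predicted label matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hand-set toy weights (a trained CRF would learn these from annotated data).
EMIT_W: Weights = {
    ("word=boy", "OBJECT"): 2.0,
    ("word=sits", "ACTION"): 2.0,
    ("word=in", "O"): 1.0,
    ("word=garden", "LOCATION"): 2.0,
    ("prev=in", "LOCATION"): 1.0,
}
TRANS_W: Weights = {("OBJECT", "ACTION"): 0.5}

# English glosses stand in for a Persian sentence.
sentence = ["boy", "sits", "in", "garden"]
predicted = viterbi(sentence, EMIT_W, TRANS_W)
print(list(zip(sentence, predicted)))
```

In practice the weights would be estimated from the annotated dataset by a CRF toolkit rather than set by hand; the decoding step shown here is the part that turns a trained model into one label per token.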

Keywords

Text-to-scene conversion system; Visual scene generation; Conceptual scene model; Persian natural language text; Conditional random fields

Type: Article
Information: Natural Language Engineering, Volume 29, Issue 3, May 2023, pp. 693–719

DOI: https://doi.org/10.1017/S1351324922000390
Copyright: © The Author(s), 2022. Published by Cambridge University Press
