
Topical language generation using transformers

Published online by Cambridge University Press:  04 February 2022

Rohola Zandie
Affiliation:
Department of Electrical and Computer Engineering, University of Denver, Denver, CO 80208, USA
Mohammad H. Mahoor*
Affiliation:
Department of Electrical and Computer Engineering, University of Denver, Denver, CO 80208, USA
*Corresponding author. E-mail: [email protected]

Abstract

Large-scale transformer-based language models (LMs) demonstrate impressive capabilities in open-text generation. However, controlling properties of the generated text such as topic, style, and sentiment is challenging and often requires significant changes to the model architecture or retraining and fine-tuning the model on new supervised data. This paper presents a novel approach to topical language generation (TLG) that combines a pre-trained LM with topic modeling information. We cast the problem in a Bayesian formulation, with topic probabilities as the prior, LM probabilities as the likelihood, and the TLG probability as the posterior. In learning the model, we derive the topic probability distribution from the natural structure of user-provided documents. Furthermore, we extend the model with new parameters and functions that control how strongly topical features appear in the generated text, which allows the topical properties of the output to be adjusted easily. Our experimental results demonstrate that our model outperforms state-of-the-art baselines on coherence, diversity, and fluency while decoding faster.
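The core idea stated in the abstract, a posterior over next tokens proportional to the LM likelihood times a topic prior, can be sketched at decoding time. The Python sketch below is illustrative only and is not the authors' exact formulation: it assumes the Hugging Face transformers library with GPT-2, substitutes a toy hand-built topic prior for a real topic-model (e.g., LDA) word distribution, and the names topic_log_prior and gamma are hypothetical placeholders for the control parameters the paper introduces.

    # Minimal sketch, assuming GPT-2 via Hugging Face transformers and PyTorch.
    # Combines the LM's next-token distribution (likelihood) with a topic-word
    # distribution (prior) to form a topical posterior at each decoding step.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    vocab_size = model.config.vocab_size

    # Toy stand-in for a topic-word distribution (e.g., one LDA topic mapped
    # onto GPT-2's vocabulary); a real prior would come from the topic model.
    topic_boost = torch.zeros(vocab_size)
    topical_ids = tokenizer.encode(" football soccer goal league match referee")
    topic_boost[topical_ids] = 1.0
    topic_log_prior = torch.log_softmax(10.0 * topic_boost, dim=-1)

    # Hypothetical strength knob: gamma = 1 is the plain Bayes combination,
    # gamma = 0 recovers the base LM, larger values push the topic harder.
    gamma = 2.0

    def topical_next_token(input_ids: torch.Tensor) -> int:
        """Sample from posterior proportional to P_LM(token | context) * P_topic(token)^gamma."""
        with torch.no_grad():
            lm_logits = model(input_ids).logits[0, -1, :]          # likelihood term
        log_posterior = torch.log_softmax(lm_logits, dim=-1) + gamma * topic_log_prior
        probs = torch.softmax(log_posterior, dim=-1)               # renormalize
        return torch.multinomial(probs, num_samples=1).item()

    prompt = tokenizer.encode("The crowd cheered as", return_tensors="pt")
    generated = prompt
    for _ in range(25):
        next_id = topical_next_token(generated)
        generated = torch.cat([generated, torch.tensor([[next_id]])], dim=-1)
    print(tokenizer.decode(generated[0]))

Because the topical prior only reweights the LM's output distribution, this style of decoding requires no retraining or architectural changes; the same pre-trained model can be steered toward different topics by swapping the prior.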

Type: Article
Copyright: © The Author(s), 2022. Published by Cambridge University Press
