
Topical language generation using transformers

Published online by Cambridge University Press:  04 February 2022

Rohola Zandie
Affiliation:
Department of Electrical and Computer Engineering, University of Denver, Denver, CO 80208, USA
Mohammad H. Mahoor*
Affiliation:
Department of Electrical and Computer Engineering, University of Denver, Denver, CO 80208, USA
*Corresponding author. E-mail: [email protected]

Abstract

Large-scale transformer-based language models (LMs) demonstrate impressive capabilities in open-text generation. However, controlling properties of the generated text such as topic, style, and sentiment is challenging and often requires significant changes to the model architecture or retraining and fine-tuning the model on new supervised data. This paper presents a novel approach to topical language generation (TLG) that combines a pre-trained LM with topic modeling information. We cast the problem in a Bayesian formulation, with topic probabilities as the prior, LM probabilities as the likelihood, and the TLG probability as the posterior. In learning the model, we derive the topic probability distribution from the natural structure of user-provided documents. Furthermore, we extend the model with new parameters and functions that control how strongly topical features appear in the generated text, which allows the topical properties of the output to be adjusted easily. Our experimental results demonstrate that our model outperforms state-of-the-art baselines on coherence, diversity, and fluency while decoding faster.
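The core idea stated in the abstract, a posterior over next tokens proportional to the LM likelihood times a topic prior, can be sketched at decoding time. The Python sketch below is illustrative only and is not the authors' exact formulation: it assumes the Hugging Face transformers library with GPT-2, substitutes a toy hand-built topic prior for a real topic-model (e.g., LDA) word distribution, and the names topic_log_prior and gamma are hypothetical placeholders for the control parameters the paper introduces.

    # Minimal sketch, assuming GPT-2 via Hugging Face transformers and PyTorch.
    # Combines the LM's next-token distribution (likelihood) with a topic-word
    # distribution (prior) to form a topical posterior at each decoding step.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    vocab_size = model.config.vocab_size

    # Toy stand-in for a topic-word distribution (e.g., one LDA topic mapped
    # onto GPT-2's vocabulary); a real prior would come from the topic model.
    topic_boost = torch.zeros(vocab_size)
    topical_ids = tokenizer.encode(" football soccer goal league match referee")
    topic_boost[topical_ids] = 1.0
    topic_log_prior = torch.log_softmax(10.0 * topic_boost, dim=-1)

    # Hypothetical strength knob: gamma = 1 is the plain Bayes combination,
    # gamma = 0 recovers the base LM, larger values push the topic harder.
    gamma = 2.0

    def topical_next_token(input_ids: torch.Tensor) -> int:
        """Sample from posterior proportional to P_LM(token | context) * P_topic(token)^gamma."""
        with torch.no_grad():
            lm_logits = model(input_ids).logits[0, -1, :]          # likelihood term
        log_posterior = torch.log_softmax(lm_logits, dim=-1) + gamma * topic_log_prior
        probs = torch.softmax(log_posterior, dim=-1)               # renormalize
        return torch.multinomial(probs, num_samples=1).item()

    prompt = tokenizer.encode("The crowd cheered as", return_tensors="pt")
    generated = prompt
    for _ in range(25):
        next_id = topical_next_token(generated)
        generated = torch.cat([generated, torch.tensor([[next_id]])], dim=-1)
    print(tokenizer.decode(generated[0]))

Because the topical prior only reweights the LM's output distribution, this style of decoding requires no retraining or architectural changes; the same pre-trained model can be steered toward different topics by swapping the prior.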

Type: Article
Copyright: © The Author(s), 2022. Published by Cambridge University Press
