
Automatic analysis of insurance reports through deep neural networks to identify severe claims

Published online by Cambridge University Press: 09 March 2021

Isaac Cohen Sabban*
Affiliation:
Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, LPSM, 4 place Jussieu, F-75005 Paris, France Pacifica, Crédit Agricole Assurances, F-75015 Paris, France
Olivier Lopez*
Affiliation:
Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, LPSM, 4 place Jussieu, F-75005 Paris, France
Yann Mercuzot
Affiliation:
Pacifica, Crédit Agricole Assurances, F-75015 Paris, France
*Corresponding authors. E-mails: [email protected], [email protected]

Abstract

In this paper, we develop a methodology to automatically classify claims using the information contained in text reports (written when the claim is opened). From this automatic analysis, the aim is to predict whether a claim is expected to be particularly severe or not. The difficulty lies in the rarity of such extreme claims in the database, which makes it hard for classical prediction techniques, such as logistic regression, to predict the outcome accurately. Since the data are unbalanced (too few observations are associated with a positive label), we propose different rebalancing algorithms to deal with this issue. We also discuss the embedding methodologies used to process the text data and the role of the network architectures.
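The abstract outlines a general pipeline: claim reports are converted into word-index sequences and embeddings, fed to a deep neural network classifier, and the rarity of severe claims is compensated for by a rebalancing step. The sketch below illustrates one such pipeline, assuming a TensorFlow/Keras setup; the toy data, vocabulary size, architecture, and class-weighting scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (illustrative, not the paper's model): embedding + 1D-convolution
# classifier for rare "severe claim" labels, with class weighting to offset imbalance.
import numpy as np
import tensorflow as tf  # assumes TF >= 2.6

reports = ["water damage in the kitchen ...", "total loss after a fire ..."]  # toy claim reports
labels = np.array([0, 1])                                                     # 1 = severe claim (rare)

# Map each report to a fixed-length sequence of word indices.
vectorizer = tf.keras.layers.TextVectorization(max_tokens=20000, output_sequence_length=200)
vectorizer.adapt(np.array(reports))
X = vectorizer(np.array(reports))

# One possible architecture: embedding, 1D convolution, global pooling, sigmoid head.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=20000, output_dim=64),
    tf.keras.layers.Conv1D(64, 5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])

# Re-weight the rare positive class (one simple alternative to resampling).
n_pos = int(labels.sum())
n_neg = len(labels) - n_pos
class_weight = {0: 1.0, 1: n_neg / max(n_pos, 1)}
model.fit(X, labels, epochs=5, class_weight=class_weight)
```

In practice, the class-weighting step could be replaced by oversampling or undersampling of the training set, and the convolutional block by a recurrent architecture; the abstract leaves both choices open and the paper compares several options.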

Type
Original Research Paper
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of the Institute and Faculty of Actuaries

