Published online by Cambridge University Press: 13 September 2022
We develop an approach that combines the estimation of monthly firm-level expected returns with an assignment of firms to (possibly) latent groups, both based on observable characteristics, using machine learning principles with linear models. The best-performing methods are flexible two-stage sparse models that capture group-membership predictive relationships. Portfolios formed to exploit such group-varying predictions based on a parsimonious set of characteristics deliver economically meaningful returns with low turnover. We propose statistical tests based on nonparametric bootstrapping for our results, and detail how different characteristics may matter for different groups of firms, making comparisons to the existing literature.
We thank Jennifer Conrad (the editor) and Alberto Martín-Utrera (the referee) for their constructive comments. We are grateful to Panos Mavrokonstantis for excellent research assistance while he was a Senior Research Scientist at INSEAD. We also thank participants at the 13th Annual SoFiE Conference, the 3rd Future of Financial Information Conference, the inaugural Miami Herbert Winter Research Conference on ML and Business, the 2021 AFA PhD Poster Session, the 2020 European Winter Meetings of the Econometric Society, the 22nd INFER Annual Conference, the 9th Wharton-INSEAD Doctoral Consortium, and the INSEAD Accounting and Finance PhD seminar series, as well as Alex Chinco (discussant), Victor DeMiguel, Scott Murray (discussant), Joël Peress, Marcel Rindisbacher, Raman Uppal, Jinyuan Zhang, and Guofu Zhou for their helpful comments. A previous version of this article was circulated under the title “Modeling Heterogeneity in Firm-Level Return Predictability with Machine Learning.”