Large Language Models for Corporate Financial Distress Prediction: Overview and Exploration

Weiya Fu

doi:10.54097/2gaabd72

Keywords:

Financial distress prediction; large language models; textual features; semantic variable construction; modeling mechanisms; application challenges.

Abstract

With the increasing complexity of the business environment and the evolution of information disclosure tools, financial distress prediction (FDP) is gradually shifting from approaches driven solely by structured data toward approaches that fuse semantic information. Traditional models rely on financial ratios and statistical indicators, which makes it difficult to capture the risk propensity conveyed by “soft signals” such as management tone and textual metaphors. Large language models (LLMs), with their strong semantic modeling and generative inference capabilities, offer a new perspective for text-driven FDP systems. This paper systematically reviews the application of LLMs in FDP along two dimensions: variable construction and model construction. For variable construction, it summarizes three representative types of text features: emotional tone, semantic embeddings, and generative variables. For model construction, it analyzes LLMs used as classification models for prediction, as well as their fusion patterns in multimodal integrated systems. In addition, this work points out that existing studies still face challenges such as scarce data labels, limited model interpretability, high system deployment costs, and a lack of compliance mechanisms. Addressing these challenges calls for multidisciplinary collaboration toward intelligent early warning systems that are credible, transparent, and adaptable. This work aims to provide a reference for constructing intelligent risk control systems and developing financial regulatory technology.
