
arXiv: 2504.15683
Recent advances in information availability and computational capabilities have transformed the analysis of annual reports, integrating traditional financial metrics with insights from textual data. Automated review processes, such as topic modeling, are crucial for extracting insights from this wealth of text. This study examines the effectiveness of BERTopic, a state-of-the-art topic model relying on contextual embeddings, for analyzing Item 7 and Item 7A of 10-K filings from S&P 500 companies (2016-2022). Moreover, we introduce FinTextSim, a fine-tuned sentence-transformer model optimized for clustering and semantic search in financial contexts. Compared to all-MiniLM-L6-v2, the most widely used sentence-transformer, FinTextSim increases intratopic similarity by 81% and reduces intertopic similarity by 100%, significantly enhancing organizational clarity. We assess BERTopic's performance using embeddings from both FinTextSim and all-MiniLM-L6-v2. Our findings reveal that BERTopic forms clear and distinct economic topic clusters only when paired with FinTextSim's embeddings. Without FinTextSim, BERTopic struggles with misclassification and overlapping topics. Thus, FinTextSim is pivotal for advancing financial text analysis. Its enhanced contextual embeddings, tailored to the financial domain, raise the quality of future research and of financial information. Higher-quality financial information will enable stakeholders to gain a competitive advantage by streamlining resource allocation and decision-making. Moreover, the improved insights have the potential to enhance business valuation and stock price prediction models.
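The abstract reports intratopic and intertopic similarity without defining them. As a minimal sketch, assuming both are mean pairwise cosine similarities over sentence embeddings (within the same topic and across different topics, respectively), they could be computed as below. The function names, the toy two-dimensional vectors, and the labels are illustrative assumptions, not the paper's implementation or data.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings, labels, same_topic):
    """Average cosine similarity over all pairs whose topic labels
    match (same_topic=True) or differ (same_topic=False)."""
    sims = [
        cosine(embeddings[i], embeddings[j])
        for i, j in combinations(range(len(labels)), 2)
        if (labels[i] == labels[j]) == same_topic
    ]
    return sum(sims) / len(sims)


# Hypothetical toy embeddings: two sentences per topic.
embeddings = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]

# Assumed definitions: higher intratopic and lower intertopic values
# indicate tighter, better-separated topics.
intra = mean_pairwise_similarity(embeddings, labels, same_topic=True)
inter = mean_pairwise_similarity(embeddings, labels, same_topic=False)
print(f"intratopic={intra:.3f} intertopic={inter:.3f}")
```

Under these assumed definitions, an embedding model suited to the domain should push the first value up and the second toward zero, which is the direction of change the abstract describes for FinTextSim.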
FOS: Computer and information sciences, FOS: Economics and business, Machine Learning (cs.LG), Computation and Language (cs.CL), General Finance (q-fin.GN), General Economics (econ.GN), I.2.7; I.5.1; J.4, 68T50
