How to Use Lexical Density of Company Filings

Daniela Hanicova; Filip Kalús; Radovan Vojtko

SSRN Electronic Journal

Article . 2021 . Peer-reviewed

Data sources: Crossref

https://dx.doi.org/10.2139/ssr...

Other literature type

Data sources: Microsoft Academic Graph

Publication: Article, Other literature type; 01 Jan 2021; English; Publisher: Elsevier BV; Journal: SSRN Electronic Journal (eISSN: 1556-5068)

doi: 10.2139/ssrn.3921091

Abstract

This paper analyzes the application of natural language processing (NLP) to 10-K and 10-Q company reports. Using the Brain Language Metrics on Company Filings (BLMCF) dataset, which tracks numerous language metrics on 10-K and 10-Q reports, we analyze several lexical metrics: lexical richness, lexical density, and specific density. In simple terms, lexical richness measures how many unique words the author uses; the idea is that the more varied the author's vocabulary, the more complex the text. Lexical density measures the structure and complexity of human communication in a text; a high lexical density indicates a large share of information-carrying words. Lastly, specific density measures how dense the report's language is from a financial point of view, in other words, how many finance-related words the text uses. Overall, this type of alternative data produces interesting results. Although lexical richness produced the weakest results of our strategies when applied to an investment universe of 500 stocks, its results improved significantly when we expanded the universe to 3000 stocks. Moreover, the strategies based on lexical density and specific density improved the Sharpe ratio even further. In the last section, we combine the two metrics (lexical density and specific density) in one strategy. Applying both metrics to the investment universe of 500 stocks produces a Sharpe ratio of 0.688.
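The three metrics described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact BLMCF formulas, so the definitions below are assumptions: lexical richness is taken as the ratio of unique words to total words, lexical density as the share of words that are not function words, and specific density as the share of words found in a finance vocabulary. The word lists `FUNCTION_WORDS` and `FINANCE_WORDS` and all function names are hypothetical and deliberately tiny; a real analysis would need far larger lists or a part-of-speech tagger.

```python
import re

# Hypothetical, deliberately small word lists (assumptions, not from the paper).
FUNCTION_WORDS = {
    "the", "a", "an", "of", "and", "or", "to", "in", "on", "for", "by",
    "with", "is", "are", "was", "were", "be", "that", "this", "it", "as",
    "at", "we", "our",
}
FINANCE_WORDS = {
    "revenue", "debt", "equity", "cash", "assets", "liabilities",
    "income", "dividend", "interest", "earnings",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_richness(tokens):
    """Assumed definition: unique words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens):
    """Assumed definition: share of words that are not function words."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


def specific_density(tokens):
    """Assumed definition: share of words in the finance vocabulary."""
    if not tokens:
        return 0.0
    finance = [t for t in tokens if t in FINANCE_WORDS]
    return len(finance) / len(tokens)


if __name__ == "__main__":
    sample = "Our revenue increased and our debt decreased in the quarter"
    tokens = tokenize(sample)
    print("richness:", lexical_richness(tokens))
    print("lexical density:", lexical_density(tokens))
    print("specific density:", specific_density(tokens))
```

Under these assumed definitions each metric is a ratio between 0 and 1, so filings of different lengths can be compared directly, for example by ranking stocks on a metric before forming portfolios.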
