Using Machine Learning to Uncover the Semantics of Concepts: How Well Do Typicality Measures Extracted from a BERT Text Classifier Match Human Judgments of Genre Typicality?

Publication: Article, 01 Jan 2023, Spain

Publisher: Society for Sociological Science

Journal: Sociological Science, volume 10, pages 82-117 (eISSN: 2330-6696)

Funded by: EC | InfoSampCollectJgmt

Authors: Le Mens, Gaël; Kovács, Balázs; Hannan, Michael; Pros, Guillem;

doi: 10.15195/v10.a3

handle: 10230/56063

Abstract

Social scientists have long been interested in understanding the extent to which the typicalities of an object in concepts relate to its valuations by social actors. Answering this question has proven to be challenging because precise measurement requires a feature-based description of objects. Yet, such descriptions are frequently unavailable. In this article, we introduce a method to measure typicality based on text data. Our approach involves training a deep-learning text classifier based on the BERT language representation and defining the typicality of an object in a concept in terms of the categorization probability produced by the trained classifier. Model training allows for the construction of a feature space adapted to the categorization task and of a mapping between feature combination and typicality that gives more weight to feature dimensions that matter more for categorization. We validate the approach by comparing the BERT-based typicality measure of book descriptions in literary genres with average human typicality ratings. The obtained correlation is higher than 0.85. Comparisons with other typicality measures used in prior research show that our BERT-based measure better reflects human typicality judgments.
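The core idea in the abstract, reading typicality off a trained classifier's categorization probability and then checking it against human ratings, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the genre list, the logits (which stand in for the output of a fine-tuned BERT classifier), and the human ratings are made up, and the article's exact definition of typicality may differ from the plain softmax probability used below.

```python
import math

def softmax(logits):
    """Convert raw classifier scores into categorization probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def typicality(logits, genres):
    """Map each genre to the probability the classifier assigns to it.

    Assumption: typicality in a genre is taken to be this probability.
    """
    return dict(zip(genres, softmax(logits)))

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical genres and logits for three book descriptions; in practice
# the logits would come from a text classifier trained on genre labels.
GENRES = ["mystery", "romance", "science fiction"]
example_logits = [
    [3.0, 0.5, -1.0],   # strongly mystery-like description
    [0.2, 0.1, 0.0],    # ambiguous description
    [-1.0, 2.5, 0.0],   # romance-like description
]

# Typicality of each book in the "mystery" genre.
mystery_typicality = [typicality(l, GENRES)["mystery"] for l in example_logits]

# Made-up average human typicality ratings for the same three books.
human_ratings = [6.5, 3.0, 1.5]

r = pearson(mystery_typicality, human_ratings)
print(f"correlation with human ratings: {r:.2f}")
```

With real data, the same comparison would be run over many book descriptions per genre, and the correlation would summarize how closely the classifier-based measure tracks human judgments.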

Pros received financial support from ERC Consolidator Grant #772268 from the European Commission. G. Le Mens also received financial support from grant PID2019-105249GB-I00/AEI/10.13039/501100011033 from the Spanish Ministerio de Ciencia, Innovación y Universidades (MCIU) and the Agencia Estatal de Investigación (AEI) and from the BBVA Foundation Grant G999088Q.

Includes data, material, and analysis code for all analyses.

Country

Spain

Keywords

typicality, concepts, categories, transformer models, deep learning, BERT, Sociology (General), HM401-1281

Impact by BIP!

selected citations: 14
popularity: Top 10%
influence: Top 10%
impulse: Top 10%