INCA: Infrastructure for Content Analysis

Publication: Article, Conference object
Date: 01 Oct 2018
Country: Netherlands
Publisher: IEEE
Journal: 2018 IEEE 14th International Conference on e-Science (e-Science)

Authors: Damian Trilling; Bob van de Velde; Anne C. Kroon; Felicia Löcherbach; Theo B. Araujo; Joanna Strycharz; Tamara Raats; +2 Authors

doi: 10.1109/escience.2018.00078

handle: 11245.1/c2a50273-98f7-4a34-a8e1-2ed3e1ce0d81

Abstract

We present INCA (short for INfrastructure for Content Analysis), a Python module for collecting, storing, processing, and analyzing a wide variety of media content, including but not limited to news, political debates, social media, forums, and customer reviews. Using Elasticsearch as a database backend and Celery for task management, it makes automated content analysis (ACA) scalable. INCA's main objective is to enable and promote an integrated workflow. It focuses on the re-usability of data, processors, and analyses, making all steps of ACA accessible to social scientists without requiring advanced programming skills. Here, we present the aim, implementation, and recommended workflow for INCA.
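The collect, store, process, and analyze workflow named in the abstract can be illustrated with a minimal Python sketch. This is not INCA's API: `DocumentStore`, `collect`, `process`, `tokenize`, and `top_words` are hypothetical names introduced here, an in-memory dictionary stands in for the Elasticsearch backend, and plain function calls stand in for Celery tasks. The only point it tries to show is how keeping raw and processed fields in one document makes later steps re-usable.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DocumentStore:
    """Hypothetical in-memory stand-in for a document database such as Elasticsearch."""
    docs: Dict[str, dict] = field(default_factory=dict)

    def index(self, doc_id: str, body: dict) -> None:
        # Merge new fields into an existing document so raw and processed data coexist.
        self.docs.setdefault(doc_id, {}).update(body)

    def search(self, doctype: str) -> List[dict]:
        return [d for d in self.docs.values() if d.get("doctype") == doctype]


def collect(store: DocumentStore, doctype: str, raw_items: List[str]) -> None:
    """Collection step: store each raw text under a document type."""
    for i, text in enumerate(raw_items):
        store.index(f"{doctype}_{i}", {"doctype": doctype, "text": text})


def process(store: DocumentStore, doctype: str, field_in: str,
            field_out: str, func: Callable) -> None:
    """Processing step: apply func to one field and save the result in a new field."""
    for doc in store.search(doctype):
        doc[field_out] = func(doc[field_in])


def tokenize(text: str) -> List[str]:
    """A deliberately simple processor: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(store: DocumentStore, doctype: str, field_name: str,
              n: int) -> List[Tuple[str, int]]:
    """Analysis step: count tokens across all documents of one type."""
    counts: Counter = Counter()
    for doc in store.search(doctype):
        counts.update(doc[field_name])
    return counts.most_common(n)


# Usage: the three steps run in sequence on two toy headlines.
store = DocumentStore()
collect(store, "news", ["Parliament debates the budget.",
                        "Budget talks stall in parliament."])
process(store, "news", "text", "tokens", tokenize)
result = top_words(store, "news", "tokens", 2)
```

Because the processed `tokens` field is stored next to the original `text`, a second analysis could reuse it without repeating the tokenization. In a real deployment the store and the task dispatch would be handled by the backend and task queue rather than by these toy functions.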

Country

Netherlands

Related Organizations

University of Amsterdam
Netherlands

Keywords

000, 004

Impact by BIP!

Selected citations: 5
	These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Average
	This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average
	This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 10%
	This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.