
The JobAds project investigates the transformation of the labor market by analyzing historical job advertisements in digitized newspapers published between 1850 and 1950. We use Natural Language Processing (NLP) techniques to extract and analyze various aspects of job ads, such as positions, skills, and media strategies. One challenge is the need for high-quality machine-readable text, which is hindered by errors in Optical Layout Recognition (OLR) and Optical Character Recognition (OCR). We will evaluate the impact of OCR quality on NLP tasks to determine whether post-correction is necessary. Language variation, such as inconsistent spelling and abbreviations, poses additional challenges. The project also considers potential bias introduced by external factors and by the processing of digitized newspapers. In the first phase, we aim to improve the OLR and OCR results. Afterwards, we plan to examine the structure of job ads, analyze job descriptions, and identify trends in employment dynamics and in the demand for specific occupations and skills over time.
Paper, Contextualization, Structural analysis, Historical newspapers, Optical Character Recognition (OCR), Natural Language Processing (NLP), Poster presentation, labour market, Text, job advertisements, Editing, DHd2024, Cleaning, Content analysis
