ZENODO
Dataset . 2026
License: CC BY
Data sources: Datacite

A Curated Dataset for Saturation-Guided Method for Operationalizing Data Sufficiency in AI Skill Analysis from Job Advertisements

A Curated Dataset of AI and Machine Learning Job Postings from Indonesia Job Platform
Authors: Handayani, Tri Pratiwi; Idris, Norisma; Shuib, Liyana

Abstract

This dataset contains a rigorously curated collection of 2,847 job postings related to Artificial Intelligence (AI) and Machine Learning (ML) roles, sourced from JobStreet and LinkedIn in Indonesia. Through a systematic 7-stage filtering pipeline, the dataset was refined to 51 high-quality, analysis-ready job postings with complete skill section information. The dataset is designed to support research in computational workforce analysis, technical skills gap assessment, natural language processing (NLP) of job descriptions, and labor market informatics specific to the rapidly evolving AI/ML sector in Southeast Asia.

Dataset Structure

The dataset is organized into 9 Excel sheets:

| Sheet | Description | Records |
|---|---|---|
| Pipeline Summary | Overview of filtering stages and retention rates | 7 rows |
| All Jobs | Complete dataset with all 2,847 postings and metadata | 2,847 rows |
| Stage 1 - Raw Collection | Initial unfiltered collection from both platforms | 2,847 rows |
| Stage 2 - Deduplication | After removing duplicate postings | 2,694 rows |
| Stage 3 - IT Relevance | Filtered to IT/tech job postings only | 1,347 rows |
| Stage 4 - AI Relevance | Filtered to explicit AI/ML skill mentions | 248 rows |
| Stage 5 - Content Quality | Posts with sufficient description length | 82 rows |
| Stage 6 - Language | Processable language (English) | 75 rows |
| Stage 7 - Final Sample | Complete skill section - analysis-ready | 51 rows |
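The retention rates summarized in the Pipeline Summary sheet can be approximated from the stage row counts listed above. The following is a minimal sketch, not the authors' code: the stage names and counts are copied from the sheet list, while the function name `retention_table` and the two rate definitions (retention relative to the previous stage, and retention relative to the raw collection) are assumptions made for illustration. The rates reported in the dataset itself may be defined differently.

```python
# Stage names and row counts as listed in the dataset's sheet overview.
STAGE_COUNTS = [
    ("Stage 1 - Raw Collection", 2847),
    ("Stage 2 - Deduplication", 2694),
    ("Stage 3 - IT Relevance", 1347),
    ("Stage 4 - AI Relevance", 248),
    ("Stage 5 - Content Quality", 82),
    ("Stage 6 - Language", 75),
    ("Stage 7 - Final Sample", 51),
]


def retention_table(stages):
    """Return (name, rows, step_retention, cumulative_retention) per stage.

    step_retention is rows / rows of the previous stage (assumed definition);
    cumulative_retention is rows / rows of the first stage (assumed definition).
    """
    initial = stages[0][1]
    previous = initial
    table = []
    for name, count in stages:
        table.append((name, count, count / previous, count / initial))
        previous = count
    return table


if __name__ == "__main__":
    # Print one line per stage with both rates formatted as percentages.
    for name, count, step, cumulative in retention_table(STAGE_COUNTS):
        print(f"{name:<28} {count:>5}  {step:6.1%}  {cumulative:6.1%}")
```

Under these definitions, the IT relevance filter keeps exactly half of the deduplicated postings (1,347 of 2,694), the final stage keeps 68.0% of the English-language postings (51 of 75), and the final sample is roughly 1.8% of the raw collection (51 of 2,847).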
