
This dataset contains a rigorously curated collection of 2,847 job postings related to Artificial Intelligence (AI) and Machine Learning (ML) roles, sourced from JobStreet and LinkedIn in Indonesia. Through a systematic 7-stage filtering pipeline, the dataset was refined to 51 high-quality, analysis-ready job postings with complete skill section information. The dataset is designed to support research in computational workforce analysis, technical skills gap assessment, natural language processing (NLP) of job descriptions, and labor market informatics specific to the rapidly evolving AI/ML sector in Southeast Asia.

Dataset Structure

The dataset is organized into 9 Excel sheets:

| Sheet | Description | Records |
| --- | --- | --- |
| Pipeline Summary | Overview of filtering stages and retention rates | 7 rows |
| All Jobs | Complete dataset with all 2,847 postings and metadata | 2,847 rows |
| Stage 1 - Raw Collection | Initial unfiltered collection from both platforms | 2,847 rows |
| Stage 2 - Deduplication | After removing duplicate postings | 2,694 rows |
| Stage 3 - IT Relevance | Filtered to IT/tech job postings only | 1,347 rows |
| Stage 4 - AI Relevance | Filtered to explicit AI/ML skill mentions | 248 rows |
| Stage 5 - Content Quality | Posts with sufficient description length | 82 rows |
| Stage 6 - Language | Processable language (English) | 75 rows |
| Stage 7 - Final Sample | Complete skill section - analysis-ready | 51 rows |
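
The stage counts listed above are enough to work out how much of the collection each filter retains. The following is a minimal sketch, not the dataset authors' code: the function name `retention_table` and the output field names are assumptions made for illustration, and only the stage names and row counts come from the description itself.

```python
# Stage names and row counts as given in the dataset description.
STAGE_COUNTS = [
    ("Stage 1 - Raw Collection", 2847),
    ("Stage 2 - Deduplication", 2694),
    ("Stage 3 - IT Relevance", 1347),
    ("Stage 4 - AI Relevance", 248),
    ("Stage 5 - Content Quality", 82),
    ("Stage 6 - Language", 75),
    ("Stage 7 - Final Sample", 51),
]


def retention_table(stage_counts):
    """Compute retention per stage (hypothetical helper, not part of the dataset).

    step_retention_pct    : rows kept relative to the previous stage
    overall_retention_pct : rows kept relative to the raw collection
    """
    raw = stage_counts[0][1]
    previous = raw
    rows = []
    for name, count in stage_counts:
        rows.append(
            {
                "stage": name,
                "rows": count,
                "step_retention_pct": round(100 * count / previous, 1),
                "overall_retention_pct": round(100 * count / raw, 1),
            }
        )
        previous = count
    return rows


if __name__ == "__main__":
    for row in retention_table(STAGE_COUNTS):
        print(
            f"{row['stage']:<28} {row['rows']:>5} "
            f"{row['step_retention_pct']:>6}% {row['overall_retention_pct']:>6}%"
        )
```

Reporting both a step-wise and an overall percentage makes it easier to see which filter removes the most postings relative to its input, as opposed to relative to the raw collection.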
