
Data and code deposit accompanying the scoping review “Edge language models and agentic AI in building automation: a scoping review of evidence maturity and authority tiers” (Simson, Kiil, Võsa & Kurnitski, 2026).

Contents:
- Full deduplicated corpus with screening decisions (S1, 1,281 records).
- In-window themed corpus (S2, 272 rows) with A1–A10 theme tags and rule-based E1–E4 / T1–T4 heuristic columns. S2 retains the full audit trail: 266 rows with final_status = INCLUDE, 2 BACKGROUND rows, and 4 document-type EXCLUDE rows. After post-coding scope review, these correspond to 257 INCLUDE, 5 INCLUDE_NON_LM contextual comparators, 4 OUT_OF_SCOPE_RECLASSIFIED, 2 BACKGROUND, and 4 EXCLUDE rows.
- Query log: 27 Boolean queries (final search 30 May 2026), plus 12 pre-2022 backward-citation records.
- Per-record manual E-tier and T-tier coding sheet (S3) for all 266 records that passed the screening pipeline, i.e. before post-coding scope review. Filtering S3 by scope_status = INCLUDE gives the 257 LM/agentic-AI primary records used for Table 2 of the main text; filtering by scope_status in {INCLUDE, INCLUDE_NON_LM} gives the 262 in-scope primary records.
- Cross-study quantitative extraction (S4, 23 named studies).
- Dual-screener audit on a 60-record stratified sample (Cohen’s κ = 0.733–0.895 across screener/pipeline comparisons), a 50-record AUTO_EXCLUDE recall check with 0 unambiguous false negatives, and a 58-record second-coder E×T reliability audit (κE = 0.786, κT = 0.715).
- Targeted preprint/conference scan covering ACM BuildSys, ACM e-Energy, arXiv, and IEEE smart-building / smart-grid proceedings, and 20 LLM-BACS candidates.
- Consolidated Python screening pipeline and seven-stage abstract-retrieval scripts: Crossref → OpenAlex → Semantic Scholar → Springer Nature Meta → Elsevier ScienceDirect → Elsevier Scopus → Google Scholar (via scholarly).
- E1–E4 / T1–T4 coding rubric with 12 borderline-case rules.
- PRISMA-ScR checklist and extended narrative supplements, including the cybersecurity threat model, the ten-gap research roadmap, the limitations section, and the per-application extended synthesis.
- Supplementary/source figures S1 and S2: a theme × year Sankey diagram and a journal × year heatmap, provided as PNG and SVG.

Licence: CC-BY-4.0 for data and documents; MIT for code. See LICENSE-CC-BY-4.0.txt and LICENSE-MIT.txt in the deposit. Publisher API access details are provided only as credential placeholders and configuration notes. Use of the Springer Nature and Elsevier APIs remains subject to the respective publisher API terms of service and is not covered by the MIT licence.
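The S3 scope filtering described above can be sketched as follows. This is a minimal illustration, not the deposited pipeline. It assumes S3 is exported as CSV with a scope_status column, and that this column takes the values INCLUDE, INCLUDE_NON_LM and OUT_OF_SCOPE_RECLASSIFIED; the last value is inferred from the S2 breakdown (257 + 5 + 4 = 266). The function names, the record_id column and the demo rows are hypothetical. The demo rows are synthetic placeholders that only reproduce the stated counts and are not the deposited records.

```python
import csv
import io
from collections.abc import Iterable, Mapping

# Scope label sets as described for S3 (column name: scope_status).
LM_PRIMARY = {"INCLUDE"}                          # 257 records used for Table 2
IN_SCOPE_PRIMARY = {"INCLUDE", "INCLUDE_NON_LM"}  # 262 in-scope primary records


def load_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into dicts keyed by header (hypothetical S3 export)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_by_scope(
    rows: Iterable[Mapping[str, str]], allowed: set[str]
) -> list[Mapping[str, str]]:
    """Keep rows whose whitespace-trimmed scope_status is in `allowed`."""
    return [row for row in rows if row.get("scope_status", "").strip() in allowed]


if __name__ == "__main__":
    # Synthetic rows mirroring only the stated label counts (NOT real S3 data).
    labels = (
        ["INCLUDE"] * 257
        + ["INCLUDE_NON_LM"] * 5
        + ["OUT_OF_SCOPE_RECLASSIFIED"] * 4
    )
    demo_csv = "record_id,scope_status\n" + "".join(
        f"R{i:03d},{label}\n" for i, label in enumerate(labels, start=1)
    )
    rows = load_rows(demo_csv)
    print(len(rows))                                     # 266
    print(len(filter_by_scope(rows, LM_PRIMARY)))        # 257
    print(len(filter_by_scope(rows, IN_SCOPE_PRIMARY)))  # 262
```

Trimming whitespace before comparison is a defensive choice for hand-edited spreadsheets. Against the real S3 file, the same two filters should reproduce the 257 and 262 totals quoted above.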
