
Technical

A reproducible, script-driven pipeline is specified for mining the MediaWiki corpus at heimskringla.no for attestations belonging to the curated óðal/aþal lexical complex. The workflow enforces a three-stage separation of (i) URL enumeration, (ii) per-page acquisition, and (iii) extraction plus matching, in order to isolate coverage decisions from network volatility and to preserve auditability. Corpus-wide coverage is obtained via the MediaWiki Action API using action=query&list=allpages with full continuation handling (apcontinue) and optional redirect exclusion (apfilterredir=nonredirects); a bounded category-harvesting mode is also supported. Each enumerated page is fetched once and persisted as a raw HTML snapshot with accompanying metadata (requested/resolved URL, timestamps, HTTP status, and captured revision identifiers). Text extraction is MediaWiki-aware, preferring the main content container and excluding predictable UI/editorial scaffolding; reference/notes strata can be separated and are excluded by default. Mining is performed against the derived clean-text layer using an invariant philological core (athal_core), while Heimskringla-specific adaptations are confined to span-safe keying normalization to reduce false negatives without rewriting evidential spans. Outputs include an append-only TSV concordance with KWIC context and stable character offsets, per-page text hashes for drift detection, and JSONL run manifests enabling resumable execution and revision-stable replay via captured oldid permalinks.

Non-technical

A practical method is presented for searching the Heimskringla website (an online library built on wiki software) for a specific family of Old Norse words related to inherited land and lineage (óðal/aþal).
The approach is designed to be repeatable and trustworthy: first, it makes a complete list of the pages to examine; second, it saves an exact copy of each page as it was retrieved; third, it strips away menus, categories, and other website “scaffolding” so that only the real text is searched. The actual word-search logic is kept stable and unchanged, so results from different runs or different corpora remain comparable. Every finding is recorded with surrounding context and with enough provenance information to trace it back to the exact page version used, even if the website later changes. The end product is a transparent concordance—essentially a searchable evidence table—that supports philological analysis without relying on manual browsing or unreliable site-wide search boxes.
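
The enumeration stage named in the technical summary (action=query&list=allpages with apcontinue handling and optional apfilterredir=nonredirects) can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the function names `enumerate_pages` and `http_get_json`, the User-Agent string, and the injectable `fetch` parameter are assumptions made for illustration, and the site's real api.php endpoint path has to be supplied by the caller.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator


def http_get_json(api_url: str, params: Dict[str, str]) -> dict:
    """Default transport (hypothetical helper): one GET request, decoded as JSON."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "athal-enumerator-sketch/0.1"}  # assumed UA string
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def enumerate_pages(
    api_url: str,
    fetch: Callable[[str, Dict[str, str]], dict] = http_get_json,
    skip_redirects: bool = True,
) -> Iterator[dict]:
    """Yield {'pageid', 'title'} for every page, following API continuation."""
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": "max",   # let the server choose the largest permitted batch
        "format": "json",
    }
    if skip_redirects:
        params["apfilterredir"] = "nonredirects"

    continuation: Dict[str, str] = {}
    while True:
        # Merge the server-supplied continuation values into the next request.
        data = fetch(api_url, {**params, **continuation})
        for page in data.get("query", {}).get("allpages", []):
            yield {"pageid": page["pageid"], "title": page["title"]}
        continuation = data.get("continue") or {}
        if not continuation:
            break  # no 'continue' block means the listing is complete
```

Passing the transport in as `fetch` keeps enumeration logic separate from network behaviour, in the spirit of the stage separation described above, and allows the loop to be exercised offline with canned responses.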
computational corpus linguistics, Digital philology, Old Norse lexical semantics, Germanic legal vocabulary, medieval Scandinavian law texts
