
handle: 11250/3157981
This thesis describes a project that aims to develop a geoparser for Norwegian-language text. A geoparser is a tool that reads a piece of text, extracts any potential location mentions, and then resolves these mentions to their real-world toponyms. At the time this thesis was written, there were no known geoparsers that specialize exclusively in Norwegian text, so the solution produced here is unique in that sense. Geoparsing is non-trivial, as many geographical locations often share the same name. The geoparser must therefore be able to disambiguate a location mention using whatever clues are available to it. In this project, the geoparser tries to infer geographical regions of relevance and to identify potential geographical hierarchies between the different location mentions in the text. It also relies on common geoparsing heuristics, such as population size being a strong indicator of toponym importance. To find candidates for a location mention, it uses GeoNames, a geographical gazetteer containing entries for more than 11 million toponyms from all over the world, and Stedsnavn, a Norwegian dataset containing over 1 million Norwegian toponym entries. Basic testing is done to check the viability of the solution, but evaluating the geoparser in general is difficult, as there are no suitable datasets with which to test it.
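The disambiguation idea described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Candidate` type, the scoring formula, the weight of 2.0, and the toy gazetteer entries (placeholder regions and invented population figures) are all assumptions made for illustration. It only shows how a population heuristic and a region-of-relevance signal might be combined to choose between toponyms that share a name.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    """One gazetteer entry (hypothetical, simplified schema)."""
    name: str
    region: str
    population: int


def score(candidate: Candidate, region_counts: Dict[str, int]) -> float:
    """Combine a population heuristic with a region-of-relevance signal.

    The log scaling and the weight 2.0 are arbitrary illustrative choices.
    """
    population_score = math.log10(candidate.population + 1)
    region_score = region_counts.get(candidate.region, 0)
    return population_score + 2.0 * region_score


def disambiguate(
    mention: str,
    gazetteer: List[Candidate],
    region_counts: Dict[str, int],
) -> Optional[Candidate]:
    """Return the best-scoring candidate for a mention, or None if no match."""
    candidates = [c for c in gazetteer if c.name.lower() == mention.lower()]
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c, region_counts))


# Toy data: placeholder regions and invented populations, not real figures.
TOY_GAZETTEER = [
    Candidate("Sand", "Region A", 5000),
    Candidate("Sand", "Region B", 300),
]

# With no context, the more populous candidate is preferred.
no_context = disambiguate("Sand", TOY_GAZETTEER, {})

# If other mentions in the text already point to Region B,
# the regional signal outweighs the population heuristic.
with_context = disambiguate("Sand", TOY_GAZETTEER, {"Region B": 1})
```

In this sketch, `region_counts` stands in for the inferred regions of relevance: it counts how many other mentions in the same text have already been resolved to each region.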
