
handle: 11585/243678
As a result of the European Union’s pressure towards internationalization, universities in many countries are increasingly urged to provide information on their requirements and services, and to promote themselves, in English on the web. Hence the need for corpus resources and studies of institutional academic English used as an international language (or lingua franca) on the web. This paper introduces “acWaC-EU” (an acronym for “academic Web-as-Corpus in Europe”), a corpus of web pages in English crawled from the websites of European universities and annotated with contextual metadata. The corpus contains approximately 40 million words from universities in countries where English is a native language and a similar number of words from universities based in all other European countries, where English is used as a lingua franca. Thanks to the metadata, texts can be re-grouped for comparison based, e.g., on the language family of the native language spoken in the country where the text was produced. The paper describes and evaluates the corpus construction pipeline and the corpus itself, presents a case study on the use of modal and semi-modal verbs in lingua franca vs. native texts, and looks at future developments, in particular simple heuristics for topic-/genre-oriented subcorpus construction.
ENGLISH AS A LINGUA FRANCA; INSTITUTIONAL ACADEMIC LANGUAGE; WEB AS CORPUS
