
doi: 10.1007/11837787_2
In both conceptions, the common factor (the web) imposes certain requirements: extremely variable scalability (from a home page to community sites to sites that encompass a significant fraction of the web), rapid evolution, radical distribution, arbitrary interconnection and aggregation, and very little validation or other means of control. The demands of the web are forcing both the knowledge representation (KR) and the database communities to stretch their understanding and technology in different ways. While implementation techniques require revamping to deal with web scale, finding the right level and sort of expressiveness is even more critical. The web doesn’t just need bigger databases, it needs “better” ones.

The rise of semi-structured data, especially in the form of XML and associated languages, is driven by the success of HTML as a data representation language as well as its many failures. The amount of data that has been created or converted to HTML is staggering. HTML allows novices to publish all sorts of information quite easily while also supporting complex information structures (for example, see the typical site map of a large site). However, HTML is lacking in a number of ways, especially in the management, evolution, integration, and repurposing of data.

HTML, especially in common use, has (at least) three fundamental problems: malformed or misused constructs, a heavy presentation orientation, and a lack of needed expressivity. These problems stem from aspects of HTML (and associated software like the browser) that, we believe, contributed to its success. Browsers were very permissive in their parsing and rendering of HTML, which lowered the barrier to producing pages. Various presentation features in HTML made it an attractive platform for publishing information from software manuals to dictionaries to newspapers with ads. HTML’s core simplicity requires a lack of expressivity, which makes it easier to learn (and to learn to “abuse”).
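
The contrast between permissive HTML handling and strict XML handling can be illustrated with a small sketch. It is not taken from the paper: it uses Python's standard-library `html.parser` as a rough stand-in for a lenient, browser-like consumer and `xml.etree.ElementTree` as a strict one. The sample markup and the names `MALFORMED`, `TagCollector`, `parse_leniently`, and `parse_strictly` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample: <br> is never closed and <b> is closed by </p>.
MALFORMED = "<p>Preheat the oven<br><b>350 degrees</p>"


class TagCollector(HTMLParser):
    """Records start tags and text without checking well-formedness."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse_leniently(markup):
    """Return (start tags, text) even when the markup is malformed."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags, "".join(collector.text)


def parse_strictly(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parse_leniently(MALFORMED))
    print(parse_strictly(MALFORMED))
```

Under these assumptions the lenient parser still yields tags and text from the malformed fragment, while the strict parser rejects it outright. That is one way to picture how permissiveness lowers the barrier to publishing while leaving the resulting data harder to manage and repurpose.
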
More significantly, by pushing the balance of expressivity (and thus complexity) toward the presentation aspects of the language, it was relatively neutral toward content of different sorts. Consider the effect of requiring a specialized content language to be developed before one could publish, say, a recipe. Either the user would have to develop their own
