
Abstract With the advent of Web technology, the Web has filled with unstructured data, often referred to as Big Data. However, these data are not easy to collect, access, and process at large scale. Web crawling is an optimization problem. Site-specific crawling of social media platforms, e-commerce websites, blogs, news websites, and forums is a requirement for many business organizations that need to answer search queries from webpages. Indexing a huge number of webpages requires a cluster with several petabytes of usable disk. Because NoSQL databases are highly scalable, their use for storing crawler data is increasing along with their growing popularity. This chapter discusses the application of NoSQL databases in Web Crawler applications to store the data collected by the Web Crawler.
