
HBase is a distributed, column-oriented database built on top of HDFS. It is the Hadoop application to use when you require real-time, random read/write access to very large datasets. HBase is a scalable data store targeted at random read and write access to (fairly) structured data. It is modeled after Google's Bigtable and designed to support large tables, on the order of billions of rows and millions of columns. It uses HDFS as the underlying file system and is designed to be fully distributed and highly available. Version 0.20 introduces significant performance improvements. HBase's TableInputFormat is designed to allow a MapReduce program to operate on data stored in an HBase table, and TableOutputFormat is for writing MapReduce outputs into an HBase table. HBase has different storage characteristics from HDFS, such as the ability to do row updates and column indexing, so we can expect to see these features used by Hive in future releases. It is already possible to access HBase tables from Hive. This paper gives a step-by-step introduction to HBase: it identifies the differences between Apache HBase and a traditional RDBMS, describes the problem with relational database systems, explains the relation between Hadoop and HBase, and shows how an Apache HBase table is physically stored on disk. The later part of this paper introduces MapReduce, the HBase table, how Apache HBase cells store data, and what happens to data when it is deleted. The last part explains the difference between Big Data and HBase, followed by the conclusion and references.
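To make the data model concrete, here is a minimal toy sketch in Python, assuming the Bigtable-style model that HBase follows: a sparse, sorted map indexed by row key, column family, column qualifier, and timestamp. The class `ToyTable` and its methods are hypothetical illustrations, not the HBase client API (the real client is Java, with classes such as `Put`, `Get`, and `Delete`). The sketch also models deletion as HBase handles it: a delete writes a tombstone marker that masks older versions, and the masked data is only physically removed during a major compaction.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

CellKey = Tuple[str, str, str]  # (row key, column family, column qualifier)


class ToyTable:
    """Hypothetical in-memory model of the Bigtable/HBase data model.

    Each cell is addressed by (row, family, qualifier) and keeps several
    timestamped versions. This is NOT the HBase client API.
    """

    def __init__(self, families: Iterable[str]) -> None:
        # Column families are declared up front as part of the schema.
        self._families = set(families)
        # (row, family, qualifier) -> {timestamp: value}
        self._cells: Dict[CellKey, Dict[int, str]] = {}
        # (row, family, qualifier) -> newest delete-marker timestamp
        self._tombstones: Dict[CellKey, int] = {}

    def put(self, row: str, family: str, qualifier: str, value: str, ts: int) -> None:
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        # A put adds a new version; older versions are kept alongside it.
        self._cells.setdefault((row, family, qualifier), {})[ts] = value

    def delete(self, row: str, family: str, qualifier: str, ts: int) -> None:
        # A delete does not erase data; it records a tombstone that masks
        # every version with a timestamp <= ts.
        key = (row, family, qualifier)
        self._tombstones[key] = max(ts, self._tombstones.get(key, ts))

    def get(self, row: str, family: str, qualifier: str) -> Optional[str]:
        key = (row, family, qualifier)
        versions = self._cells.get(key, {})
        tomb = self._tombstones.get(key)
        live = [t for t in versions if tomb is None or t > tomb]
        # Return the newest version that is not masked by a tombstone.
        return versions[max(live)] if live else None

    def scan(self, start_row: str, stop_row: str) -> Iterator[str]:
        # Rows come back in sorted row-key order; stop_row is exclusive.
        live_rows = {r for (r, f, q) in self._cells if self.get(r, f, q) is not None}
        for row in sorted(live_rows):
            if start_row <= row < stop_row:
                yield row

    def stored_versions(self, row: str, family: str, qualifier: str) -> int:
        # Versions still physically stored, including masked ones.
        return len(self._cells.get((row, family, qualifier), {}))

    def major_compact(self) -> None:
        # Physically drop masked versions, then discard the tombstones.
        for key, tomb in self._tombstones.items():
            versions = self._cells.get(key, {})
            for t in [t for t in versions if t <= tomb]:
                del versions[t]
            if not versions:
                self._cells.pop(key, None)
        self._tombstones.clear()


table = ToyTable(["info"])
table.put("row2", "info", "name", "b", ts=1)
table.put("row1", "info", "name", "a1", ts=1)
table.put("row1", "info", "name", "a2", ts=2)
print(table.get("row1", "info", "name"))              # a2 (newest version)
table.delete("row1", "info", "name", ts=3)
print(table.get("row1", "info", "name"))              # None (masked by tombstone)
print(table.stored_versions("row1", "info", "name"))  # 2 (still stored)
table.major_compact()
print(table.stored_versions("row1", "info", "name"))  # 0
```

In real HBase, row keys, families, and qualifiers are byte arrays sorted lexicographically, and the number of retained versions is configurable per column family; the toy keeps every version and uses strings for readability.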
