Key points that define HBase:
- Column-oriented (more precisely, column-family-oriented) database management system that runs on top of HDFS
- Not a relational data store, and it does not support a query language like SQL; this makes it a NoSQL database
- Supports a very flexible data model, so it suits data that does not fit the strict structure of a traditional RDBMS.
- Sparse – a cell is stored only if it has a value. If a row has no value for a given column, that cell simply does not exist and takes up no space.
- It consists of a master node that coordinates the cluster, and region servers that each store portions of tables and serve reads and writes on that data.
- Sharding – HBase is a distributed data store in which table data is distributed across regions. Once a region grows beyond a configured size, it is split automatically, so HBase supports automatic sharding.
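To make the "sparse" and "sharding" points more concrete, here is a minimal sketch in plain Python. It is a toy model, not the HBase API: the class `ToyTable`, its methods, and the row-count threshold `max_rows_per_region` are all made up for illustration. In particular, real HBase decides when to split based on the stored size of a region, not on a row count; the sketch only assumes the general idea that rows are kept sorted by row key and that a region covers a contiguous key range.

```python
from bisect import bisect_right


class ToyTable:
    """Toy model of a sparse, range-sharded table (NOT the HBase API)."""

    def __init__(self, max_rows_per_region=4):
        # Hypothetical threshold: real HBase splits on region size, not row count.
        self.max_rows_per_region = max_rows_per_region
        # Each region maps row key -> {"family:qualifier": value}.
        self.regions = [{}]
        # start_keys[i] is the smallest row key that region i may hold.
        self.start_keys = [""]

    def _region_index(self, row_key):
        # Pick the last region whose start key is <= row_key.
        return bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key, column, value):
        idx = self._region_index(row_key)
        region = self.regions[idx]
        # Sparse: only cells that actually receive a value are stored.
        region.setdefault(row_key, {})[column] = value
        if len(region) > self.max_rows_per_region:
            self._split(idx)

    def get(self, row_key, column):
        region = self.regions[self._region_index(row_key)]
        # A missing cell is simply absent; no NULL placeholder is stored.
        return region.get(row_key, {}).get(column)

    def _split(self, idx):
        # Split one region into two halves around its middle row key.
        region = self.regions[idx]
        keys = sorted(region)
        mid_key = keys[len(keys) // 2]
        lower = {k: region[k] for k in keys if k < mid_key}
        upper = {k: region[k] for k in keys if k >= mid_key}
        self.regions[idx:idx + 1] = [lower, upper]
        self.start_keys.insert(idx + 1, mid_key)


table = ToyTable(max_rows_per_region=4)
for i in range(6):
    table.put(f"user{i}", "info:name", f"name{i}")
# Only user0 gets an email cell; other rows store nothing for that column.
table.put("user0", "info:email", "u0@example.com")

print(len(table.regions))                 # number of regions after the inserts
print(table.get("user1", "info:email"))   # missing cell
```

In this sketch the fifth row pushes the single region over the threshold, so it is split into two key ranges, and later writes are routed to whichever region covers their row key. A lookup of a cell that was never written returns `None` because nothing was stored for it in the first place.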
When should we choose HBase:
- We have a huge amount of data. We get a performance benefit only when the data is spread across multiple nodes. For data that fits on a single node, an RDBMS would probably be a better choice.
- We don’t need typical RDBMS features like typed columns, a relational data model, an advanced query language, etc.
- We have a good Hadoop cluster with a sufficient number of DataNodes.
** Since I am also learning HBase currently, the information given here may not be 100% accurate. If so, please let me know and I’ll modify the content.