Cloudera VP Customer Solutions, Omer Trajman, presented a talk on HBase Dos and Don’ts to the Los Angeles Hadoop Users Group earlier this month.
The full video is available after the jump.
LA-HUG HBASE DO’s and DON’Ts from Shopzilla on Vimeo.
At a glance, the dos and don’ts are:
Do:
- Use a key prefix that distributes well
- Keep the number of regions reasonable – about 100 per node
- Disable auto-compaction
- Use compression
- Explicitly put hbase-site.xml in your CLASSPATH
- Monitor the health of your cluster
- Store multiple copies for different access patterns
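
As a rough illustration of the first item, a key prefix that distributes well can be obtained by salting the row key with a small hash-derived bucket. This is a minimal sketch and not taken from the talk: the bucket count, the `|` separator, and the `salted_row_key` helper are all assumptions made for the example.

```python
import hashlib

# Hypothetical bucket count; in practice it would be sized to the cluster.
NUM_BUCKETS = 16


def salted_row_key(natural_key: str, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix a natural key with a deterministic hash bucket.

    The bucket comes from a hash of the key itself, so a reader can
    recompute the prefix from the natural key when doing a point lookup.
    """
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = digest[0] % num_buckets
    return f"{bucket:02d}|{natural_key}".encode("utf-8")


if __name__ == "__main__":
    # Sequential natural keys end up spread across the buckets.
    for i in range(5):
        print(salted_row_key(f"user{i:06d}"))
```

The trade-off of this design is that a range scan over the natural key order has to be issued once per bucket and merged on the client.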
Don’t:
- Treat HBase as a wholesale replacement for every RDBMS
- Run huge MR Jobs directly off of HBase
- Use timestamps as the first part of your key
- Allocate all CPUs to your TaskTrackers
- Mix workloads when you have SLAs to meet
- Bulk load from a single client, or bulk load with put
- Let the Region Server swap
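
To make the timestamp item concrete: HBase sorts rows by key, so a key that starts with the timestamp sends all current writes to the same end of the key space. One common alternative, sketched here under assumptions, is to lead with an entity identifier and put the timestamp after it. The `event_row_key` helper, the `|` separator, and the 8-byte big-endian encoding are illustrative choices, not something prescribed in the talk.

```python
import struct


def event_row_key(entity_id: str, ts_millis: int) -> bytes:
    """Build a row key with the entity first and the timestamp second.

    The timestamp is packed as a signed 8-byte big-endian integer so that,
    for non-negative values, byte order matches chronological order
    within one entity.
    """
    return entity_id.encode("utf-8") + b"|" + struct.pack(">q", ts_millis)


if __name__ == "__main__":
    keys = [
        event_row_key("sensor-b", 1000),
        event_row_key("sensor-a", 2000),
        event_row_key("sensor-a", 1000),
    ]
    # Sorted order groups rows by entity, then by time within each entity.
    for key in sorted(keys):
        print(key)
```

With this layout, writes arriving at the same moment for different entities land in different parts of the key space, while a scan over one entity still returns its rows in time order.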