The full video is available after the jump.
At a glance, the dos and don'ts are:
- Use a key prefix that distributes well
- Keep the number of regions reasonable - about 100 per node
- Disable auto-compaction
- Use compression
- Explicitly put hbase-site.xml in your CLASSPATH
- Monitor the health of your cluster
- Store multiple copies for different access patterns
And the don'ts:
- Wholesale replacement of every RDBMS
- Run huge MapReduce jobs directly off of HBase
- Use timestamps as the first part of your key
- Allocate all CPUs to your TaskTrackers
- Mixed workloads with SLAs
- Bulk load from a single client, or bulk load with puts
- Let the RegionServer swap
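Two of the items above concern row keys: use a prefix that distributes well, and keep the timestamp out of the leading position. The sketch below shows one common way to do that, not necessarily the approach from the talk. It is plain Python with no HBase client involved. The bucket count, key layout, and function names are all hypothetical, and the first two bytes of each key stand in for region boundaries, which is a simplification of how HBase actually splits key ranges.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 16  # hypothetical value; choose one that suits your cluster


def timestamp_first_key(event_ts_ms: int, device_id: str) -> bytes:
    """Anti-pattern: the timestamp leads, so every new write sorts last."""
    return f"{event_ts_ms:013d}|{device_id}".encode()


def prefixed_key(event_ts_ms: int, device_id: str) -> bytes:
    """Hash-derived bucket prefix first, zero-padded timestamp last."""
    # md5 is used only as a stable hash; Python's built-in hash() is
    # randomized per process and would give different buckets each run.
    digest = hashlib.md5(device_id.encode()).digest()
    bucket = digest[0] % NUM_BUCKETS
    return f"{bucket:02d}|{device_id}|{event_ts_ms:013d}".encode()


def bucket_histogram(keys):
    """Count keys by their first two bytes (a stand-in for regions)."""
    return Counter(key[:2] for key in keys)


# 1000 consecutive events from 200 made-up devices.
events = [(1_700_000_000_000 + i, f"device-{i % 200}") for i in range(1000)]

hot = bucket_histogram(timestamp_first_key(ts, dev) for ts, dev in events)
spread = bucket_histogram(prefixed_key(ts, dev) for ts, dev in events)

print("timestamp-first prefixes:", len(hot))
print("hash-prefixed prefixes:  ", len(spread))
```

With the timestamp first, every key shares the same leading bytes, so all writes land next to each other. With the hashed prefix, writes spread over up to `NUM_BUCKETS` key ranges while each device's rows still sort by time. The trade-off is that a time-range scan across all devices then needs one scan per bucket.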