
Steps to implement hindex on my hbase cluster #50

Open
kunkumar opened this issue Sep 11, 2014 · 6 comments
@kunkumar

I was able to build the project and run the MapReduce bulk import and the incremental HFile load:

hbase org.apache.hadoop.hbase.index.mapreduce.IndexImportTsv
hbase org.apache.hadoop.hbase.index.mapreduce.IndexLoadIncrementalHFile
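(For reference, a full invocation would look something like the lines below. This is a sketch that assumes IndexImportTsv and IndexLoadIncrementalHFile take the same arguments as the stock ImportTsv and LoadIncrementalHFiles tools they are based on; the table name, column mapping, and paths here are placeholders.)

hbase org.apache.hadoop.hbase.index.mapreduce.IndexImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf1:q1 -Dimporttsv.bulk.output=/tmp/bulkout mytable /input/data.tsv
hbase org.apache.hadoop.hbase.index.mapreduce.IndexLoadIncrementalHFile /tmp/bulkout mytable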

But something strange is happening: after the process completed for just 6 GB of data, the size of the hbase table kept growing until it reached 200 GB, at which point I had to shut down the cluster.

Please suggest what's going wrong here?

Thanks

@hy2014

hy2014 commented Sep 15, 2014

Maybe your rowkey is too long, I think.

@kunkumar
Author

I created an hbase table and an index table with the hindex framework, but when we upload more data into the same table, only the index table keeps growing in size and no actual data appears in the hbase table. In this case my input data is 80 GB, the index table has grown to 200+ GB, and no new data is appearing in the main table.

Can the rowkey size be the reason for such a huge table size?

@hy2014

hy2014 commented Sep 15, 2014

The index table rowkey contains the index column/value and the user table rowkey. As you said, your user table data size has not changed, so it is the index table that accounts for the growth in data size.
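(For context: per the hindex design, each index table row key is roughly the concatenation below; this is a sketch of the layout, not the exact byte format.

[region start key, padded] + [index name] + [indexed column value, padded to its declared length] + [user table row key]

Because every index row embeds the padded value and the full user row key, long row keys or large declared value lengths inflate the index table quickly.)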

@SilentMing

Is there any detailed description of how to implement hindex in an existing cluster?

@abhi-kr

abhi-kr commented Sep 8, 2015

For an existing cluster, first make sure all the required hbase secondary-index configurations are in place on your cluster machines (HMaster and all RegionServers); if they are not, make the configuration changes and restart your cluster.
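(The settings in question are the hindex coprocessor hooks in hbase-site.xml. The snippet below is a sketch from memory of the hindex README; verify the exact property names and observer classes against your hindex build.)

<property>
  <name>hbase.use.secondary.index</name>
  <value>true</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver</value>
</property>
<property>
  <name>hbase.coprocessor.wal.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver</value>
</property>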

Once the cluster is configured, you can use the class "org.apache.hadoop.hbase.index.mapreduce.TableIndexer" to create an index on existing user tables:

./hbase org.apache.hadoop.hbase.index.mapreduce.TableIndexer -Dtablename.to.index=<table_name> -Dtable.columns.index='IDX1=>cf1:[q1->datatype&length];cf2:[q1->datatype&length],[q2->datatype&length],[q3->datatype&length]#IDX2=>cf1:q5,q5'

Here,
tablename.to.index: name of the table on which to create the index.
table.columns.index: the table columns on which the index is to be created.

The format used here is:
IDX1 - name of the index, given by the user
cf1 - column family name in the user table
q1 - qualifier name
datatype - datatype of the values of "cf1:q1" [Int, String, Double, Float]
length - maximum length of the values of "cf1:q1"
# is used to separate the details of two indexes
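(As a concrete, hypothetical example of this format: a single index IDX1 on the string column cf1:q1 with a maximum value length of 10 would be created with:

./hbase org.apache.hadoop.hbase.index.mapreduce.TableIndexer -Dtablename.to.index=mytable -Dtable.columns.index='IDX1=>cf1:[q1->String&10]'

where mytable is the existing user table name.)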

@SilentMing

Thanks for your kind answer, abhi-kr. I did it successfully.
