
Steps to implement hindex on my hbase cluster #50

Open
kunkumar opened this issue Sep 11, 2014 · 6 comments
@kunkumar

I was able to build the project and run the MapReduce bulk import and the incremental HFile load:

hbase org.apache.hadoop.hbase.index.mapreduce.IndexImportTsv
hbase org.apache.hadoop.hbase.index.mapreduce.IndexLoadIncrementalHFile
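(For reference, a full invocation would look something like the lines below. This is a sketch that assumes IndexImportTsv and IndexLoadIncrementalHFile take the same arguments as the stock ImportTsv and LoadIncrementalHFiles tools they are based on; the table name, column mapping, and paths here are placeholders.)

hbase org.apache.hadoop.hbase.index.mapreduce.IndexImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf1:q1 -Dimporttsv.bulk.output=/tmp/bulkout mytable /input/data.tsv
hbase org.apache.hadoop.hbase.index.mapreduce.IndexLoadIncrementalHFile /tmp/bulkout mytable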

But something strange is happening: after the process completed for just 6 GB of data, the size of the hbase table kept growing until it reached 200 GB, at which point I had to shut down the cluster.

Please suggest what's going wrong here?

Thanks

@hy2014

hy2014 commented Sep 15, 2014

Maybe your rowkey is too long, I think.

@kunkumar
Author

I created an hbase table and an index table with the hindex framework, but when we upload more data into the same table, only the index table keeps growing in size and no actual data appears in the hbase table. In this case my input data is 80 GB, the index table has grown to 200+ GB, and no new data is appearing in the main table.

Can the rowkey size be the reason for such a huge table size?

@hy2014

hy2014 commented Sep 15, 2014

The index table rowkey contains the index column/value and the user table rowkey. As you said, your user table data size has not changed, so it is the index table that accounts for the growth in data size.
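(For context: per the hindex design, each index table row key is roughly the concatenation below; this is a sketch of the layout, not the exact byte format.

[region start key, padded] + [index name] + [indexed column value, padded to its declared length] + [user table row key]

Because every index row embeds the padded value and the full user row key, long row keys or large declared value lengths inflate the index table quickly.)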

@SilentMing

Is there any detailed description of how to implement hindex in an existing cluster?

@abhi-kr

abhi-kr commented Sep 8, 2015

For an existing cluster, first make sure all the required hbase secondary-index configurations are in place on your cluster machines (HMaster and all RegionServers); if they are not, make the configuration changes and restart your cluster.
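(The settings in question are the hindex coprocessor hooks in hbase-site.xml. The snippet below is a sketch from memory of the hindex README; verify the exact property names and observer classes against your hindex build.)

<property>
  <name>hbase.use.secondary.index</name>
  <value>true</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver</value>
</property>
<property>
  <name>hbase.coprocessor.wal.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver</value>
</property>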

Once the cluster is configured, you can use the class "org.apache.hadoop.hbase.index.mapreduce.TableIndexer" to create an index on existing user tables:

./hbase org.apache.hadoop.hbase.index.mapreduce.TableIndexer -Dtablename.to.index=<table_name> -Dtable.columns.index='IDX1=>cf1:[q1->datatype&length];cf2:[q1->datatype&length],[q2->datatype&length],[q3->datatype&length]#IDX2=>cf1:q5,q5'

Here,
tablename.to.index: name of the table on which to create the index.
table.columns.index: the table columns on which the index is to be created.

The format used here is:
IDX1 - name of the index, given by the user
cf1 - column family name in the user table
q1 - qualifier name
datatype - datatype of the values of "cf1:q1" [Int, String, Double, Float]
length - maximum length of the values of "cf1:q1"
# is used to separate the details of two indexes
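(As a concrete, hypothetical example of this format: a single index IDX1 on the string column cf1:q1 with a maximum value length of 10 would be created with:

./hbase org.apache.hadoop.hbase.index.mapreduce.TableIndexer -Dtablename.to.index=mytable -Dtable.columns.index='IDX1=>cf1:[q1->String&10]'

where mytable is the existing user table name.)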

@SilentMing

Thanks for your kind answer, abhi-kr. I did it successfully.
