Steps to implement hindex on my HBase cluster #50
Maybe your rowkey is too long, I think.
I have created an HBase table and an index table with the hindex framework, but when we upload more data into the same table, only the size of the index table keeps increasing and no actual data appears in the HBase table. In this case my input data is 80 GB and the index table has grown to 200+ GB, while no new data appears in the main table. Can rowkey size be a reason for such a huge table size?
The index table rowkey contains the index column/value plus the user table rowkey. As you said, your user table data size has not changed, so it is the index table that accounts for the growth: the user table rowkey is copied into every index entry.
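As a rough illustration (hypothetical numbers, not taken from this issue): with a 100-byte user table rowkey, every index entry repeats that rowkey and adds the indexed column value plus the index name, so each index can contribute well over 100 bytes of key data per row; with several indexes defined, the index table can end up several times the size of the user table.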
Is there any detailed description of how to implement hindex on an existing cluster?
For an existing cluster, if you already have all the required hbase secondary-index configurations in place on your cluster machines (HMaster + RegionServers; otherwise restart your cluster after making the configuration changes), then you can make use of the class "org.apache.hadoop.hbase.index.mapreduce.TableIndexer" to create indexes on existing user tables:
./hbase org.apache.hadoop.hbase.index.mapreduce.TableIndexer -Dtablename.to.index=<table_name> -Dtable.columns.index='IDX1=>cf1:[q1->datatype&length];cf2:[q1->datatype&length],[q2->datatype&length],[q3->datatype&length]#IDX2=>cf1:q5,q5'
The format used here is: index_name=>column_family:[qualifier->datatype&length], with ',' separating qualifiers, ';' separating column families within one index, and '#' separating index definitions.
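For example, to index a single qualifier email of column family cf1 on a table named users (hypothetical table and column names; the datatype keyword and length are assumptions, so check which value types your hindex build accepts), the call would look something like:
./hbase org.apache.hadoop.hbase.index.mapreduce.TableIndexer -Dtablename.to.index=users -Dtable.columns.index='EMAIL_IDX=>cf1:[email->String&64]'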
Thanks for your kind answer, abhi-kr. I did it successfully. |
I was able to build the project and run the MapReduce bulk insert and incremental HFile load:
hbase org.apache.hadoop.hbase.index.mapreduce.IndexImportTsv
hbase org.apache.hadoop.hbase.index.mapreduce.IndexLoadIncrementalHFile
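For reference, a fuller invocation would look roughly like the following (paths, table name, and columns here are hypothetical, and the -Dimporttsv.* options are assumed to match the stock ImportTsv tool; adjust them to your data):
hbase org.apache.hadoop.hbase.index.mapreduce.IndexImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf1:q1 -Dimporttsv.bulk.output=/tmp/bulkout <table_name> /path/to/input.tsv
hbase org.apache.hadoop.hbase.index.mapreduce.IndexLoadIncrementalHFile /tmp/bulkout <table_name>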
But something strange is happening: after completion of the process for just 6 GB of data, the size of the HBase table keeps increasing until it reaches 200+ GB, after which I had to shut down the cluster.
Please suggest what is going wrong here.
Thanks