Skip to content
This repository has been archived by the owner on Feb 12, 2022. It is now read-only.
mujtabachohan edited this page Sep 18, 2013 · 32 revisions

Just started. Work in progress!


### I want to get started. Is there a Phoenix _Hello World_?

Pre-requisite: Download latest Phoenix from here and copy phoenix-*.jar to HBase lib folder and restart HBase.

1. Using console

  1. Start Sqlline: $ sqlline.sh [zookeeper]
  2. Execute the following statements when Sqlline connects:
create table test (mykey integer not null primary key, mycolumn varchar);
upsert into test values (1,'Hello');
upsert into test values (2,'World!');
select * from test;
  1. You should get the following output
+-------+------------+
| MYKEY |  MYCOLUMN  |
+-------+------------+
| 1     | Hello      |
| 2     | World!     |
+-------+------------+

2. Using java

Create test.java file with the following content:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class test {

	public static void main(String[] args) throws SQLException {
		Statement stmt = null;
		ResultSet rset = null;
		
		Connection con = DriverManager.getConnection("jdbc:phoenix:zookeeper");
		stmt = con.createStatement();
		
		stmt.executeUpdate("create table test (mykey integer not null primary key, mycolumn varchar)");
		stmt.executeUpdate("upsert into test values (1,'Hello')");
		stmt.executeUpdate("upsert into test values (2,'World!')");
		con.commit();
		
		PreparedStatement statement = con.prepareStatement("select * from test");
		rset = statement.executeQuery();
		while (rset.next()) {
			System.out.println(rset.getString("mycolumn"));
		}
		statement.close();
		con.close();
	}
}

Compile and execute on command line

$ javac test.java

$ java -cp "../phoenix-2.0.0-client.jar:." test

You should get the following output

Hello World!


### Is there a way to bulk load in Phoenix?

Map Reduce

See the example here https://github.com/arunsingh16/Map-Reduce-on-Phoenix-HBase.git Credit: Arun Singh

CSV

CSV data can be bulk loaded with built in utility named psql. Typical upsert rates are 20K - 50K rows per second (depends on how wide are the rows).

Usage example:
Create table using psql $ psql.sh [zookeeper] ../examples/web_stat.sql

Upsert CSV bulk data $ psql.sh [zookeeper] ../examples/web_stat.csv


### How I create Views in Phoenix? What's the difference between Views/Tables?

You can create both a Phoenix table or view through the CREATE TABLE/CREATE VIEW DDL statement on a pre-existing HBase table. In both cases, we'll leave the HBase metadata as-is, except for with a TABLE we turn KEEP_DELETED_CELLS on. For CREATE TABLE, we'll create any metadata (table, column families) that doesn't already exist. We'll also add an empty key value for each row so that queries behave as expected (without requiring all columns to be projected during scans).

The other caveat is that the way the bytes were serialized must match the way the bytes are serialized by Phoenix. For VARCHAR,CHAR, and UNSIGNED_* types, we use the HBase Bytes methods. The CHAR type expects only single-byte characters and the UNSIGNED types expect values greater than or equal to zero.

Our composite row keys are formed by simply concatenating the values together, with a zero byte character used as a separator after a variable length type.

If you create an HBase table like this:

create 't1', {NAME => 'f1', VERSIONS => 5}

then you have an HBase table with a name of 't1' and a column family with a name of 'f1'. Remember, in HBase, you don't model the possible KeyValues or the structure of the row key. This is the information you specify in Phoenix above and beyond the table and column family.

So in Phoenix, you'd create a view like this:

CREATE VIEW "t1" ( pk VARCHAR PRIMARY KEY, "f1".val VARCHAR )

The "pk" column declares that your row key is a VARCHAR (i.e. a string) while the "f1".val column declares that your HBase table will contain KeyValues with a column family and column qualifier of "f1":VAL and that their value will be a VARCHAR.

Note that you don't need the double quotes if you create your HBase table with all caps names (since this is how Phoenix normalizes strings, by upper casing them). For example, with:

create 'T1', {NAME => 'F1', VERSIONS => 5}

you could create this Phoenix view:

CREATE VIEW t1 ( pk VARCHAR PRIMARY KEY, f1.val VARCHAR )

Or if you're creating new HBase tables, just let Phoenix do everything for you like this (No need to use the HBase shell at all.):

CREATE TABLE t1 ( pk VARCHAR PRIMARY KEY, val VARCHAR )


### Are there any tips for optimizing Phoenix?
  • Use Salting to increase read/write performance Salting can significantly increase read/write performance by pre-splitting the data into multiple regions. Although Salting will yield better performance in most scenarios.

Example:

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SALT_BUCKETS=16

Note: Ideally for a 16 region server cluster with quad-core CPUs, choose salt buckets between 32-64 for optimal performance.

  • Per-split table Salting does automatic table splitting but in case you want to exactly control where table split occurs with out adding extra byte or change row key order then you can pre-split a table.

Example:

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA')

  • Use multiple column families Details to be added

  • Use compression Details to be added

  • Create indexes Details to be added

  • Optimize your cluster Details to be added

  • Optimize Phoenix parameters Details to be added


### How do I create Index on a column?

Phoenix 2.0.x supports Index over immutable data.

Example

  • Create table with immutable rows

create table test (mykey varchar primary key, col1 varchar, col2 varchar) immutable_rows=true;

  • Creating index on col2

create index idx on test (col2)

  • Creating index on col1 and a covered index on col2

create index idx on test (col1) include (col2)

Upsert rows in this test table and Phoenix query optimizer will choose correct index to use. You can see in explain plan if Phoenix is using the index table. You can also give a hint in Phoenix query to use a specific index.

Note: Upcoming Phoenix release will also support Indexing mutable data.


### How fast is Phoenix? Why is it so fast?

Phoenix is fast. Full table scan of 100M rows usually completes in 20 seconds (narrow table on a medium sized cluster). This time come down to few milliseconds if query contains filter on key columns. For filters on non-key columns or non-leading key columns, you can add index on these columns which leads to performance equivalent to filtering on key column by making copy of table with indexed column(s) part of key.

Why is Phoenix fast even when doing full scan:

  1. Phoenix chunks up your query using the region boundaries and runs them in parallel on the client using a configurable number of threads
  2. The aggregation will be done in a coprocessor on the server-side, collapsing the amount of data that gets returned back to the client rather than returning it all.

### How do I connect to secure HBase cluster? Check out excellent post by Anil Gupta http://bigdatanoob.blogspot.com/2013/09/connect-phoenix-to-secure-hbase-cluster.html
### How do I connect with HBase running on Hadoop-2? Hadoop-2 profile exists in Phoenix pom.xml. See related post https://github.com/forcedotcom/phoenix/issues/286
Clone this wiki locally