cur.getSchema() when <date> fields exist throws exception (when field is NULL)? #35

BAM-BAM-BAM · 2015-01-23T23:40:51Z

I've been using pyhs2 with a Hortonworks cluster.

I have a simple table with a column of type "date". When I call the cursor::getSchema() function, an exception is thrown (KeyError: 17). I think this only happens when a date field value is NULL.

Here is the table:
hive -e 'describe extended test_date'
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
col_name data_type comment
mydate date
_c1 bigint

Detailed Table Information Table(tableName:test_date, dbName:default, owner:jprior, createTime:1422053397, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:mydate, type:date, comment:null), FieldSchema(name:c1, type:bigint, comment:null)], location:hdfs://__.com:8020/apps/hive/warehouse/test_date, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=5, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1422053397, numRows=386, totalSize=6910, rawDataSize=6524}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 2.275 seconds, Fetched: 4 row(s)

Here is how to generate the exception:

ipython

In [1]: import pyhs2
In [2]: cnx = pyhs2.connect(eval("{'host':'', 'port':10000, 'authMechanism':'**', 'database':'default', 'user':'', 'password':'', }"))
In [3]: cur = cnx.cursor()
In [4]: query = 'select mydate from test_date'
In [5]: cur.execute(query)
In [6]: rows = cur.fetch()
In [7]: rows[:10]
Out[7]:
[[None],
['2013-12-31'],
['2014-01-05'],
['2014-01-10'],
['2014-01-15'],
['2014-01-20'],
['2014-01-25'],
['2014-01-30'],
['2014-02-04'],
['2014-02-09']]

In [8]: column_names = [a['columnName'] for a in cur.getSchema()]

KeyError Traceback (most recent call last)
in ()
----> 1 column_names = [a['columnName'] for a in cur.getSchema()]

/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.pyc in getSchema(self)
196 for c in self.client.GetResultSetMetadata(req).schema.columns:
197 col = {}
--> 198 col['type'] = get_type(c.typeDesc)
199 col['columnName'] = c.columnName
200 col['comment'] = c.comment

/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.pyc in get_type(typeDesc)
10 for ttype in typeDesc.types:
11 if ttype.primitiveEntry is not None:
---> 12 return TTypeId._VALUES_TO_NAMES[ttype.primitiveEntry.type]
13 elif ttype.mapEntry is not None:
14 return ttype.mapEntry

KeyError: 17
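
Reading the traceback, the failure is a plain dictionary lookup: get_type() indexes TTypeId._VALUES_TO_NAMES with the type id the server reports, and id 17 is not a key in the mapping. Since getSchema() only requests result set metadata, the NULL row value may be unrelated. As far as I know, later versions of Hive's TCLIService.thrift assign id 17 to DATE_TYPE, so the bundled bindings may simply predate the date type. The sketch below is not pyhs2 code. It uses a stand-in dictionary (an illustrative subset, and an assumption about what the bundled mapping contains) to reproduce the KeyError and to show a hypothetical lenient lookup that falls back to a placeholder name.

```python
# Stand-in for the generated Thrift mapping TTypeId._VALUES_TO_NAMES.
# Assumption: illustrative subset only; the real mapping has more entries
# but, judging by the KeyError, no entry for id 17.
VALUES_TO_NAMES = {
    0: "BOOLEAN_TYPE",
    3: "INT_TYPE",
    4: "BIGINT_TYPE",
    7: "STRING_TYPE",
}


def type_name_strict(type_id):
    """Mirror the lookup in the traceback: raises KeyError for unknown ids."""
    return VALUES_TO_NAMES[type_id]


def type_name_lenient(type_id):
    """Hypothetical defensive variant: return a placeholder for unknown ids."""
    return VALUES_TO_NAMES.get(type_id, "UNKNOWN_TYPE_%d" % type_id)


if __name__ == "__main__":
    try:
        type_name_strict(17)
    except KeyError as exc:
        print("strict lookup failed with KeyError: %s" % exc)
    print(type_name_lenient(17))
    print(type_name_lenient(4))
```

With a lenient lookup like this, the column names would still be retrievable even when a column's type id is unknown to the client. Whether that is the right fix for pyhs2, as opposed to regenerating the Thrift bindings, is a separate question.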

BradRuderman · 2015-01-26T19:06:05Z

What version of Hive?

BAM-BAM-BAM · 2015-01-26T19:12:38Z

Hive 0.13.0.2.1.4.0-632
