cur.getSchema() when <date> fields exist throws exception (when field is NULL)? #35

Open
BAM-BAM-BAM opened this issue Jan 23, 2015 · 2 comments

@BAM-BAM-BAM

I've been using pyhs2 with a Hortonworks cluster.

I have a simple table with a field of type "date". When I call the cursor's getSchema() function, an exception is thrown (KeyError: 17). I think this only happens when a date field value is NULL.

Here is the table:
hive -e 'describe extended test_date'
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
col_name    data_type    comment
mydate      date
_c1         bigint

Detailed Table Information Table(tableName:test_date, dbName:default, owner:jprior, createTime:1422053397, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:mydate, type:date, comment:null), FieldSchema(name:c1, type:bigint, comment:null)], location:hdfs://__.com:8020/apps/hive/warehouse/test_date, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=5, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1422053397, numRows=386, totalSize=6910, rawDataSize=6524}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 2.275 seconds, Fetched: 4 row(s)

Here is how to reproduce the exception:

ipython

In [1]: import pyhs2
In [2]: cnx = pyhs2.connect(eval("{'host':'', 'port':10000, 'authMechanism':'**', 'database':'default', 'user':'', 'password':'', }"))
In [3]: cur = cnx.cursor()
In [4]: query = 'select mydate from test_date'
In [5]: cur.execute(query)
In [6]: rows = cur.fetch()
In [7]: rows[:10]
Out[7]:
[[None],
['2013-12-31'],
['2014-01-05'],
['2014-01-10'],
['2014-01-15'],
['2014-01-20'],
['2014-01-25'],
['2014-01-30'],
['2014-02-04'],
['2014-02-09']]

In [8]: column_names = [a['columnName'] for a in cur.getSchema()]

KeyError Traceback (most recent call last)
in ()
----> 1 column_names = [a['columnName'] for a in cur.getSchema()]

/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.pyc in getSchema(self)
    196         for c in self.client.GetResultSetMetadata(req).schema.columns:
    197             col = {}
--> 198             col['type'] = get_type(c.typeDesc)
    199             col['columnName'] = c.columnName
    200             col['comment'] = c.comment

/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.pyc in get_type(typeDesc)
     10     for ttype in typeDesc.types:
     11         if ttype.primitiveEntry is not None:
---> 12             return TTypeId._VALUES_TO_NAMES[ttype.primitiveEntry.type]
     13         elif ttype.mapEntry is not None:
     14             return ttype.mapEntry

KeyError: 17
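
The traceback shows a plain dictionary lookup failing in get_type, which suggests that TTypeId._VALUES_TO_NAMES simply has no entry for type id 17. Below is a minimal sketch of a more tolerant lookup, assuming that is the cause. VALUES_TO_NAMES is a hypothetical stand-in with a few illustrative entries, not the table that ships with pyhs2, and safe_type_name is not part of the pyhs2 API.

```python
# Hypothetical stand-in for the generated TTypeId._VALUES_TO_NAMES table.
# Only a few illustrative entries are listed; the real table is larger.
VALUES_TO_NAMES = {
    4: "BIGINT_TYPE",
    7: "STRING_TYPE",
}


def safe_type_name(type_id, names=VALUES_TO_NAMES):
    """Return the type name for type_id, or a placeholder if it is unknown.

    Using dict.get avoids the KeyError raised by a direct names[type_id]
    lookup when the server reports a type id the client does not know.
    """
    return names.get(type_id, "UNKNOWN_TYPE_%d" % type_id)


if __name__ == "__main__":
    print(safe_type_name(4))
    print(safe_type_name(17))
```

With a fallback like this, getSchema() could still return column names for a result set containing an unrecognized type, at the cost of reporting a placeholder type name for that column.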

@BradRuderman
Owner

What version of Hive are you using?

@BAM-BAM-BAM
Author

Hive 0.13.0.2.1.4.0-632
