
cursor _fetch raises NoneType AttributeError on empty response #57

Open
Kirn-Opower opened this issue Dec 4, 2015 · 4 comments

@Kirn-Opower

I am upgrading from Hive 0.10 in CDH4 to Hive 0.14 in CDH5. In this version, the attribute TFetchResultsResp.results can be None for queries that return an empty response (such as CREATE DATABASE or CREATE TABLE statements). Consequently, cursor._fetch raises a NoneType AttributeError at line 222, where it reads the rows attribute of results, which is None. This is trivially fixed with the following implementation of _fetch in cursor:

    def _fetch(self, rows, fetchReq):
        resultsRes = self.client.FetchResults(fetchReq)
        # results can be None in Hive 0.14 for statements that return no
        # result set (e.g. CREATE DATABASE / CREATE TABLE).
        if resultsRes.results is None or len(resultsRes.results.rows) == 0:
            self.hasMoreRows = False
            return rows
        for row in resultsRes.results.rows:
            rowData = []
            for col in row.colVals:
                rowData.append(get_value(col))
            rows.append(rowData)
        return rows
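To make the failure mode concrete, here is a minimal, self-contained sketch. All the Fake* classes and the get_value stub below are hypothetical stand-ins for the Thrift-generated objects pyhs2 actually uses; they exist only so the guarded _fetch can be exercised without a Hive server:

```python
# Hypothetical stand-ins for the Thrift-generated objects used by pyhs2;
# these are NOT part of the real pyhs2 API.
def get_value(col):
    return col  # stand-in: real pyhs2 unwraps the Thrift column value here

class FakeResultsResp:
    def __init__(self, results):
        self.results = results  # None mimics the Hive 0.14 empty response

class FakeClient:
    def __init__(self, resp):
        self._resp = resp
    def FetchResults(self, fetchReq):
        return self._resp

class FakeCursor:
    def __init__(self, client):
        self.client = client
        self.hasMoreRows = True

    # The guarded _fetch from above:
    def _fetch(self, rows, fetchReq):
        resultsRes = self.client.FetchResults(fetchReq)
        if resultsRes.results is None or len(resultsRes.results.rows) == 0:
            self.hasMoreRows = False
            return rows
        for row in resultsRes.results.rows:
            rowData = []
            for col in row.colVals:
                rowData.append(get_value(col))
            rows.append(rowData)
        return rows

# An empty response (results is None) no longer raises AttributeError:
cursor = FakeCursor(FakeClient(FakeResultsResp(None)))
rows = cursor._fetch([], fetchReq=None)
print(rows)                # []
print(cursor.hasMoreRows)  # False
```

With the guard in place, DDL statements simply yield an empty row list and mark the cursor as exhausted, which is what callers already expect.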

However, since this project is no longer maintained and PRs do not appear to be reviewed, I will not be submitting a PR with the fix; I am simply filing the issue for reference. I'll (hackily) handle it by catching the error and returning an empty set, though this is naturally a very poor solution to the problem.
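The caller-side workaround mentioned above might look roughly like this. The helper name safe_fetchall is hypothetical, and BuggyCursor is a stand-in that mimics the unpatched behavior (real pyhs2 exposes cursor.fetch()); this is a sketch of the hack, not a recommended fix:

```python
def safe_fetchall(cursor):
    """Hacky workaround (hypothetical helper): treat the NoneType
    AttributeError raised on an empty response as 'no rows'."""
    try:
        return cursor.fetch()
    except AttributeError:
        # Unpatched pyhs2 blows up when TFetchResultsResp.results is None
        # (e.g. after CREATE TABLE); return an empty result set instead.
        return []

# Stand-in that mimics the unpatched bug, for demonstration only:
class BuggyCursor:
    def fetch(self):
        results = None       # what Hive 0.14 returns for DDL statements
        return results.rows  # raises AttributeError on NoneType

print(safe_fetchall(BuggyCursor()))  # []
```

The obvious downside is that this also swallows any unrelated AttributeError raised during fetching, which is why patching _fetch itself is the better fix.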

@BradRuderman
Owner

Hi @Kirn-Opower ,

Yes, I apologize. I don't use Hadoop or Hive anymore, so I don't have the resources to maintain this project. Happy to transition it back to the community.

Thanks,
Brad

@Kirn-Opower
Author

Thanks @BradRuderman, completely understood; if you're not using it yourself, it's hard to justify maintaining it. We're investigating PyHive from Dropbox as an alternative.

@kkennedy314

We have also transitioned from using Hadoop/Hive to Hadoop/Spark, so we are not using Hive directly as I had thought we would.

@Kirn-Opower
Author

We're also moving to Spark, but we're currently using it in standalone mode (not colocated with the Hadoop/Hive cluster). Consequently, we pre-serialize our filtered job inputs to Hive temp tables using pyhs2, then pull the (smaller) temp tables into the Spark cluster. This way we avoid pulling the entire Hive database into the Spark standalone cluster every time; we transfer only the inputs we need.
