Fix uns merge 3d #1302
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff            @@
##           main    #1302      +/-   ##
=========================================
+ Coverage      0   85.70%   +85.70%
=========================================
  Files         0       34       +34
  Lines         0     5450     +5450
=========================================
+ Hits          0     4671     +4671
- Misses        0      779      +779
def gen_3d_recarray(_):
    # Ignoring n as it can get quite slow
What does this mean exactly?
Which part?
recarray refers to a record array: https://numpy.org/doc/stable/reference/generated/numpy.recarray.html#numpy-recarray
n is a parameter passed to all the other functions here. Generally it changes the size of the result, but we ignore that in a couple of cases. Here I ignore it because it made some test cases take over a minute.
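For readers unfamiliar with the pattern, here is a rough sketch of a generator that accepts a size argument but ignores it and always returns a small 3D record array. The function name, field names, and fixed shape are hypothetical and are not taken from the PR.

```python
import numpy as np


def gen_small_3d_recarray(_):
    # Hypothetical stand-in: the size argument is accepted so the function
    # has the same signature as its siblings, but it is deliberately ignored
    # so the result stays small and tests stay fast.
    dtype = [("x", "f8"), ("y", "i8")]
    arr = np.zeros((2, 3, 4), dtype=dtype)
    arr["x"] = np.random.default_rng(0).random((2, 3, 4))
    arr["y"] = np.arange(24).reshape(2, 3, 4)
    # View the structured array as a record array (fields as attributes).
    return arr.view(np.recarray)
```

Calling it with any value, for example gen_small_3d_recarray(10) or gen_small_3d_recarray(1000), gives an array of the same shape.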
if a.shape != b.shape:
    return False

return equal(pd.DataFrame(a.reshape(-1)), pd.DataFrame(b.reshape(-1)))
What does _.reshape(-1) mean? The numpy docs say:
"One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions."
So what you do basically creates 1D pd.DataFrames? Shouldn’t we make and compare pd.Series then?
That would make sense, but in practice the Series constructor acts differently and throws errors for some of the non-numeric dtypes.
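To make the reshape(-1) step concrete, a small illustration follows. The array, its shape, and its field names are made up for the example. It only shows that flattening a 3D structured array and passing it to the DataFrame constructor yields one row per record and one column per field; it does not try to reproduce the Series errors mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical 3D structured array with one numeric and one string field.
arr = np.zeros((2, 2, 2), dtype=[("num", "f8"), ("label", "U3")])
arr["num"] = np.arange(8).reshape(2, 2, 2)
arr["label"] = "abc"

# -1 asks numpy to infer the length, so the result is a 1D array of records.
flat = arr.reshape(-1)

# The DataFrame constructor turns each field into a column.
df = pd.DataFrame(flat)
```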
Co-authored-by: Isaac Virshup <[email protected]>
This was a little harder than expected due to our need to handle non-numeric dtypes. We were using dataframe comparisons to handle that.
Now we are using numpy when able, but switching to dataframes constructed from reshaped arrays when we can't.
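A minimal sketch of that strategy, under stated assumptions: the helper name arrays_equal_sketch is hypothetical, the numeric check via np.issubdtype is one possible way to decide when numpy can be used, and DataFrame.equals stands in for whatever comparison the project's equal function performs. It is not the code from this PR.

```python
import numpy as np
import pandas as pd


def arrays_equal_sketch(a, b):
    """Hypothetical helper: numpy where possible, dataframes otherwise."""
    if a.shape != b.shape:
        return False
    # Plain numeric dtypes can be compared directly with numpy.
    if np.issubdtype(a.dtype, np.number) and np.issubdtype(b.dtype, np.number):
        return bool(np.array_equal(a, b, equal_nan=True))
    # Structured or other non-numeric dtypes: flatten to 1D and let pandas
    # compare the resulting frames (assumption: DataFrame.equals is an
    # acceptable notion of equality here).
    left = pd.DataFrame(a.reshape(-1))
    right = pd.DataFrame(b.reshape(-1))
    return bool(left.equals(right))
```

Checking the shape first matters because reshape(-1) would otherwise make arrays with the same elements but different shapes look identical.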