Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ludwig feature dict #3904

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

dennisrall
Copy link
Contributor

Some minor improvements to the LudwigFeatureDict:

  • remove strange __next__ method
  • remove internal_key_to_original_name_map dict as the values were never read
  • use MutableMapping from the collections.abc → get useful and more performant methods
  • added tests for the new functionality and split up the old test (I kept the old test to show it is still running, but I think it can be removed?)

Copy link

github-actions bot commented Jan 21, 2024

Unit Test Results

  4 files   -   2    4 suites   - 2   2m 58s ⏱️ - 11m 42s
12 tests ±  0    6 ✔️  -   3    5 💤 +2  1 +1 
40 runs   - 20  16 ✔️  - 26  20 💤 +2  4 +4 

For more details on these failures, see this check.

Results for commit acfa198. ± Comparison against base commit 0a24d0a.

This pull request skips 2 tests.
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[ames_housing.gbm.yaml]
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[mercedes_benz_greener.gbm.yaml]

♻️ This comment has been updated with latest results.

@dennisrall dennisrall changed the title Improve ludwig feature dict [WIP] Improve ludwig feature dict Jan 21, 2024
@@ -65,13 +65,13 @@ def __iter__(self) -> None:
return iter(self.obj.keys())

def keys(self) -> List[str]:
return self.obj.keys()
return self.obj.key_list()
Copy link
Collaborator

@alexsherstinsky alexsherstinsky Jan 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dennisrall Do you think it might be simpler to retain the coding pattern of returning list(my_object.keys()) or list(my_object.items()), etc., instead of introducing a special additional methods item_list() and value_list()? It seems to me that doing so will be consistent with other cases in these Ludwig modules as well as general Python collection patterns. Thank you.

@@ -157,7 +158,7 @@ def get_name_from_module_dict_key(key: str, feature_name_suffix_length: int = FE
return name[:-feature_name_suffix_length]


class LudwigFeatureDict(torch.nn.Module):
class LudwigFeatureDict(torch.nn.Module, MutableMapping):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dennisrall Thank you for incorporating my previous suggestion -- I think that part looks clean now. Thank you for the idea and the implementation!

For this one, I am not sure if the benefits due to adding the MutableMapping subclassing justify taking the risk brought about the multiple inheritance. Do the test cover all the eventualities that might happen with this change?

Thank you. /cc @justinxzhao @arnavgarg1 @Infernaught

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, I was just playing around a bit.

I can't think of any problems about the multiple inheritance, but you know the code better than me😉

It is also possible to remove the MutableMapping inheritance and implement the other methods by hand. But I think this way it is a bit cleaner, if it doesn't cause any problems...



@pytest.fixture
def type_module() -> torch.nn.Module:
Copy link
Collaborator

@alexsherstinsky alexsherstinsky Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dennisrall This fixture and the to_module() one appear the same, and just instantiate the PyTorch Module() class. Without a docstring, it is a bit difficult to justify it having them. Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback. I added a docstring for both fixtures and also for the hash method of the LudwigFeatureDict

@dennisrall dennisrall changed the title [WIP] Improve ludwig feature dict Improve ludwig feature dict Jan 22, 2024
Copy link
Contributor

@justinxzhao justinxzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall PR LGTM, thanks for the change @dennisrall.

I pushed up a couple of commits, largely around fixing the CI.

It appears we'll need to bump up integration tests CI to torch==2.1.0, which we're absolutely ok also since that's what our Docker images are using now.

@mhabedank
Copy link
Collaborator

LGTM can be merged as soon as possible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants