PyTorch migration: Remove tensorflow components, add FATE estimators #164
Conversation
Already looks very clean - well done!
Just as a little status update, since this has been going on for a while: I think this is turning out quite nicely. I have implemented FETA ranking and (nearly, not using the proper loss function yet) FETA discrete choice. That shows flexibility on one axis (result type). I plan to also implement the same for FATE to show the flexibility on the other axis and finish the proof of concept. At that point we could evaluate and see where to take it from there. So in summary, things are moving along but are not quite ready for review/discussion yet. Hopefully soon-ish.
Okay, I think this is sufficient as a proof of concept now. I have implemented FATE and FETA, each in the ranking and discrete choice variant. I replaced a lot of the inheritance in the current tensorflow implementation with composition. I have split the code into "scoring modules" and estimators. The scoring modules are themselves composed of smaller modules, which makes them easier to reuse/understand/test. I have based the estimator implementation on skorch, which takes care of a lot of the boilerplate for us. We no longer have to care about training loops, instantiating optimizers or passing the parameters to uninitialized classes. We get #116 basically for free. The actual "heavy lifting" of the computation (the pairwise utilities) is disentangled from the FETA/FATE architecture (the "data flow" part), so it's easy to modify or replace. For now it's just a simple 1-layer linear network. This decomposition of scorer/estimator/utility removes a lot of duplication. It would be very easy to add a new scorer (for example based on graph neural networks) and "throw" it at the existing ranking/discrete choice estimators. It would also be very easy to derive a new utility function architecture and "throw" that at the FATE module. If you want to look at the implementation, here are the most interesting files:
What do you think @kiudee? There are still things to improve of course, but I think it's sufficient as a proof of concept.
Also CC @prithagupta if you are interested in this.
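To give a feel for the decomposition without opening the files, here is a condensed sketch of the idea; the class names and layer sizes are illustrative placeholders, not the actual modules in this branch:

```python
import torch
from torch import nn
from skorch import NeuralNet


class PairwiseLinearUtility(nn.Module):
    """Placeholder utility: one linear layer on a concatenated object pair."""

    def __init__(self, n_features):
        super().__init__()
        self.linear = nn.Linear(2 * n_features, 1)

    def forward(self, pairs):
        return self.linear(pairs)


class FETAScoring(nn.Module):
    """Scores each object by its mean pairwise utility against the others."""

    def __init__(self, n_features):
        super().__init__()
        self.pairwise_utility = PairwiseLinearUtility(n_features)

    def forward(self, instances):
        # instances: (batch, n_objects, n_features)
        n_objects = instances.size(1)
        first = instances.unsqueeze(2).expand(-1, -1, n_objects, -1)
        second = instances.unsqueeze(1).expand(-1, n_objects, -1, -1)
        pairs = torch.cat((first, second), dim=-1)
        utilities = self.pairwise_utility(pairs).squeeze(-1)
        # Aggregate each object's utilities over all partners.
        return utilities.mean(dim=2)


# skorch wraps the scoring module into a scikit-learn style estimator,
# so the training loop and optimizer handling come for free.
ranker = NeuralNet(
    module=FETAScoring,
    module__n_features=3,
    criterion=nn.MSELoss,  # placeholder; the branch uses task-specific losses
    max_epochs=10,
)
```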
poc/modules/scoring.py (outdated)
    instances
)
# Pair each instance with its aggregated context, then score the pairs.
pairs = torch.cat((instances, context_per_object), dim=-1)
utilities = self.pairwise_utility_module(pairs)
Very clean decomposition.
Thank you!
poc/modules/scoring.py (outdated)
# TODO use a more powerful pairwise utility module
def __init__(self, n_features, pairwise_utility_module=PairwiseLinearUtility):
    super().__init__()
    self.mean_aggregated_utility = MeanAggregatedUtility(
In principle we can even think about making this modular. One could use different aggregation functions here or even a learned aggregation operator.
Yes, I think that would be interesting :) The composition architecture should make experiments like that much easier.
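For illustration, a pluggable aggregation could look roughly like this; `MeanAggregatedUtility` is the name from the diff above, everything else here is made up:

```python
import torch
from torch import nn


class AggregatedUtility(nn.Module):
    """Like MeanAggregatedUtility, but with the reduction injected
    instead of hard-coded, so a different or learned operator can be
    plugged in."""

    def __init__(self, pairwise_utility_module, aggregation=None):
        super().__init__()
        self.pairwise_utility_module = pairwise_utility_module
        # Default: mean over partners. An nn.Module implementing e.g.
        # an attention-weighted sum could be passed here instead.
        self.aggregation = aggregation or (lambda u: u.mean(dim=-1))

    def forward(self, pairs):
        utilities = self.pairwise_utility_module(pairs).squeeze(-1)
        return self.aggregation(utilities)
```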
What is your general verdict @kiudee? Should I continue down this path, implementing more of the existing learners and functionality and eventually replacing the current implementation? Or rather try something else?
Force-pushed from 6911866 to e08acfc.
I think default values for internal functions just hinder understanding. Changed the parameter names to be less domain specific, since we are just talking about a point in the ball for the purposes of this function. Since this is an internal function, we can require an already initialized random state. Result of this discussion / explanation: kiudee#164 (comment)
Thereby fixing a bug when the number of instances is not a multiple of 10. Result of this discussion kiudee#164 (comment)
Another status update: I'm experimenting with experiments. We should be able to reproduce the experiments of the main papers with the new implementation, and I'd like to be able to do that in an easily reproducible way (that could possibly be repeated on each release). I'm trying to use Sacred for this purpose. I'm abusing the "named configuration" system a bit, but currently you can pick named configurations for an estimator and a dataset and then overwrite all parameters on the command line. Sacred will run the experiment, store everything that is needed to reproduce it, and also store the results in a database.
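For illustration, a Sacred setup along these lines could look like the following; the experiment name and config names are placeholders, not the actual ones in `poc/experiment.py`:

```python
# Run e.g. as:  python experiment.py with feta n_instances=10000
from sacred import Experiment

ex = Experiment("csrank-poc")  # hypothetical experiment name


@ex.config
def defaults():
    estimator = "fate_ranking"
    dataset = "pareto"
    n_instances = 1000


@ex.named_config
def feta():
    estimator = "feta_ranking"


@ex.automain
def run(estimator, dataset, n_instances):
    # Build the estimator and dataset from the config, fit, and report
    # metrics; Sacred records the config, code state, and results.
    ...
```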
Some more progress: I've added some metrics and played with the experiments a bit. Here I was trying to see how far I could push the current FETA implementation with its defaults and just 1000 pareto instances (which was further than expected).
I also created an upstream PR for the Sacred logger for skorch: skorch-dev/skorch#725
poc/experiment.py (outdated)
"n_instances": int(1e5), | ||
} | ||
dataset_type = "variable_choice" | ||
# TODO set cross-validation parameters as in the paper |
I'm also not sure if I understand the training and validation strategy of "Learning Choice Functions" correctly. I understand that it uses an outer 5-fold cross-validation for hyper-parameter optimization, but I'm not sure how the "inner" validation works. From Table 3 I would guess that it works like this:
- Generate (110,000 / 4) * 5 = 137,500 instances (to get that 100,000 + 10,000 split in each cross-validation step).
- Split this into 5 folds of 27,500 instances each.
- For each fold F (outer cross-validation loop):
  - Combine the other 4 folds into one dataset of 110,000 instances.
  - Split this into training (100,000) and test (10,000) data.
  - For each set of hyperparameters:
    - Train a model, test it on the test data.
  - Validate the best (according to test data) model on fold F.
- Report the average performance of the outer cross-validation.
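In code, the split I mean would look roughly like this (a sketch with a placeholder model and random data, just to pin down the fold sizes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, train_test_split

n_total = (110_000 // 4) * 5  # = 137,500
X = np.random.rand(n_total, 10)
y = np.random.randint(0, 2, n_total)
param_grid = [{"C": 0.1}, {"C": 1.0}]  # placeholder hyper-parameters

outer_scores = []
for rest_idx, fold_idx in KFold(n_splits=5).split(X):
    # The other 4 folds form 110,000 instances, split into
    # 100,000 training and 10,000 test instances.
    X_train, X_test, y_train, y_test = train_test_split(
        X[rest_idx], y[rest_idx], test_size=10_000, random_state=0
    )
    best_score, best_model = -np.inf, None
    for params in param_grid:
        model = LogisticRegression(**params).fit(X_train, y_train)
        score = model.score(X_test, y_test)
        if score > best_score:
            best_score, best_model = score, model
    # Validate the best model on the held-out outer fold F.
    outer_scores.append(best_model.score(X[fold_idx], y[fold_idx]))

print(np.mean(outer_scores))  # average outer performance
```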
However I'm almost sure that is incorrect 😄 Could you clarify @kiudee?
Python 3.7 is not officially supported anymore. Python 3.9 is released already, but let's update to 3.8 first.
In preparation for the pytorch migration.
In preparation for adding new entries to the list.
I have fixed the copy and paste issue that you found. Just for completeness, I'll summarize the results of our private discussions here too:
The PR is ready for review again. Edit: I forgot to mention the module names. We also discussed alternatives for the names of the modules.
Great clean commit history. Looks ready to be merged.
Thank you for the reviews :)
This is part of the ongoing pytorch migration. We will use skorch as the basis for our pytorch estimators. That will make it easier to comply with the scikit-learn estimator API. Ranking and (general/discrete) choice estimators are often based on some sort of scoring. The task-specific estimators make it easy to derive concrete estimators from a scoring module.
This adds a scoring module and the derived estimators for the FATE approach. The architecture is modular, so it should be easy to experiment with new ways to put the estimators together. This is a big commit. Multiple smaller ones that add the separate components (some of which are structural or can be useful outside of FATE and therefore could be considered features on their own) would probably have been better. Splitting it up now would take more time and is not worth it in this case though.
This simplifies interchangeable use of pytorch estimators and other estimators.
The "linear" implementations have been removed. The existing estimators do not expect `epochs` or `validation_split` parameters. The `verbose` parameter is accepted by some estimators, but defaults to `False` and is not expected by any of the ranking or discrete choice estimators.
The configuration is based on the configuration of the old (tensorflow based) fate estimators in the tests. The tensorflow tests used a 10% validation split, but still verified the performance in-sample; the validation data was not actually used, so I haven't kept that behavior. The performance isn't the same: the performance on the choice task in particular seems worse, if we trust the test results. We shouldn't read too much into that yet, since the test is mostly for basic functionality, the sample size is small, and it is not a reliable performance indicator.
The binder logo (badge.svg vs badge_logo.svg) differs between the two files, but either should be good.
There is now a pytorch implementation of the FATE estimators.
This is similar to the "optimizer_common_args" dictionary that used to exist. This version contains skorch-specific arguments, including the train split and the number of epochs. There are only the FATE-based estimators now, but this would get repetitive when the other approaches are included again.
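For illustration, such a shared dictionary might look roughly like this; the names and values here are assumptions, not taken from the commit:

```python
from skorch.dataset import ValidSplit  # called CVSplit in older skorch versions

skorch_common_args = {
    "max_epochs": 100,
    "train_split": ValidSplit(0.1),  # fraction held out for validation
    "optimizer__lr": 1e-3,
    "verbose": False,
}

# Each estimator can then unpack the shared defaults, e.g.
# FATERanker(**skorch_common_args)  (estimator name assumed).
```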
I wanted to run the checks and lints once more before merging and noticed some formatting issues. Please take another look.
Thanks again 🚀
Description
See this comment for a description of the current status.
Motivation and Context
Tensorflow 1 is deprecated and we need to move away from it. This PR is an attempt to evaluate pytorch as an alternative.
For now I don't try to fit the existing API (at least not yet).
How Has This Been Tested?
Lints & tests.