Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update SchemaGen Executor to natively handle SequenceExample #5689

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

AlexanderLavelle
Copy link
Contributor

While this PR does not address the need to specify payload_format in ExampleGen (or ImportExampleGen, etc), it does address the disconnect between ExampleGen, SchemaGen, and Transform by specifying tensor representations in SchemaGen. This should be a positive benefit to most users as the tensor representations will become a natural piece of the pipeline.

This addresses 5520, 4714, and, to a lesser extent, 5361.

@gbaned gbaned self-assigned this Feb 1, 2023
@gbaned gbaned requested a review from roseayeon February 1, 2023 16:53
@roseayeon roseayeon requested a review from lego0901 February 7, 2023 01:52
Comment on lines 94 to 96
# Add tensor representations to handle SequenceExamples downstream. Still need correct Payload Format.
tensor_representations = tensor_representation_util.InferTensorRepresentationsFromSchema(schema)
tensor_representation_util.SetTensorRepresentationsInSchema(schema, tensor_representations)
Copy link
Member

@lego0901 lego0901 Feb 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make it formatted in Pyink Python formatter for consistency? (https://github.com/google/pyink)
ex. <=80 columns rule

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have attempted to do this, although perhaps the imports need changing for this same reason? I am unsure how one recommends to do this if the updated state is not desired, as I did not come up with the module names nor function names?

@lego0901
Copy link
Member

Thanks for your contribution! It looks beneficial for many users as well.
We are checking if your change doesn't affect other users' existing pipelines or tests. Please be patient to be completely reviewed.

@github-actions
Copy link
Contributor

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@github-actions github-actions bot added the stale label Mar 12, 2023
@AlexanderLavelle
Copy link
Contributor Author

I have been out of the country but am still interested in pursuing this commit.

@google-ml-butler google-ml-butler bot removed the stale label Mar 13, 2023
@lego0901
Copy link
Member

lego0901 commented Mar 14, 2023

Thanks for your contribution!
The reason we took more cautious approach about this is it changes the file payload, and any related bugs are not trivial to be caught by unit tests.

To check if it doesn't break any tests and any existing pipelines, I would
(1) First approve this PR
(2) Maybe, merge this PR into the master
(3) Run some internal continuous tests including e2e pipeline runs
(4, if problem appears) Rollback before the next releases, otherwise keep this change!

I appreciate again for your consideration.

@AlexanderLavelle
Copy link
Contributor Author

@lego0901 - I have a potential recommendation for this. What if during the encoding everything was turned into a sequence example (where appropriate) anyway? This should not affect the majority of users as they wouldn't have features to encode into the feature_lists of tf.train.SequenceExample

@gbaned
Copy link
Contributor

gbaned commented Jul 4, 2023

Hi @lego0901 Any update on this PR? Please. Thank you!

@AlexanderLavelle
Copy link
Contributor Author

@gbaned I have thought about this PR more and it can be improved even beyond here. Part of it is connecting other big query column types upstream, but fundamentally I don't see downstream negatives by putting everything as sequence examples. The changes will also need to be applied in example gen such that the output will be only sequence examples and not the option for one or another. The upstream modification is only regarding something like if isinstance(pyval, FeatureLists) ...

@lego0901
Copy link
Member

lego0901 commented Jul 5, 2023

Yes, this looks good to me, either. But the problem is, we are updating the testing environment (Ubuntu 16.04 -> 20.04) and it is causing some build failures.. I wanted to make sure that this newly introduced feature will not break any test breakage but cannot confirm it before making every test green. I will accept this PR as soon as possible if the tests are not failing.

Sorry and thank you.

@gbaned
Copy link
Contributor

gbaned commented Oct 27, 2023

Hi @lego0901 / @AlexanderLavelle Any update on this PR? Please. Thank you!

@AlexanderLavelle
Copy link
Contributor Author

@gbaned no update on my side -- I have used this and confirm it works when intending to use SequenceExamples in TFX downstream components (Transform, Trainer)

@gbaned
Copy link
Contributor

gbaned commented Nov 7, 2023

Hi @lego0901 Any update on this PR? Please. Thank you!

@jsalkey
Copy link

jsalkey commented Nov 8, 2023

Hi, is there any update on this PR?

@gbaned
Copy link
Contributor

gbaned commented Nov 15, 2023

Hi @lego0901 Any update on this PR? Please. Thank you!

@gbaned
Copy link
Contributor

gbaned commented Nov 27, 2023

Hi @lego0901 Any update on this PR? Please. Thank you!

5 similar comments
@gbaned
Copy link
Contributor

gbaned commented Dec 15, 2023

Hi @lego0901 Any update on this PR? Please. Thank you!

@gbaned
Copy link
Contributor

gbaned commented Dec 27, 2023

Hi @lego0901 Any update on this PR? Please. Thank you!

@gbaned
Copy link
Contributor

gbaned commented Jan 10, 2024

Hi @lego0901 Any update on this PR? Please. Thank you!

@gbaned
Copy link
Contributor

gbaned commented Feb 23, 2024

Hi @lego0901 Any update on this PR? Please. Thank you!

@gbaned
Copy link
Contributor

gbaned commented Apr 2, 2024

Hi @lego0901 Any update on this PR? Please. Thank you!

@gbaned
Copy link
Contributor

gbaned commented Apr 19, 2024

Hi @lego0901 Any update on this PR? Please. Thank you!

@lego0901
Copy link
Member

I would have to explicitly mention that this is blocked because of the TF docker issue, so the internal integration test is broken for a while. I cannot confirm this PR unless it is resolved, because this should be merged after verifying that it does not break any internal tests.

@gbaned
Copy link
Contributor

gbaned commented Apr 19, 2024

I would have to explicitly mention that this is blocked because of the TF docker issue, so the internal integration test is broken for a while. I cannot confirm this PR unless it is resolved, because this should be merged after verifying that it does not break any internal tests.

Hi @lego0901 Thank you so much for the update.

@keerthanakadiri
Copy link

Hi @AlexanderLavelle, Could you possibly fetch the most recent code and run the tests? Please give an update on this PR.

@AlexanderLavelle
Copy link
Contributor Author

@keerthanakadiri Hello! I have brought in the new changes without conflicts. I am currently on PTO but can run tests in a week or early the following week

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants