Update SchemaGen Executor to natively handle SequenceExample #5689
# Add tensor representations to handle SequenceExamples downstream. Still need correct Payload Format.
tensor_representations = tensor_representation_util.InferTensorRepresentationsFromSchema(schema)
tensor_representation_util.SetTensorRepresentationsInSchema(schema, tensor_representations)
Can you format it with the Pyink Python formatter for consistency? (https://github.com/google/pyink) For example, it enforces the <=80 columns rule.
I have attempted to do this, although perhaps the imports need changing for the same reason? I am unsure what you would recommend if the reformatted result is not what you want, since I did not choose the module names or the function names.
Thanks for your contribution! It looks beneficial for many users as well.
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days
I have been out of the country but am still interested in pursuing this commit.
Thanks for your contribution! I still need to check that it doesn't break any tests or existing pipelines. Thank you again for your consideration.
@lego0901 - I have a potential recommendation for this. What if during the encoding everything was turned into a sequence example (where appropriate) anyway? This should not affect the majority of users as they wouldn't have features to encode into the
Hi @lego0901 Any update on this PR? Please. Thank you!
@gbaned I have thought about this PR more and it can be improved even beyond what is here. Part of it is connecting other BigQuery column types upstream, but fundamentally I don't see any downstream negatives in encoding everything as sequence examples. The changes would also need to be applied in ExampleGen so that the output is only sequence examples, rather than an option for one or the other. The upstream modification is only regarding something like
Yes, this looks good to me, too. But the problem is that we are updating the testing environment (Ubuntu 16.04 -> 20.04) and it is causing some build failures. I wanted to make sure that this newly introduced feature will not break any tests, but I cannot confirm that before every test is green. I will accept this PR as soon as possible if the tests are not failing. Sorry and thank you.
Hi @lego0901 / @AlexanderLavelle Any update on this PR? Please. Thank you!
@gbaned no update on my side -- I have used this and confirm it works when intending to use SequenceExamples in TFX downstream components (Transform, Trainer)
Hi @lego0901 Any update on this PR? Please. Thank you!
Hi, is there any update on this PR?
Hi @lego0901 Any update on this PR? Please. Thank you!
master into schemagenmod
Hi @lego0901 Any update on this PR? Please. Thank you!
I have to explicitly mention that this is blocked because of the TF docker issue, which has left the internal integration test broken for a while. I cannot confirm this PR until that is resolved, because it should be merged only after verifying that it does not break any internal tests.
Hi @lego0901 Thank you so much for the update.
Hi @AlexanderLavelle, Could you possibly fetch the most recent code and run the tests? Please give an update on this PR.
@keerthanakadiri Hello! I have brought in the new changes without conflicts. I am currently on PTO but can run tests in a week or early the following week.
While this PR does not address the need to specify payload_format in ExampleGen (or ImportExampleGen, etc), it does address the disconnect between ExampleGen, SchemaGen, and Transform by specifying tensor representations in SchemaGen. This should be a positive benefit to most users as the tensor representations will become a natural piece of the pipeline.

This addresses 5520, 4714, and, to a lesser extent, 5361.
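
To illustrate the idea in the description above, here is a minimal sketch of what "inferring tensor representations and storing them in the schema" might look like. It does not use the real tensor_representation_util API or the Schema proto: the dict layout, the helper names infer_tensor_representations and set_tensor_representations, and the "##SEQUENCE##" key are all assumptions made for illustration only.

```python
from typing import Any, Dict

# Assumed name for the struct feature that holds per-step sequence features.
SEQUENCE_KEY = "##SEQUENCE##"


def infer_tensor_representations(
    schema: Dict[str, Any],
) -> Dict[str, Dict[str, Any]]:
  """Maps each feature name to a toy tensor representation (hypothetical)."""
  representations: Dict[str, Dict[str, Any]] = {}
  for name, feature in schema["features"].items():
    if name == SEQUENCE_KEY:
      # Sequence features vary in length per example, so treat them as ragged.
      for child in feature["children"]:
        representations[child] = {
            "kind": "ragged",
            "path": [SEQUENCE_KEY, child],
        }
    elif feature.get("shape") is not None:
      # A fixed shape means the feature can be read as a dense tensor.
      representations[name] = {
          "kind": "dense",
          "shape": list(feature["shape"]),
      }
    else:
      # No shape information: fall back to a variable-length sparse tensor.
      representations[name] = {"kind": "varlen_sparse"}
  return representations


def set_tensor_representations(
    schema: Dict[str, Any],
    representations: Dict[str, Dict[str, Any]],
) -> None:
  """Stores the representations on the schema so later steps can reuse them."""
  schema["tensor_representations"] = dict(representations)


# Toy schema: two context features plus two sequence features.
toy_schema = {
    "features": {
        "user_id": {"shape": [1]},
        "tags": {},
        SEQUENCE_KEY: {"children": ["clicks", "dwell_time"]},
    }
}
reps = infer_tensor_representations(toy_schema)
set_tensor_representations(toy_schema, reps)
```

Because the representations travel with the schema, a downstream step in this sketch only needs to read toy_schema["tensor_representations"] instead of re-deriving how each feature should be parsed, which is the kind of disconnect the description refers to.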