Fix yaml dump in string templates #32

Open
wants to merge 1 commit into
base: master
5 changes: 4 additions & 1 deletion src/airflow_declarative/schema.py
@@ -77,7 +77,10 @@ def dump(schema, *args, **kwargs):
     """
     kwargs.setdefault("default_flow_style", False)
     kwargs.setdefault("default_style", "")
-    return yaml.dump(schema, Dumper=Dumper, *args, **kwargs)
+
+    # yaml.dump always end the string with '\n...\n' even if explicit_end is False
+    # so just replace it
+    return yaml.dump(schema, Dumper=Dumper, *args, **kwargs).replace("\n...\n", "")
Member

Ellipsis in yaml is an end-of-document mark: https://yaml.org/spec/1.2/spec.html#id2760395

I guess it is safe to remove it from the end, and I suppose it shouldn't ever be rendered in the middle (which would mean that dump has produced multiple yaml documents with a single dump call, which is rather unexpected). So probably this should be good as is.

But I would propose to change that replace so it would strip the ellipsis only from the end of the document. Something like this:

yaml_doc = yaml.dump(schema, Dumper=Dumper, *args, **kwargs).rstrip("\n")
yaml_doc = re.sub(r'\.\.\.$', '', yaml_doc, flags=re.MULTILINE)
return yaml_doc
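
One caveat with the suggestion above: under `re.MULTILINE`, `$` matches at the end of every line, so a `...` ending any line of the output would be stripped too. A minimal sketch of an end-only variant follows, assuming the marker always sits on its own final line; the helper name `strip_document_end` is hypothetical and not part of the PR.

```python
import re

# Matches a "..." document-end marker on its own final line, with an
# optional trailing newline. \Z anchors at the very end of the string,
# so markers earlier in the text are left alone.
_DOCUMENT_END_RE = re.compile(r"\n\.\.\.\n?\Z")


def strip_document_end(yaml_text):
    """Drop a trailing document-end marker; leave everything else untouched.

    Hypothetical helper for illustration, not the project's API.
    """
    return _DOCUMENT_END_RE.sub("", yaml_text)
```

Like the original `.replace("\n...\n", "")`, this also drops the newline before the marker, so a dumped scalar comes back without a trailing line break.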

Member

I don't think this is the right solution, since yaml has an explicit_end parameter and the document-end marker is optional. Something is going wrong for it to occur. We need to investigate what and why.

Member

Solving this via pyyaml would definitely be better, but it might be tricky.

This commit seems to be the reason: yaml/pyyaml@5413848

But simply setting open_ended = False doesn't help, because it then gets set back to True. So the following seems to work:

diff --git a/src/airflow_declarative/schema.py b/src/airflow_declarative/schema.py
index 5f2ca6e..1517b69 100644
--- a/src/airflow_declarative/schema.py
+++ b/src/airflow_declarative/schema.py
@@ -80,10 +80,18 @@ def dump(schema, *args, **kwargs):

     # yaml.dump always end the string with '\n...\n' even if explicit_end is False
     # so just replace it
-    return yaml.dump(schema, Dumper=Dumper, *args, **kwargs).replace("\n...\n", "")
+    return yaml.dump(schema, Dumper=Dumper, *args, **kwargs)


 class Dumper(yaml.SafeDumper):
+    @property
+    def open_ended(self):
+        return False
+
+    @open_ended.setter
+    def open_ended(self, value):
+        pass
+
     def ignore_aliases(self, data):
         return True

Although I'm wondering if this solution is better than the regexp 😅
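
To illustrate why the property override in the diff above sticks where a plain assignment does not, here is a small sketch that does not depend on pyyaml. `StandInEmitter` and `PinnedEmitter` are hypothetical stand-ins, not pyyaml classes; the only assumption is that the base class reassigns the attribute internally, as described in this thread.

```python
class StandInEmitter:
    """Hypothetical stand-in for an emitter that toggles a flag internally."""

    def __init__(self):
        self.open_ended = False

    def write_root_scalar(self):
        # Mimics the behaviour described above: the flag is set back to True.
        self.open_ended = True


class PinnedEmitter(StandInEmitter):
    # A property is a data descriptor on the class, so every assignment to
    # self.open_ended (including the ones in the base class) goes through
    # the setter below instead of the instance dictionary.
    @property
    def open_ended(self):
        return False

    @open_ended.setter
    def open_ended(self, value):
        # Deliberately ignore all writes.
        pass


plain = StandInEmitter()
plain.write_root_scalar()
pinned = PinnedEmitter()
pinned.write_root_scalar()
print(plain.open_ended, pinned.open_ended)  # prints "True False"
```

The trade-off is that the subclass silently discards writes the base class believes it made, so any upstream logic that relies on the flag changing would be affected as well.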

Member

It's definitely better than a blind replace on data we know nothing about. However, maybe we are the ones using yaml the wrong way here?

Member

We know pretty well that an ellipsis at the very end of a YAML document is an explicit end mark -- that is in the specs. So this is not a blind replacement.

I don't think it's a matter of incorrect usage. They set open_ended to True for a root node, effectively forcing the trailing ellipsis. And they don't seem to have that explicit_end argument covered with tests, so I would assume that it is broken in upstream.

https://github.com/yaml/pyyaml/blob/5413848f2ba250cc2c70f0192893a4a9626a8209/lib/yaml/emitter.py#L1075-L1076

Member

Others strip the ellipsis manually as well: https://stackoverflow.com/a/56988010



class Dumper(yaml.SafeDumper):
33 changes: 33 additions & 0 deletions tests/dags/good/template_string_param.yaml
@@ -0,0 +1,33 @@
#
# Copyright 2017, Rambler Digital Solutions
Member

The year should be 2019

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


dags:
  dag:
    args:
      start_date: 2017-07-27
      schedule_interval: 1d
    do:
    - operators:
        some_{{ item.name }}_operator:
          callback: tests.utils:Operator
          callback_args:
            param: 'some_name_for_operator{{ item.id }}.xml'
      with_items:
      - id: 1
        name: first
      - id: 2
        name: second