Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid incorrectly including xml header in xslt output on JRuby #2903

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

cbasguti
Copy link
Contributor

What problem is this PR intended to solve?

  • In the updated function, the line props.setProperty(OutputKeys.METHOD, org.apache.xml.serializer.Method.UNKNOWN); was changed to props.setProperty(OutputKeys.METHOD, "html");.
  • This change ensures that the METHOD property is set to "html" instead of the unknown method. This modification aims to address issue JRuby nokogiri incorrectly include xml declaration for html transformation #1430 by explicitly specifying the method as "html" during serialization.

Have you included adequate test coverage?

A new test file named test_xml_header.rb was added .

Does this change affect the behavior of either the C or the Java implementations?

The modification made to the Java implementation was to exclude the XML declaration tag () from the output. This was achieved by adjusting the code in the serialize function to remove the inclusion of the XML declaration. The purpose was to align the output with the expected format, where the XML declaration should not be present.

@stevecheckoway
Copy link
Contributor

Html is not obviously the correct method when none is specified. If this is the correct default, I think you should add a comment explaining why it's correct.

@stevecheckoway
Copy link
Contributor

E.g., does this make the jruby implementation match the C implementation? (I'm on my phone so I cannot easily check.)

@stevecheckoway
Copy link
Contributor

After digging into this a bit, I don't think this is the correct fix. The XSLT standard specifies how the default method should be determined—either XML or HTML—when none is specified by the stylesheet.

For some reason, this is not being correctly detected by Xalan. (The code that's supposed to do that is here, I think.)

If I modify the code in #1430 slightly to call #transform and print the root element's name and namespace, it says its name is html and it has no namespace. I'm not sure why Xalan is using the XML output method rather than the HTML output method

Here's the code.

input_xml = <<-EOS
<?xml version="1.0" encoding="utf-8"?>
<report>
  <title>My Report</title>
</report>
EOS

input_xsl = <<-EOS
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="report/title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="report/title"/></h1>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
EOS

require 'nokogiri'

xml = Nokogiri::XML(input_xml)
xsl = Nokogiri::XSLT(input_xsl)
html = xsl.transform(xml)

puts(html.root.name)
puts(html.root.namespace || "no namespace")

@cbasguti
Copy link
Contributor Author

Thank you for your feedback and for providing additional insights into the issue! Based on your comments, it appears that determining the correct default method (XML or HTML) when it is not explicitly specified can be a complex matter. As you said, it seems that Xalan is not correctly detecting the default method according to the XSLT standard.

Considering this, I propose handling the default method determination within the serialize method itself. By doing so, we can ensure that the output format aligns with the expected behavior, especially when the method is not explicitly specified. Like this:

@flavorjones
Copy link
Member

I'm going to close this, and we can continue the conversation at #1430

@flavorjones
Copy link
Member

Oh, apologies, I see you pushed additional changes that are trying to implement what Steve is describing ... reopening

@flavorjones flavorjones reopened this Jun 18, 2023
@flavorjones
Copy link
Member

@cbasguti I appreciate the time and effort you've put into trying to fix this issue. However, I'm not confident that we understand what's going on enough to start working around it by writing our own Java code.

If it's a bug in Xalan, we should report it upstream and/or submit a patch there.

If it's not a bug in Xalan, then it may be that we're using the library incorrectly and so the fix to Nokogiri should then be a usage fix to how we're calling the libraries.

I'll leave this open for now, but let's continue the conversation in #1430.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants