
[BUG] HttpCollectImpl XML parsing assumes UTF-8 #2852

Open
1 task done
pjfanning opened this issue Dec 2, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@pjfanning
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Document document = db.parse(new ByteArrayInputStream(resp.getBytes(StandardCharsets.UTF_8)));

If you already have a String, you don't need to convert it to a byte array (which largely wastes memory).

DocumentBuilder has a parse(InputSource) method.
https://docs.oracle.com/javase/8/docs/api/javax/xml/parsers/DocumentBuilder.html#parse-org.xml.sax.InputSource-

An InputSource can be constructed to wrap a StringReader that wraps the String.
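A minimal sketch of the suggested approach, assuming the response is already held in a String as in the snippet above. The class and method names are hypothetical, not HertzBeat code. The String is wrapped in a StringReader, the reader in an InputSource, and the result passed to DocumentBuilder.parse(InputSource). When an InputSource supplies a character stream, the parser reads the characters directly and disregards any encoding named in the XML declaration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class StringXmlParsing {

    // Hypothetical helper: parse XML that is already a String, with no byte[] copy.
    // Because the parser receives characters, the declared encoding is not used for decoding.
    static Document parseFromString(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrap checked exceptions so callers are not forced to handle them here.
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // The declaration says ISO-8859-1, but the String already holds decoded characters.
        String resp = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><value>caf\u00e9</value>";
        Document document = parseFromString(resp);
        System.out.println(document.getDocumentElement().getTextContent().equals("caf\u00e9"));
    }
}
```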

Expected Behavior

Don't convert Strings to byte arrays unnecessarily: it wastes memory and can cause parse issues. Imagine the XML has a declaration that specifies an encoding other than UTF-8. If you parse the String directly, the parser ignores the declared encoding. If you convert the String to a byte array, the parser uses the declared encoding, but your code has explicitly encoded the bytes as UTF-8, so the two encodings may not match.
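The mismatch described above can be reproduced with a small, self-contained sketch (class and method names are hypothetical). It assumes a response whose XML declaration says ISO-8859-1 while the code encodes the String as UTF-8, mirroring the `resp.getBytes(StandardCharsets.UTF_8)` line from the issue. In UTF-8, "é" becomes the two bytes 0xC3 0xA9; the parser honours the declaration and decodes each byte as a separate ISO-8859-1 character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EncodingMismatchDemo {

    // Mirrors the pattern in the issue: String -> UTF-8 bytes -> parse.
    // Given bytes, the parser decodes them using the encoding in the XML declaration.
    static String parseViaUtf8Bytes(String xml) {
        return textOf(new InputSource(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
    }

    // Suggested alternative: hand the parser the characters directly.
    static String parseViaReader(String xml) {
        return textOf(new InputSource(new StringReader(xml)));
    }

    private static String textOf(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String resp = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><value>caf\u00e9</value>";
        System.out.println(parseViaReader(resp).equals("caf\u00e9"));            // prints true
        System.out.println(parseViaUtf8Bytes(resp).equals("caf\u00c3\u00a9"));   // prints true (mojibake "cafÃ©")
    }
}
```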

Steps To Reproduce

No response

Environment

HertzBeat version(s): latest

Debug logs

No response

Anything else?

No response

@pjfanning
Contributor Author

The underlying issue is really that the response is converted to a String in the first place.

// todo This code converts an InputStream directly to a String. For large data in Prometheus exporters,
// this could create large objects, potentially impacting JVM memory space significantly.
// Option 1: Parse using InputStream, but this requires significant code changes;
// Option 2: Manually trigger garbage collection, similar to how it's done in Dubbo for large inputs.
String resp = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);

Using an InputStream or a byte array should be more efficient than a String. It would definitely not be worse.
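A minimal sketch of the InputStream option for the XML case, assuming the stream would come from something like `response.getEntity().getContent()` in the Apache HttpClient code (an assumption about how it would be wired into HttpCollectImpl; the class and method names here are hypothetical). The parser reads the bytes directly, so no intermediate String or byte[] copy of the whole body is made, and it detects the charset itself from the byte order mark or XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class StreamXmlParsing {

    // Hypothetical helper: parse straight from the response body stream.
    // The parser picks the charset from the BOM / XML declaration, not from our code.
    static Document parse(InputStream body) {
        try (InputStream in = body) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the HTTP entity stream: Latin-1 bytes with a matching declaration.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><value>caf\u00e9</value>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(latin1));
        System.out.println(doc.getDocumentElement().getTextContent().equals("caf\u00e9")); // prints true
    }
}
```

One trade-off of this design: a charset supplied only in the HTTP `Content-Type` header is not visible to the parser when it is handed raw bytes, so it relies on the document's own BOM or declaration (defaulting to UTF-8 when neither is present).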
