New rule: ISO-8859-1 characters not compatible with UTF-8 should be escaped to keep compatibility with Java 9 default encoding switch #75

Open
arend-von-reinersdorff opened this issue Feb 19, 2017 · 10 comments

arend-von-reinersdorff commented Feb 19, 2017

Java 9 will switch the default property encoding from ISO-8859-1 to UTF-8:
http://openjdk.java.net/jeps/226

A .properties file written for Java 8 or earlier will be read as garbled text by Java 9 if it
contains unescaped non-ASCII characters, e.g.:
admin.name=Jörg Schäfer

A .properties file that must be readable by both Java 9 and Java 8 or earlier should escape all non-ASCII characters.
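
To illustrate, here is a minimal Java sketch (the class and method names are made up for illustration and do not come from the JDK or JEP 226). It loads the same property once from unescaped ISO-8859-1 bytes and once from the escaped, pure-ASCII form, decoding each with both charsets. Only the escaped form should read identically under both.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, for illustration only.
class PropertiesEncodingDemo {

    /** Loads the bytes as a .properties file, decoding them with the given charset. */
    static String readAdminName(byte[] fileContent, Charset charset) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(fileContent), charset)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("admin.name");
    }

    public static void main(String[] args) {
        // Unescaped umlauts stored as ISO-8859-1 bytes. The Java-source escapes in this
        // literal become real umlaut characters at compile time.
        byte[] unescaped = "admin.name=J\u00f6rg Sch\u00e4fer\n".getBytes(StandardCharsets.ISO_8859_1);
        // The same value with properties-level Unicode escapes: the file content is pure ASCII.
        byte[] escaped = "admin.name=J\\u00f6rg Sch\\u00e4fer\n".getBytes(StandardCharsets.US_ASCII);

        System.out.println(readAdminName(unescaped, StandardCharsets.ISO_8859_1)); // intact
        System.out.println(readAdminName(unescaped, StandardCharsets.UTF_8));      // garbled
        System.out.println(readAdminName(escaped, StandardCharsets.ISO_8859_1));   // intact
        System.out.println(readAdminName(escaped, StandardCharsets.UTF_8));        // intact
    }
}
```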

@racodond
Owner

Hi @arend-von-reinersdorff,

Thanks for the info and the link!
I'll create such a rule for the next release.

David

@racodond racodond added this to the 2.6 milestone Feb 27, 2017
@racodond
Owner

Depends upon #76

@racodond racodond changed the title from "Check compatibility with change of default property encoding in Java 9" to "New rule: ISO-8859-1 characters not compatible with UTF-8 should be escaped to keep compatibility with Java 9 default encoding switch" Feb 27, 2017
@racodond
Owner

racodond commented Feb 27, 2017

@arend-von-reinersdorff: What about the following rule description?

<p>
    Java 9 expects properties files to be encoded in UTF-8 instead of ISO-8859-1. Even though Java 9 provides some
    fallback mechanisms to ISO-8859-1 while loading properties, in some corner cases you might see unexpected behavior
    for characters that are encoded differently in ISO-8859-1 and UTF-8 (that is, characters whose code points are
    above U+007F). For instance, <code>J�rg</code> might be displayed instead of <code>Jörg</code>. To avoid any
    display issue, either:
</p>
<ul>
    <li>Escape all characters whose code points are over U+007F with Unicode escapes (<code>\uXXXX</code>)</li>
    <li>Or explicitly load properties files with ISO-8859-1 encoding</li>
</ul>
<p>
    This rule applies only when <code>sonar.jproperties.sourceEncoding</code> is set to <code>ISO-8859-1</code>
    (the default value). It raises an issue each time a character whose code point is above U+007F is found.
</p>


<h2>Noncompliant Code Example</h2>
<pre>
my.name: Jörg
</pre>

<h2>Compliant Solution</h2>
<pre>
my.name: J\u00f6rg
</pre>

<h2>See</h2>
<ul>
    <li><a target="_blank"
           href="https://docs.oracle.com/javase/9/intl/internationalization-enhancements-jdk-9.htm#JSINT-GUID-5ED91AA9-B2E3-4E05-8E99-6A009D2B36AF">Oracle
        - Internationalization Enhancements in JDK 9</a></li>
    <li><a target="_blank" href="http://openjdk.java.net/jeps/226">OpenJDK - JEP 226: UTF-8 Property Files</a></li>
</ul>
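
For reference, the check in this description boils down to scanning for characters above U+007F, and the compliant form can be produced mechanically. The following is a minimal Java sketch with hypothetical names. It is an assumption about how such a check could look, not the plugin's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not the plugin's actual code.
class NonAsciiEscaper {

    /** Returns the zero-based positions of all characters above U+007F in the line. */
    static List<Integer> findNonAscii(String line) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) > 0x7F) {
                positions.add(i);
            }
        }
        return positions;
    }

    /** Replaces each character above U+007F with its four-hex-digit Unicode escape. */
    static String escapeNonAscii(String line) {
        StringBuilder out = new StringBuilder(line.length());
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c > 0x7F) {
                // Emits a backslash, the letter u, and four lowercase hex digits.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java-source escape below puts a real o-umlaut into the string.
        String line = "my.name: J\u00f6rg";
        System.out.println(findNonAscii(line));   // positions a rule would report
        System.out.println(escapeNonAscii(line)); // the compliant form of the line
    }
}
```

The sketch works on UTF-16 code units, so a character outside the Basic Multilingual Plane would come out as two escapes (a surrogate pair). For ISO-8859-1 input this case cannot occur.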

@arend-von-reinersdorff
Author

Great work, and very nice description. Thanks a lot :-)

@racodond
Owner

You're welcome! Here's a snapshot to test: https://github.com/racodond/sonar-jproperties-plugin/releases/tag/%2375

Your feedback is more than welcome!

@arend-von-reinersdorff
Author

I tried to trigger the new issue but couldn't. My setup:

  • SonarQube 5.6.6 LTS
  • default local database
  • analyzed with Maven

I was able to trigger another issue in my test.properties file but not this new one.

At first I used UTF-8 as Maven project encoding and ISO-8859-1 as encoding for the property file (this should be the normal case). This caused a warning on analysis:
[WARNING] Invalid character encountered in file [...]\src\main\java\test.properties at line 5 for encoding UTF-8. Please fix file content or configure the encoding to be used using property 'sonar.sourceEncoding'.
Also the non-ASCII characters were garbled when viewing the file in the SonarQube server view.

When I changed the Maven project encoding to ISO-8859-1 (ugly workaround) the warning disappeared but the issue was still not triggered. Non-ASCII characters looked fine in the SonarQube server view.
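
A warning like this is what one would expect whenever ISO-8859-1 bytes are validated as UTF-8. As a rough sketch (hypothetical class name, built on the standard `java.nio.charset` API, and not SonarQube's actual code), a strict decoder rejects such content while accepting the escaped, pure-ASCII form:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not SonarQube's actual code.
class Utf8Validator {

    /** Returns true if the bytes form a valid UTF-8 sequence, false otherwise. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The Java-source escape in the first two literals yields a real o-umlaut.
        byte[] latin1 = "name=J\u00f6rg".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "name=J\u00f6rg".getBytes(StandardCharsets.UTF_8);
        byte[] escaped = "name=J\\u00f6rg".getBytes(StandardCharsets.US_ASCII);

        System.out.println(isValidUtf8(latin1));
        System.out.println(isValidUtf8(utf8));
        System.out.println(isValidUtf8(escaped));
    }
}
```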

Unrelated problems in testing this:

  • The Properties plugin is not in the SonarQube update center and not in the SonarQube version compatibility matrix. README.md should be updated.
  • The property file was ignored in src/main/resources, which would be the default location in Maven. I had to put it in src/main/java.

@racodond
Owner

racodond commented Mar 4, 2017

Hi @arend-von-reinersdorff,

Thanks for your feedback!

> I was able to trigger another issue in my test.properties file but not this new one

It works fine on my side with your settings with the following project sample:
test.zip

My apologies for asking :-):

Can you try again with my sample project?

> At first I used UTF-8 as Maven project encoding and ISO-8859-1 as encoding for the property file (this should be the normal case). This caused a warning on analysis:
> [WARNING] Invalid character encountered in file [...]\src\main\java\test.properties at line 5 for encoding UTF-8. Please fix file content or configure the encoding to be used using property 'sonar.sourceEncoding'.
> Also the non-ASCII characters were garbled when viewing the file in the SonarQube server view.

Of course, the proper settings should be:

sonar.sourceEncoding=UTF-8
sonar.jproperties.sourceEncoding=ISO-8859-1

But, currently, no language plugin seems to support files with different encodings. I asked about it here. Unfortunately, SonarSource is unlikely to answer the thread, as they don't really welcome language plugins from the community. I'll keep investigating to find a workaround when I have some time.

> The Properties plugin is not in the SonarQube update center and not in the SonarQube version compatibility matrix. README.md should be updated.

README file updated

> The property file was ignored in src/main/resources, which would be the default location in Maven. I had to put it in src/main/java.

This is due to the SonarQube Maven plugin, which only looks for files in src/main/java. There's a ticket to also automatically take files in src/main/resources into account. See https://jira.sonarsource.com/browse/MSONAR-123
For now, you have to set the sonar.sources property in your pom file (see my project sample).

David

@arend-von-reinersdorff
Author

You are right, the rule was not activated, sorry.
It works very nicely, thank you very much :-)

@racodond
Owner

racodond commented Mar 4, 2017

Good news!
I'll try to find a solution to the encoding problem before an official release.

@racodond racodond reopened this Mar 4, 2017
@racodond
Owner

racodond commented Mar 4, 2017
