Refactor handling of idna domain names #55

davidklaftenegger · 2021-05-29T09:06:18Z

This series of commits reworks how translation to IDNA domain names is done. It differs in functionality in only two points:
a) the idna module is no longer optional
b) on error, no attempt is made to continue operation.

While a) could be patched away, I don't think it is worth maintaining separate code paths that are fragile.
As for b), I consider the previous behaviour a bug, where it would attempt to use ACME with the non-translated domain name after printing that an error occurred.
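
A minimal sketch of the fail-fast behaviour described in b), not the code from this PR. It assumes Python's built-in "idna" codec (IDNA 2003) as a stand-in for the idna module, and the name idna_convert_strict is hypothetical:

```python
def idna_convert_strict(domains):
    """Return the IDNA (ASCII) form of every domain, aborting on the first failure."""
    converted = []
    for domain in domains:
        try:
            # Assumption: the stdlib "idna" codec stands in for the idna package.
            converted.append(domain.encode("idna").decode("ascii"))
        except UnicodeError as exc:
            # Report the problem, then re-raise so the caller cannot
            # silently continue with the untranslated name.
            print("IDNA conversion failed for {!r}: {}".format(domain, exc))
            raise
    return converted


result = idna_convert_strict(["example.com", "bücher.example.com"])
```

Because the exception propagates, a later ACME request is never issued with a name that failed to translate.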

Handling for empty lists is not needed, idna_convert always returns a full list of idna-converted domain names.

The domaintranslations list already contains all the information, and the mapping was only used to re-construct the list it is constructed from.

This change makes the use of idna non-optional, and fixes what I consider a bug: When idna translation fails, an error was printed, but operation would still continue. The new version instead re-raises the exception to force the program to abort at this point.

The original names are already known in order, so there is no need to return tuples here.

moepman

Maybe it would make sense to print both the human-readable domain and the IDNA domain. That should make it easier to see what is actually requested from the API as well as what a user would normally enter in the config and/or browser.

moepman · 2021-05-29T15:56:00Z

It seems that either not all code paths have been updated or there is still a bug. When running the code from this version with an example IDNA domain I receive the following error:

Generating CSR for ['bläh.example.com']
Error: Certificate issue/renew failed
       Traceback (most recent call last):
         File "/usr/lib/python3/dist-packages/acertmgr/__init__.py", line 176, in main
           cert_get(config)
         File "/usr/lib/python3/dist-packages/acertmgr/__init__.py", line 58, in cert_get
           cr = tools.new_cert_request(settings['domainlist_idna'], key, must_staple)
         File "/usr/lib/python3/dist-packages/acertmgr/tools.py", line 108, in new_cert_request
           names[0].decode('utf-8') if getattr(names[0], 'decode', None) else
       TypeError: 'map' object is not subscriptable
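
The TypeError above can be reproduced in isolation: in Python 3, map returns a lazy iterator that does not support indexing. The sketch below is illustrative only; to_idna is a hypothetical helper built on the stdlib "idna" codec, not the function used in acertmgr:

```python
def to_idna(domain):
    # Assumption: stdlib codec as a stand-in for the real conversion.
    return domain.encode("idna").decode("ascii")


domains = ["example.com", "bücher.example.com"]

# A map object is an iterator, so indexing it fails.
lazy_names = map(to_idna, domains)
try:
    lazy_names[0]
    error = None
except TypeError as exc:
    error = str(exc)

# Materialising the result as a list makes names[0] valid again.
names = list(map(to_idna, domains))
first = names[0]
```

Any consumer that indexes the result, as new_cert_request does with names[0], needs a list (or another sequence) rather than the bare map object.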

davidklaftenegger · 2021-05-29T23:06:50Z

> Maybe it would make sense to print both the human-readable domain and the IDNA domain. That should make it easier to see what is actually requested from the API as well as what a user would normally enter in the config and/or browser.

Sounds reasonable, but probably is beyond the scope of this PR. Until such a change is implemented, would you prefer the idna or the human-readable version in the log messages?

Kishi85 · 2021-05-30T14:37:30Z

idna_convert is currently used as an additional input sanitizer and only touches domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 ...) part), which makes it transparent in all other code paths.

This PR also starts storing domain names twice, which (unless done for performance reasons, which is obviously not the case here) is undesirable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.

But I agree that the current code might need some additional comments to really understand what is going on.

davidklaftenegger · 2021-05-30T15:03:50Z

> idna_convert is currently used as an additional input sanitizer and only touches domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 ...) part), which makes it transparent in all other code paths.

But it does not sanitize anything?

> This PR also starts storing domain names twice, which (unless done for performance reasons, which is obviously not the case here) is undesirable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.

I strongly disagree: the previous code stores the domain names three times (once in the list, twice in the translation returned from idna_convert). I consider this a step towards less complexity.

moepman · 2021-05-30T15:50:17Z

I think I'd prefer the solution @davidklaftenegger proposed to me in private: only translate the domain name where needed. This would also remove the problem of storing data twice.

With regard to printing/logging: since this PR is not fixing any bugs, I'd rather wait for you (or someone else) to implement logic similar to this: if the domain needs translation: human_readable_domain_name (translated_domain_name) else: domain_name.
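
The logging rule sketched in that last sentence could look roughly like this. This is an illustration under assumptions: display_name and to_idna are hypothetical names, and the stdlib "idna" codec stands in for the real translation:

```python
def to_idna(domain):
    # Assumption: stdlib codec as a stand-in for the real conversion.
    return domain.encode("idna").decode("ascii")


def display_name(domain):
    """Return 'human (translated)' if translation changes the name, else the name."""
    translated = to_idna(domain)
    if translated != domain:
        return "{} ({})".format(domain, translated)
    return domain


plain = display_name("example.com")
mixed = display_name("bücher.example.com")
```

Translating on demand like this keeps a single stored copy of each domain, at the cost of repeating the conversion wherever a label is needed.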

davidklaftenegger · 2021-05-30T15:59:46Z

> I think I'd prefer the solution @davidklaftenegger proposed to me in private: only translate the domain name where needed. This would also remove the problem of storing data twice.

This solution appears possible to me, but will require more changes in other places of the code.

Kishi85 · 2021-05-30T17:07:46Z

> > idna_convert is currently used as an additional input sanitizer and only touches domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 ...) part), which makes it transparent in all other code paths.
>
> But it does not sanitize anything?

Sanitizing is probably the wrong word, but it converts unusable Unicode (if present) to the proper format for the certificate request.

> > This PR also starts storing domain names twice, which (unless done for performance reasons, which is obviously not the case here) is undesirable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.
>
> I strongly disagree: the previous code stores the domain names three times (once in the list, twice in the translation returned from idna_convert). I consider this a step towards less complexity.

Outside configuration it is only stored once in domainlist; the only place where it is used multiple times is within configuration. A simple proposal for config processing, with a single domain list plus a map storing only the mapped domains, and including only the essential changes by @davidklaftenegger in this PR, would be to replace lines 92-95 with the following and update the parts of the challenge handler configuration that use domaintranslation accordingly:

# Convert unicode to IDNA domains
config['domainlist_idnamap'] = {}
for idx, domain_human in enumerate(config['domainlist']):
    # Only domains containing non-ASCII characters need translation
    if any(ord(c) >= 128 for c in domain_human):
        domain_idna = idna_convert(domain_human)
        config['domainlist'][idx] = domain_idna  # Update domain with idna counterpart
        config['domainlist_idnamap'][domain_idna] = domain_human  # Store original domain for reference

The map could then also be used for any log messages with a simple domain if domain not in config['domainlist_idnamap'] else '{} ({})'.format(domain, config['domainlist_idnamap'][domain]).
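
For experimenting with this proposal outside acertmgr, the following self-contained sketch may help. It makes several assumptions: idna_convert here is a single-domain stand-in built on the stdlib "idna" codec, and apply_idna and log_label are hypothetical helper names:

```python
def idna_convert(domain):
    # Assumption: single-domain stand-in using the stdlib codec (IDNA 2003).
    return domain.encode("idna").decode("ascii")


def apply_idna(config):
    """Replace Unicode domains in place and remember their original spelling."""
    config["domainlist_idnamap"] = {}
    for idx, domain_human in enumerate(config["domainlist"]):
        if any(ord(c) >= 128 for c in domain_human):
            domain_idna = idna_convert(domain_human)
            config["domainlist"][idx] = domain_idna
            config["domainlist_idnamap"][domain_idna] = domain_human
    return config


def log_label(config, domain):
    """Label for log output: 'idna (human)' for mapped domains, else the domain."""
    idnamap = config["domainlist_idnamap"]
    return domain if domain not in idnamap else "{} ({})".format(domain, idnamap[domain])


config = apply_idna({"domainlist": ["example.com", "bücher.example.com"]})
```

Only translated domains end up in the map, so pure ASCII configurations carry an empty dictionary and behave exactly as before.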

This proposal, combined with the current state of the PR, would be a good simplification of the currently too complex idna_convert function.

EDIT: Had to update this before being able to sleep ;) Sorry for the confusion on the earlier iteration of this comment.

Kishi85 · 2021-05-31T07:37:47Z

> Outside configuration it is only stored once in domainlist; the only place where it is used multiple times is within configuration. A simple proposal for config processing, with a single domain list plus a map storing only the mapped domains, and including only the essential changes by @davidklaftenegger in this PR, would be to replace lines 92-95 with the following and update the parts of the challenge handler configuration that use domaintranslation accordingly:
>
> # Convert unicode to IDNA domains
> config['domainlist_idnamap'] = {}
> for idx, domain_human in enumerate(config['domainlist']):
>     if any(ord(c) >= 128 for c in domain_human):
>         domain_idna = idna_convert(domain_human)
>         config['domainlist'][idx] = domain_idna  # Update domain with idna counterpart
>         config['domainlist_idnamap'][domain_idna] = domain_human  # Store original domain for reference

For clarification:
config['domainlist'] keeps all domains necessary for processing in one place (IDNA ones replace the human-readable ones in the process), and config['domainlist_idnamap'] contains only the mapped domains with their human-readable counterparts. This can be used to fix the challenge handler setup as well as the already mentioned log message modifications (where applicable). Challenge handler config could look something like this (replacement for lines 164-166):

        # Update handler config with more specific values (use original names for translated unicode domains)
        specificcfgs = [x for x in handlerconfigs if 'domain' in x and x['domain'] == config['domainlist_idnamap'].get(domain, domain)]

This way my proposal would confine all changes in this PR to tools.py and configuration.py, without changes to other parts of the code, except for logging, which would have to be added where necessary. Just trying to keep it simple.
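
The handler lookup above can be tried standalone with the sketch below. The function name specific_handler_configs, the sample handler entries, and their "mode" values are hypothetical and only serve the illustration:

```python
def specific_handler_configs(handlerconfigs, idnamap, domain):
    """Select handler configs for a domain, matching on its original spelling."""
    # Translated domains are looked up under the name the user wrote in the
    # config; untranslated domains fall through unchanged.
    original = idnamap.get(domain, domain)
    return [x for x in handlerconfigs if 'domain' in x and x['domain'] == original]


# Hypothetical sample data
handlerconfigs = [
    {"domain": "bücher.example.com", "mode": "standalone"},
    {"domain": "example.com", "mode": "webdir"},
    {"mode": "default"},  # no 'domain' key, never matches
]
idnamap = {"xn--bcher-kva.example.com": "bücher.example.com"}

matched = specific_handler_configs(handlerconfigs, idnamap, "xn--bcher-kva.example.com")
```

The dict.get(domain, domain) fallback is what lets the same expression serve both translated and plain ASCII domains.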

Kishi85 · 2021-06-21T20:08:41Z

A little thing I've noticed regarding this:

> a) the idna module is no longer optional

cryptography-0.6 does not pull in idna, so this will only work if you bump the minimum required cryptography version to one that does. I have not had the time to figure out which version that would be. The minimum valid version for this change also has to meet our goal of supporting versions used by distributions that still have mainstream support.
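
If idna becomes mandatory, one way to surface a missing dependency early is an explicit availability check at startup. This is only a sketch under assumptions: module_available and require_idna are hypothetical names and nothing like them is part of this PR:

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def require_idna():
    """Abort with a clear message when the idna module is not installed."""
    if not module_available("idna"):
        raise RuntimeError("the 'idna' module is required but not installed")


has_json = module_available("json")
```

A check like this does not answer which cryptography release first depends on idna; it only makes the failure mode explicit on systems where the module is absent.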

EDIT: Typos and grammar.

davidklaftenegger added 6 commits May 28, 2021 03:31

idna_convert always returns full list

5658a0c

Handling for empty lists is not needed, idna_convert always returns a full list of idna-converted domain names.

remove superfluous dictionary

3286d45

The domaintranslations list already contains all the information, and the mapping was only used to re-construct the list it is constructed from.

refactor idna_convert

e14dc1c

This change makes the use of idna non-optional, and fixes what I consider a bug: When idna translation fails, an error was printed, but operation would still continue. The new version instead re-raises the exception to force the program to abort at this point.

rename settings: domainlist -> domainlist_idna

2daf8bb

remove domaintranslations from settings

39d8214

refactor idna_convert: only return converted names

18d3566

The original names are already known in order, so there is no need to return tuples here.

moepman reviewed May 29, 2021

davidklaftenegger added 2 commits May 29, 2021 21:47

remove list wrapper for idna_convert

60d0dd1

using map directly is sufficient.

use human-readable domain names in output

fc6913f

davidklaftenegger force-pushed the refactor_idna branch from 0e301c3 to fc6913f Compare May 29, 2021 19:47

Kishi85 mentioned this pull request Jun 21, 2021

configuration: Simplify too complex IDNA conversion #57

Merged
