Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor handling of idna domain names #55

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

davidklaftenegger
Copy link
Contributor

This series of commits reworks how translation to IDNA domain names is done. It differs in functionality in only two points:
a) the idna module is no longer optional
b) on error, no attempt is made to continue operation.

While a) could be patched away, I don't think it is worth maintaining separate code paths that are fragile.
As for b), I consider the previous behaviour a bug, where it would attempt to use ACME with the non-translated domain name after printing that an error occurred.

Handling for empty lists is not needed, idna_convert always returns a
full list of idna-converted domain names.
The domaintranslations list already contains all the information,
and the mapping was only used to re-construct the list it is constructed
from.
This change makes the use of idna non-optional, and fixes what I
consider a bug: When idna translation fails, an error was printed, but
operation would still continue. The new version instead re-raises the
exception to force the program to abort at this point.
The original names are already known in order, so there is no need to
return tuples here.
Copy link
Owner

@moepman moepman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would make sense to print both the human readable domain and the idna domain. That should make it easier to see, what is actually requested from the API as well as what a user would normally enter in the config and/or browser.

@moepman
Copy link
Owner

moepman commented May 29, 2021

It seems that either not all code paths have been updaed or there is still a bug. When running the code from this version with an example idna domain I receive the following error:

Generating CSR for ['bläh.example.com'']
Error: Certificate issue/renew failed
       Traceback (most recent call last):
         File "/usr/lib/python3/dist-packages/acertmgr/__init__.py", line 176, in main
           cert_get(config)
         File "/usr/lib/python3/dist-packages/acertmgr/__init__.py", line 58, in cert_get
           cr = tools.new_cert_request(settings['domainlist_idna'], key, must_staple)
         File "/usr/lib/python3/dist-packages/acertmgr/tools.py", line 108, in new_cert_request
           names[0].decode('utf-8') if getattr(names[0], 'decode', None) else
       TypeError: 'map' object is not subscriptable

@davidklaftenegger
Copy link
Contributor Author

Maybe it would make sense to print both the human readable domain and the idna domain. That should make it easier to see, what is actually requested from the API as well as what a user would normally enter in the config and/or browser.

Sounds reasonable, but probably is beyond the scope of this PR. Until such a change is implemented, would you prefer the idna or the human-readable version in the log messages?

@Kishi85
Copy link
Contributor

Kishi85 commented May 30, 2021

idna_convert is currently used as an additional input sanitizer and does only touch domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 part). Which makes it transparent in all other codepaths.

This PR also starts storing domain names twice, which is (unless it is for performance reasons, which it is obviously not in this use case) dislikable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.

But I agree that the current code might need some additional comments to really understand what is going on.

@davidklaftenegger
Copy link
Contributor Author

idna_convert is currently used as an additional input sanitizer and does only touch domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 part). Which makes it transparent in all other codepaths.

but it does not sanitize anything?

This PR also starts storing domain names twice, which is (unless it is for performance reasons, which it is obviously not in this use case) dislikable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.

I strongly disagree: the previous code stores the domain names three times (once in the list, twice in the translation returned from idna_convert). I consider this a step towards less complexity.

@moepman
Copy link
Owner

moepman commented May 30, 2021

I think'd I'd prefer the solution @davidklaftenegger proposed to me in private: only translate the domain name where needed. This would also remove the problem of stroing data twice.

With regards to printing/logging: Since this PR is not fixing any bugs I'd rather wait for you (or someone else) to implement a logic similar to this: if the domain needs translation: human_readable_domain_name (translated_domain_name) else: domain_name.

@davidklaftenegger
Copy link
Contributor Author

I think'd I'd prefer the solution @davidklaftenegger proposed to me in private: only translate the domain name where needed. This would also remove the problem of stroing data twice.

This solution appears possible to me, but will require more changes in other places of the code.

@Kishi85
Copy link
Contributor

Kishi85 commented May 30, 2021

idna_convert is currently used as an additional input sanitizer and does only touch domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 part). Which makes it transparent in all other codepaths.

but it does not sanitize anything?

Sanitizing is probably the wrong word, but it converts unusable unicode (if present) to the proper format for the certificate request

This PR also starts storing domain names twice, which is (unless it is for performance reasons, which it is obviously not in this use case) dislikable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.

I strongly disagree: the previous code stores the domain names three times (once in the list, twice in the translation returned from idna_convert). I consider this a step towards less complexity.

Outside configuration it is only stored once in domainlist, the only place where it is used multiple times is within configuration. A simple proposal for config processing with a single domain list + a map storing the mapped domains only inlcuding the essential changes by @davidklaftenegger in this PR would be to replace lines 92-95 with the follwing and update the parts in configurion of challange handlers using domaintranslation accordingly:

# Convert unicode to IDNA domains
config['domainlist_idnamap'] = {}
    for idx in range(0,len(config['domainlist'])):
        if any(ord(c) >= 128 for c in config['domainlist'][idx]):
            domain_human = config['domainlist'][idx]
            domain_idna = idna_convert(domain_human)
            config['domainlist'][idx] = domain_idna  # Update domain with idna counterpart
            config['domainlist_idnamap'][domain_idna] = domain_human  # Store original domain for reference

The map could be then also used for any log messages with a simple domain if domain not in config['domainlist_idnamap'] else '{} ({}).format(domain, config['domainlist_idnamap'][domain]).

This proposal combined with the current state of the PR would be a good simplification of the currently too complex inda_convert funtion.

EDIT: Had to update this before being able to sleep ;) Sorry for the confusion on the earlier iteration of this comment.

@Kishi85
Copy link
Contributor

Kishi85 commented May 31, 2021

Outside configuration it is only stored once in domainlist, the only place where it is used multiple times is within configuration. A simple proposal for config processing with a single domain list + a map storing the mapped domains only inlcuding the essential changes by @davidklaftenegger in this PR would be to replace lines 92-95 with the follwing and update the parts in configurion of challange handlers using domaintranslation accordingly:

# Convert unicode to IDNA domains
config['domainlist_idnamap'] = {}
    for idx in range(0,len(config['domainlist'])):
        if any(ord(c) >= 128 for c in config['domainlist'][idx]):
            domain_human = config['domainlist'][idx]
            domain_idna = idna_convert(domain_human)
            config['domainlist'][idx] = domain_idna  # Update domain with idna counterpart
            config['domainlist_idnamap'][domain_idna] = domain_human  # Store original domain for reference

For clarification:
config['domainlist'] keeps all domains necessary for processing (IDNA ones will replace the human readble ones in the process) in one place and config['domainlist_idnamap'] contains only the mapped domains with their human readable counterpart. This can be used to fix the challange handler setup as well as the already mentioned log message modifications (where applicable). Challenge handler config could look something like this (Replacement for lines 164-166):

        # Update handler config with more specific values (use original names for translated unicode domains)
        specificcfgs = [x for x in handlerconfigs if 'domain' in x and x['domain'] == config['domainlist_idnamap'].get(domain, domain)]

This way my propsal would confine all changes in this PR to tools.py and configuration.py without changes to other parts of the code. Except for logging things, which would have to be added where necessary. Just trying to keep it simple.

@Kishi85
Copy link
Contributor

Kishi85 commented Jun 21, 2021

A little thing i've noticed in regards to this:

a) the idna module is no longer optional

cryptography-0.6 does not pull in idna, so this will only work if you bump the minimum required cryptography version up to a version that does. I've not had the time to figure out which version this would be. The minium valid version for this change also has to meet our goal of supporting versions used distributions that still have mainstream support.

EDIT: Typos and grammar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants