[FR][Reader] Option to NOT decode text attachments charset #182

Open
Vovan-VE opened this issue Jul 8, 2024 · 0 comments

Vovan-VE commented Jul 8, 2024

Hello!

When text attachments are declared with a non-UTF-8 charset, this code fragment converts every one of them:

go-message/entity.go

Lines 48 to 57 in f7e55c4

    // RFC 2046 section 4.1.2: charset only applies to text/*
    if strings.HasPrefix(mediaType, "text/") {
        if ch, ok := mediaParams["charset"]; ok {
            if converted, charsetErr := charsetReader(ch, body); charsetErr != nil {
                err = UnknownCharsetError{charsetErr}
            } else {
                body = converted
            }
        }
    }

This is fine for inline parts, but it's not good for attachment parts, because:

  • When a text attachment's charset is decoded successfully, the result is confusing: the body is now in UTF-8, but the charset param still holds the original charset name.
  • When the sender generates a broken or malformed email, the data gets corrupted (lots of \uFFFD). The only workaround is to replace the entire input io.Reader before passing it to your Reader.
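To make the second point tangible, here is a minimal stdlib-only sketch. `decodeStrictASCII` is a hypothetical stand-in for a strict US-ASCII decoder, not the decoder go-message actually uses; it only shows why bytes that are really windows-1251 cannot survive once they have been decoded under an ASCII label.

```go
package main

import (
	"fmt"
	"strings"
)

// decodeStrictASCII mimics a strict US-ASCII decoder: bytes below 0x80 pass
// through, every other byte becomes U+FFFD (the replacement character).
// Hypothetical helper for illustration only.
func decodeStrictASCII(raw []byte) string {
	var sb strings.Builder
	for _, b := range raw {
		if b < 0x80 {
			sb.WriteByte(b)
		} else {
			sb.WriteRune('\uFFFD')
		}
	}
	return sb.String()
}

func main() {
	// "Привет" encoded as windows-1251, followed by an ASCII suffix.
	raw := []byte{0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2, ' ', 'o', 'k'}
	decoded := decodeStrictASCII(raw)
	fmt.Printf("%q\n", decoded)
	// The six Cyrillic bytes all collapse to the same rune, so the original
	// bytes cannot be recovered from the decoded text.
	fmt.Println("replacement runes:", strings.Count(decoded, "\uFFFD"))
}
```

Once the conversion has happened, the caller only sees the replacement characters, which is why the raw bytes have to be captured before the Reader touches them.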

Concrete example

A broken email generated by 4th-party software, carrying an attachment in a 3rd-party proprietary format (text/plain; charset=windows-1251, which has been blamed and damned countless times over the decades), where the wrong charset ANSI_X3.4-1968 is declared instead of the actual windows-1251:

Content-Type: multipart/mixed; boundary=qqq

--qqq
Content-Type: text/plain; charset=UTF-8

Bla-bla-bla
--qqq
Content-Type: text/plain; charset=ANSI_X3.4-1968; name=123.txt
Content-Disposition: attachment; filename=123.txt

Actually content is in windows-1251 and will be corrupted by mail Reader.
--qqq--
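For comparison, a sketch of how a caller can tell the attachment part from the inline one and keep its bytes unconverted, using only the Go standard library (`mime/multipart` does no charset conversion). `partInfo`, `scanParts` and `sampleBody` are names made up for this sketch, and the attachment body is replaced with a few real windows-1251 bytes.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"strings"
)

// sampleBody is the multipart body from the example above, with a short
// windows-1251 payload ("Привет") in the attachment part.
const sampleBody = "--qqq\r\n" +
	"Content-Type: text/plain; charset=UTF-8\r\n\r\n" +
	"Bla-bla-bla\r\n" +
	"--qqq\r\n" +
	"Content-Type: text/plain; charset=ANSI_X3.4-1968; name=123.txt\r\n" +
	"Content-Disposition: attachment; filename=123.txt\r\n\r\n" +
	"\xCF\xF0\xE8\xE2\xE5\xF2\r\n" +
	"--qqq--\r\n"

// partInfo is a hypothetical summary type used only in this sketch.
type partInfo struct {
	Disposition string // "attachment", "inline", or "" when the header is absent
	Charset     string // declared charset, exactly as the sender wrote it
	Raw         []byte // body bytes, not converted
}

// scanParts walks a multipart body and records each part's disposition,
// declared charset and raw bytes.
func scanParts(body, boundary string) ([]partInfo, error) {
	mr := multipart.NewReader(strings.NewReader(body), boundary)
	var out []partInfo
	for {
		p, err := mr.NextPart()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		var info partInfo
		if disp, _, perr := mime.ParseMediaType(p.Header.Get("Content-Disposition")); perr == nil {
			info.Disposition = disp
		}
		if _, params, perr := mime.ParseMediaType(p.Header.Get("Content-Type")); perr == nil {
			info.Charset = params["charset"]
		}
		raw, err := io.ReadAll(p)
		if err != nil {
			return nil, err
		}
		info.Raw = raw
		out = append(out, info)
	}
}

func main() {
	parts, err := scanParts(sampleBody, "qqq")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for i, p := range parts {
		fmt.Printf("part %d: disposition=%q charset=%q raw=% x\n", i, p.Disposition, p.Charset, p.Raw)
	}
}
```

With the raw bytes in hand, the application can decide for itself whether to trust the declared charset or apply the one it knows to be correct.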

Feature Request

Please introduce an option in any form (for example, an exported package variable) to disable charset conversion for text parts that are attachments. The option's default value should keep the current behavior for backward compatibility.
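One possible shape for such an option, as a self-contained sketch rather than a patch. Everything here is hypothetical: the variable `SkipAttachmentCharset`, the `decodeBody` function and the `stubConvert` converter do not exist in go-message. The sketch mirrors the quoted fragment with one extra check on Content-Disposition, and it returns the conversion error directly instead of wrapping it in UnknownCharsetError.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"strings"
)

// SkipAttachmentCharset is a hypothetical package-level switch. Its zero
// value keeps the current behavior: every text/* part is converted.
var SkipAttachmentCharset = false

// convertFunc has the same shape as the charsetReader call in the fragment.
type convertFunc func(charset string, r io.Reader) (io.Reader, error)

// decodeBody applies charset conversion to text/* bodies, unless the switch
// is on and the part is an attachment.
func decodeBody(contentType, contentDisposition string, body io.Reader, convert convertFunc) (io.Reader, error) {
	mediaType, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return body, err
	}
	if SkipAttachmentCharset {
		if disp, _, derr := mime.ParseMediaType(contentDisposition); derr == nil && disp == "attachment" {
			return body, nil // leave attachment bytes untouched
		}
	}
	// RFC 2046 section 4.1.2: charset only applies to text/*
	if strings.HasPrefix(mediaType, "text/") {
		if ch, ok := params["charset"]; ok {
			return convert(ch, body)
		}
	}
	return body, nil
}

// stubConvert stands in for a real converter: it discards the input and
// returns a marker so the effect of the switch is visible.
func stubConvert(charset string, r io.Reader) (io.Reader, error) {
	return strings.NewReader("<converted from " + charset + ">"), nil
}

// run decodes a fixed body under the given switch value and disposition.
func run(skip bool, disposition string) string {
	SkipAttachmentCharset = skip
	r, err := decodeBody("text/plain; charset=windows-1251", disposition, strings.NewReader("raw"), stubConvert)
	if err != nil {
		return "error: " + err.Error()
	}
	b, _ := io.ReadAll(r)
	return string(b)
}

func main() {
	fmt.Println(run(false, "attachment; filename=123.txt")) // default: converted
	fmt.Println(run(true, "attachment; filename=123.txt"))  // switch on: raw bytes
	fmt.Println(run(true, "inline"))                        // inline parts still converted
}
```

A package variable is the smallest change, but a per-Reader option would avoid global state; either would satisfy the request as long as the default stays as it is today.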
