Error in handling charsets different from UTF-8 #132

Open
andsel opened this issue Nov 30, 2020 · 2 comments
Comments

@andsel
Contributor

andsel commented Nov 30, 2020

  • Version: 3.3.5
  • Operating System:
  • Config File (if you have sensitive info, please remove it):
input {
	http {
		port => 9006
		codec => plain {
			charset => "CP1254"
		}
	}
}	

output {
	stdout {
		codec => json {charset => "UTF-8"}
	}
}
  • Sample Data:
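    Python script used as a client to send the encoded data:
import requests

# Logstash http input from the pipeline above
API_ENDPOINT = "http://127.0.0.1:9006"
message = 'TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç'
# Send the body as raw CP1254 bytes (one byte per character)
r = requests.post(url=API_ENDPOINT, data=message.encode('cp1254'))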
  • Steps to Reproduce:
    • run Logstash with the pipeline above
    • execute the Python script
    • the console output is:
{"message":"T�RK�E karakter test : ������������","@version":"1","@timestamp":"2020-11-30T10:38:55.338Z","headers":{"connection":"keep-alive","request_method":"POST","http_accept":"*/*","http_user_agent":"python-requests/2.21.0","content_length":"35","http_version":"HTTP/1.1","http_host":"127.0.0.1:9006","request_path":"/","accept_encoding":"gzip, deflate"},"host":"127.0.0.1"}

This does not seem to be a problem in the codec, because I tried the following pipeline (same codec, different input):

input {
	file {
		path => "/tmp/cp1254_encoded.txt"
		mode => "read"
		sincedb_path => "/dev/null"
		file_completed_log_path => "/tmp/file_actions.log"
		file_completed_action => "log"
		codec => plain {
			charset => "CP1254"
		}
	}
}	

output {
	stdout {
		codec => json {charset => "UTF-8"}
	}
}

with the file attached as input data
cp1254_encoded.txt

and the console output is as expected (TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç)
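
One possible reading of the two results, offered as an assumption rather than a confirmed diagnosis: the garbled output from the http input is what you would get if the CP1254 request body were decoded as UTF-8. The Python sketch below reproduces both outputs from the same bytes, without involving Logstash at all.

```python
message = 'TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç'
body = message.encode('cp1254')  # same bytes the client script posts

# Decoding with the declared charset restores the original text.
decoded_ok = body.decode('cp1254')

# Assumption: the body is treated as UTF-8 somewhere along the way.
# Bytes that are not valid UTF-8 become U+FFFD, the replacement character.
decoded_bad = body.decode('utf-8', errors='replace')

print(len(body))
print(decoded_ok)
print(decoded_bad)
```

The body is 35 bytes long, which matches the content_length header in the http output, so the bytes themselves seem to arrive intact and only the decoding step differs.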

NB:
to reproduce the text file, copy and paste the string above into a text editor and save it with CP1254 encoding
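
As an alternative to a text editor, the file can be generated with a few lines of Python. This is a minimal sketch: the helper name write_cp1254 is hypothetical, and the trailing newline is an assumption (the attached file may not end with one).

```python
from pathlib import Path

MESSAGE = 'TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç'

def write_cp1254(path, text=MESSAGE):
    """Write text to path as CP1254 bytes and return the byte count."""
    # Trailing newline is an assumption: the attached file may not have one.
    return Path(path).write_bytes((text + '\n').encode('cp1254'))

if __name__ == '__main__':
    write_cp1254('/tmp/cp1254_encoded.txt')  # path read by the file input
```

Writing bytes explicitly, rather than relying on the platform default encoding, keeps the file content independent of the locale of the machine that creates it.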

@GokcerBelgusen

Hi guys, any progress on this issue?

@andsel
Contributor Author

andsel commented May 3, 2021

Hi @GokcerBelgusen, no news on this yet, but I'll keep it on my radar
