Avoid unnecessary decompression from buffer #72

khuongduybui · 2023-11-28T16:15:44Z

Problem

I use the built-in buffer compression (compress gzip, no plugin) together with compress_request true in this HTTP output plugin.
Fluentd gunzips each buffer chunk it reads from disk, and this plugin then recompresses the same data before sending it.

Steps to replicate

# Upload configuration for Syslog events
<match syslog.events>
  @type http
  endpoint_url "https://<redacted>"
  http_method post

  # send compressed events
  compress_request true
  serializer json
  buffered true
  bulk_request true
  # specify recoverable/repeatable status codes
  recoverable_status_codes 404, 500, 502, 503, 504

  # flush every 5 minutes or every 10 MB
  <buffer tag,time>
    @type file
    path /shared/logs_5/buffer/syslog
    timekey 5m
    timekey_wait 0m
    timekey_use_utc true
    chunk_limit_size 10MB
    compress gzip
    total_limit_size 50GB
    overflow_action drop_oldest_chunk
    retry_timeout 7d
    retry_max_interval 3600
  </buffer>
  <format>
    @type json
    add_newline true
  </format>
</match>

Expected Behavior or What you need to ask

According to the Fluentd documentation (https://docs.fluentd.org/configuration/buffer-section#:~:text=Fluentd%20will%20decompress,plugin%20as%20is):

Fluentd will decompress these compressed chunks automatically before passing them to the output plugin (The exceptional case is when the output plugin can transfer data in compressed form. In this case, the data will be passed to the plugin as is).

Can we somehow let Fluentd know that this output plugin can transfer data in compressed form, so that it skips the decompression and recompression?
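
To illustrate what I mean by "transfer data in compressed form", here is a minimal sketch using only Ruby's standard library. It is not this plugin's code. The helper name build_passthrough_request, the example URL, and the Content-Type value are my own assumptions. My understanding is that Fluentd's chunk API can hand back the raw bytes when called as chunk.read(compressed: :gzip), but that should be verified against the Fluentd source before relying on it.

```ruby
require "zlib"
require "net/http"
require "uri"

# Hypothetical helper (not part of fluent-plugin-out-http): build a POST
# request whose body is an already-gzipped payload, sent without any
# decompress/recompress round trip.
def build_passthrough_request(uri, gzipped_payload)
  req = Net::HTTP::Post.new(uri)
  # Assumption: the payload is newline-delimited JSON, as produced by the
  # json formatter with add_newline true.
  req["Content-Type"] = "application/x-ndjson"
  # Tell the receiver that the body is gzip-compressed.
  req["Content-Encoding"] = "gzip"
  req.body = gzipped_payload
  req
end

# Stand-in for a compressed buffer chunk read from disk.
records = %({"message":"a"}\n{"message":"b"}\n)
gzipped_chunk = Zlib.gzip(records)

# The request is only built here, not sent; the URL is a placeholder.
request = build_passthrough_request(URI("https://example.invalid/ingest"), gzipped_chunk)
```

The point is that the bytes stored in the buffer are already in the form the endpoint accepts, so the only thing the output side has to add is the Content-Encoding header.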

The main reason we looked into this is that Fluentd sometimes fails to decompress the gzip'ed buffer chunks and then chokes on them, applying the same up-to-1-week retry logic that we put in place for cases like network loss. We would rather Fluentd pass the bad chunks to this plugin, which would send them as-is to my endpoint in the cloud, where we have all the processing power needed to attempt to recover them or discard them without clogging the pipeline.
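
For context, the kind of recovery we could attempt on the receiving side looks roughly like the sketch below. It is only an illustration under my own assumptions: salvage_gzip is a hypothetical helper name, and the sample data is made up. It reads a gzip payload piece by piece with Ruby's standard Zlib::GzipReader and keeps whatever was decoded before an error, instead of failing the whole chunk.

```ruby
require "zlib"
require "stringio"

# Hypothetical receiver-side helper: decompress as much of a gzip payload
# as possible. Returns [recovered_bytes, status], where status is :ok for
# a clean stream and :corrupt if decoding failed part-way through.
def salvage_gzip(payload)
  recovered = String.new # binary buffer for the decoded bytes
  begin
    reader = Zlib::GzipReader.new(StringIO.new(payload))
    # readpartial raises EOFError once the stream ends cleanly.
    loop { recovered << reader.readpartial(16 * 1024) }
  rescue EOFError
    [recovered, :ok]
  rescue Zlib::Error
    # Bad header, truncated data, or checksum mismatch: keep what we have.
    [recovered, :corrupt]
  end
end

# Made-up sample data: a healthy chunk and a truncated copy of it.
original = Array.new(1000) { |i| %({"message":"event #{i}"}\n) }.join
good_chunk = Zlib.gzip(original)
bad_chunk = good_chunk.byteslice(0, good_chunk.bytesize / 2)

good_data, good_status = salvage_gzip(good_chunk)
bad_data, bad_status = salvage_gzip(bad_chunk)
```

With a truncated chunk, the caller can decide whether to keep the recovered prefix or discard the chunk entirely, and either way the sender's pipeline is not blocked by it.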

Using Fluentd and out_http plugin versions

OS version: Debian 11
Bare Metal or Within Docker or Kubernetes or other: official Docker image
Fluentd version: 1.16.1
out_http plugin version: 1.3.4

abbrev (default: 0.1.0)
async (1.31.0)
async-http (0.60.1)
async-io (1.34.3)
async-pool (0.4.0)
base64 (default: 0.1.1)
benchmark (default: 0.2.0)
bigdecimal (default: 3.1.1)
bson (4.15.0)
bundler (default: 2.3.26)
cgi (default: 0.3.6)
concurrent-ruby (1.2.2)
console (1.16.2)
cool.io (1.7.1)
csv (default: 3.2.5)
date (default: 3.2.2)
debug (1.6.3)
delegate (default: 0.2.0)
did_you_mean (default: 1.6.1)
digest (default: 3.1.0)
drb (default: 2.1.0)
english (default: 0.7.1)
erb (default: 2.2.3)
error_highlight (default: 0.3.0)
etc (default: 1.3.0)
fcntl (default: 1.0.1)
fiber-local (1.0.0)
fiddle (default: 1.1.0)
fileutils (default: 1.6.0)
find (default: 0.1.1)
fluent-config-regexp-type (1.0.0)
fluent-plugin-mongo (1.6.0)
fluent-plugin-multi-format-parser (1.0.0)
fluent-plugin-out-http (1.3.4)
fluent-plugin-prometheus (2.1.0)
fluent-plugin-rewrite-tag-filter (2.4.0)
fluentd (1.16.1)
forwardable (default: 1.3.2)
getoptlong (default: 0.1.1)
http_parser.rb (0.8.0)
io-console (default: 0.5.11)
io-nonblock (default: 0.1.0)
io-wait (default: 0.2.1)
ipaddr (default: 1.2.4)
irb (default: 1.4.1)
json (2.6.3, default: 2.6.1)
logger (default: 1.5.0)
matrix (0.4.2)
minitest (5.15.0)
mongo (2.18.3)
msgpack (1.7.0)
mutex_m (default: 0.1.1)
net-ftp (0.1.3)
net-http (default: 0.3.0)
net-imap (0.2.3)
net-pop (0.1.1)
net-protocol (default: 0.1.2)
net-smtp (0.3.1)
nio4r (2.5.9)
nkf (default: 0.1.1)
observer (default: 0.1.1)
oj (3.14.3)
open-uri (default: 0.2.0)
open3 (default: 0.1.1)
openssl (default: 3.0.1)
optparse (default: 0.2.0)
ostruct (default: 0.5.2)
pathname (default: 0.2.0)
power_assert (2.0.1)
pp (default: 0.3.0)
prettyprint (default: 0.1.1)
prime (0.1.2)
prometheus-client (4.2.2)
protocol-hpack (1.4.2)
protocol-http (0.24.1)
protocol-http1 (0.15.0)
protocol-http2 (0.15.1)
pstore (default: 0.1.1)
psych (default: 4.0.4)
racc (default: 1.6.0)
rake (13.0.6)
rbs (2.7.0)
rdoc (default: 6.4.0)
readline (default: 0.0.3)
readline-ext (default: 0.1.4)
reline (default: 0.3.1)
resolv (default: 0.2.1)
resolv-replace (default: 0.1.0)
rexml (3.2.5)
rinda (default: 0.1.1)
rss (0.2.9)
ruby2_keywords (default: 0.0.5)
securerandom (default: 0.2.0)
serverengine (2.3.2)
set (default: 1.0.2)
shellwords (default: 0.1.0)
sigdump (0.2.4)
singleton (default: 0.1.1)
stringio (default: 3.0.1)
strptime (0.2.5)
strscan (default: 3.0.1)
syslog (default: 0.1.0)
tempfile (default: 0.1.2)
test-unit (3.5.3)
time (default: 0.2.2)
timeout (default: 0.2.0)
timers (4.3.5)
tmpdir (default: 0.1.2)
traces (0.9.1)
tsort (default: 0.1.0)
typeprof (0.21.3)
tzinfo (2.0.6)
tzinfo-data (1.2023.3)
un (default: 0.2.0)
uri (default: 0.12.1)
weakref (default: 0.1.1)
webrick (1.8.1)
yajl-ruby (1.4.3)
yaml (default: 0.2.0)
zlib (default: 2.1.1)
