3.2.0 not working after update from 3.1.10 #9593

Open
USBAkimbo opened this issue Nov 14, 2024 · 5 comments

Comments

@USBAkimbo

Bug Report

Describe the bug

  • I have VMs running fluent-bit and they were on version 3.1.10
  • Yesterday at 17:10 UTC, I ran my Ansible playbook, which includes updating fluent-bit using the apt module
  • This upgraded the agents to 3.2.0, and I then noticed the following in my logs:
Nov 14 09:16:27 h-we-vm-01 fluent-bit[311004]: [2024/11/14 09:16:27] [error] [engine] chunk '311004-1731575775.597403200.flb' cannot be retried: task_id=8, input=tail.0 > output=http.0
Nov 14 09:16:28 h-we-vm-01 fluent-bit[311004]: [2024/11/14 09:16:28] [error] [tls] error: unexpected EOF
Nov 14 09:16:28 h-we-vm-01 fluent-bit[311004]: [2024/11/14 09:16:28] [error] [output:http:http.0] no upstream connections available to my.seq.server:443
Nov 14 09:16:28 h-we-vm-01 fluent-bit[311004]: [2024/11/14 09:16:28] [error] [tls] error: unexpected EOF
Nov 14 09:16:28 h-we-vm-01 fluent-bit[311004]: [2024/11/14 09:16:28] [error] [output:http:http.0] no upstream connections available to my.seq.server:443
Nov 14 09:16:28 h-we-vm-01 fluent-bit[311004]: [2024/11/14 09:16:28] [error] [tls] error: unexpected EOF
Nov 14 09:16:28 h-we-vm-01 fluent-bit[311004]: [2024/11/14 09:16:28] [error] [output:http:http.0] no upstream connections available to my.seq.server:443
Nov 14 09:16:28 h-we-vm-01 fluent-bit[311004]: [2024/11/14 09:16:28] [error] [engine] chunk '311004-1731575776.597385227.flb' cannot be retried: task_id=2, input=tail.0 > output=http.0
Nov 14 09:16:28 h-we-vm-01 fluent-bit[311004]: [2024/11/14 09:16:28] [error] [engine] chunk '311004-1731575778.597509586.flb' cannot be retried: task_id=10, input=tail.0 > output=http.0
Nov 14 09:16:28 h-we-vm-01 fluent-bit[311004]: [2024/11/14 09:16:28] [ warn] [engine] failed to flush chunk '311004-1731575787.969547933.flb', retry in 10 seconds: task_id=7, input=tail.0 > output=http.0 (out>
  • This suggests something is wrong with my server or cert, yet everything was working fine before the upgrade
  • Uninstalling and downgrading to 3.1.10 fixes this issue

To Reproduce

  • Use this fluent-bit.conf with Seq (it is a Jinja template, so adjust the server URL and API key accordingly):
[INPUT]
    Name                    tail
    Parser                  simple
    Path                    /var/log/*.log, /var/log/*/*.log
    Path_Key                file_path

[FILTER]
    Name                    modify
    Match                   *
    Rename                  log @m
    Add                     hostname ${HOSTNAME}

[OUTPUT]
    Name                    http
    Match                   *
    Host                    {{ SEQ_SERVER_URL }}
    Port                    443
    TLS                     On
    URI                     ingest/clef
    Header                  X-Seq-ApiKey {{ SEQ_API_KEY }}
    Format                  json_lines
    Json_date_key           @t
    Json_date_format        iso8601
    Log_response_payload    False
  • And use this parsers.conf:
[PARSER]
    Name            simple
    Format          regex
    Regex           ^(?<time>[^ ]+) (?<message>.+)$
    Time_Key        time
    Time_Format     %Y-%m-%dT%H:%M:%S.%L%z
  • Install the agent version 3.1.10 and use the above config on an Ubuntu 22.04 system
  • Logs should flow
  • Now update to 3.2.0
  • Logs will stop flowing

Expected behavior

  • Logs being sent to my Seq log server

Screenshots

  • N/A

Your Environment

  • Version used: 3.1.10 and 3.2.0
  • Configuration: See above
  • Environment name and version (e.g. Kubernetes? What version?): Ubuntu 22.04 running on Azure VMs
  • Server type and version: D4as Azure VM
  • Operating System and version: Ubuntu 22.04
  • Filters and plugins: Only those in the config above, which is the only config in use

Additional context

  • I don't see a 3.2.0 release on GitHub, but there is a manifest at https://packages.fluentbit.io/3.2.0/
  • I got very unlucky here: it looks like 3.2.0 was released at 17:00, just 10 minutes before my Ansible run
@patrick-stephens
Contributor

To help debugging can you clarify if this is a self-signed cert, where/how the specific cert is installed for this server and anything else that may be relevant around the actual TLS/SSL config?

@USBAkimbo
Author

USBAkimbo commented Nov 14, 2024

The cert is a Let's Encrypt cert that's valid until 2025-02

Cert is installed on the load balancer that fluent-bit is hitting

If I run curl against the server, I get a "cert is valid" response

The cert also includes the full chain. TL;DR: I'm using the lego ACME client to get my cert via a DNS challenge

That outputs a full-chain PFX, which is the cert I use. No other system has a problem with it, and neither does fluent-bit on a 3.1.x build

@patrick-stephens
Contributor

Can you also compare the SSL library dependencies between the two versions? I'm wondering if it's related to that too. I could probably check myself, but my setup may not match yours exactly, and it's quicker if you do :)

@patrick-stephens
Contributor

You could also try tls.debug 4 to see if it gives any more detail on why it fails on one version and passes on the other: https://docs.fluentbit.io/manual/administration/transport-security
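For reference, here is a sketch of the reporter's [OUTPUT] section with that setting added. Only the tls.debug line differs from the config in the report; the Jinja placeholders are unchanged. The linked transport-security page documents tls.debug levels 0 (no debug) through 4 (verbose), so this assumes 4 is the right level for capturing handshake detail behind the "unexpected EOF" error.

```
[OUTPUT]
    Name                    http
    Match                   *
    Host                    {{ SEQ_SERVER_URL }}
    Port                    443
    TLS                     On
    # Added for debugging: most verbose TLS logging level
    tls.debug               4
    URI                     ingest/clef
    Header                  X-Seq-ApiKey {{ SEQ_API_KEY }}
    Format                  json_lines
    Json_date_key           @t
    Json_date_format        iso8601
    Log_response_payload    False
```

Running the same config on both 3.1.10 and 3.2.0 and comparing the TLS output is likely the quickest way to see where the handshakes diverge.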

@pierrebeaucamp

FYI, we're seeing the same behaviour when outputting to Datadog (i.e. the same [tls] error: unexpected EOF). Downgrading to 3.1.10 fixes it.
