
(kubernetes_logs source): vector 0.44 seems that it fails to watch some pods or log change #22235

Open
titaneric opened this issue Jan 17, 2025 · 3 comments
Labels
meta: regression This issue represents a regression source: kubernetes_logs Anything `kubernetes_logs` source related type: bug A code related bug.

Comments

@titaneric
Contributor

titaneric commented Jan 17, 2025

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Hi, I updated Vector to 0.44 in our cluster, and I noticed that it fails to watch some pods' log file changes.

Original log event

{"file":"/var/log/pods/loki_loki-chunks-cache-1_efec6f73-8c2c-46c8-bb07-4c03cd3561a9/memcached/0.log","kubernetes":{"container_id":"containerd://65be5c8aa5e706248b0150adb1a67e71f471f474215b2369f184fb55a7f81a6e","container_image":"memcached:1.6.32-alpine","container_image_id":"docker.io/library/memcached@sha256:0a27d9d5084fc5781dc614a0223010c65bbab748038603ea04031fbcdcd97185","container_name":"memcached","namespace_labels":{"kubernetes.io/metadata.name":"loki"},"node_labels":{},"pod_ip":"172.16.165.161","pod_ips":["172.16.165.161"],"pod_labels":{"app.kubernetes.io/component":"memcached-chunks-cache","app.kubernetes.io/instance":"loki","app.kubernetes.io/name":"loki","controller-revision-hash":"loki-chunks-cache-6d95f8d4cb","name":"memcached-chunks-cache","statefulset.kubernetes.io/pod-name":"loki-chunks-cache-1"},"pod_name":"loki-chunks-cache-1","pod_namespace":"loki","pod_node_name":"my_node_name","pod_owner":"StatefulSet/loki-chunks-cache","pod_uid":"efec6f73-8c2c-46c8-bb07-4c03cd3561a9"},"message":"<138 gets tenant/9e0726d8fffb773c/194750fde24:19475108ec6:eea0345","source_type":"kubernetes_logs","stream":"stderr","timestamp":"2025-01-17T16:20:31.606667560Z"}
{"file":"/var/log/pods/loki_loki-chunks-cache-1_efec6f73-8c2c-46c8-bb07-4c03cd3561a9/memcached/0.log","kubernetes":{"container_id":"containerd://65be5c8aa5e706248b0150adb1a67e71f471f474215b2369f184fb55a7f81a6e","container_image":"memcached:1.6.32-alpine","container_image_id":"docker.io/library/memcached@sha256:0a27d9d5084fc5781dc614a0223010c65bbab748038603ea04031fbcdcd97185","container_name":"memcached","namespace_labels":{},"node_labels":{},"pod_ip":"172.16.165.161","pod_ips":["172.16.165.161"],"pod_labels":{"app.kubernetes.io/component":"memcached-chunks-cache","app.kubernetes.io/instance":"loki","app.kubernetes.io/name":"loki","controller-revision-hash":"loki-chunks-cache-6d95f8d4cb","name":"memcached-chunks-cache","statefulset.kubernetes.io/pod-name":"loki-chunks-cache-1"},"pod_name":"loki-chunks-cache-1","pod_namespace":"loki","pod_node_name":"my_node_name","pod_owner":"StatefulSet/loki-chunks-cache","pod_uid":"efec6f73-8c2c-46c8-bb07-4c03cd3561a9"},"message":">138 sending key tenant/9e0726d8fffb773c/194750fde24:19475108ec6:eea0345","source_type":"kubernetes_logs","stream":"stderr","timestamp":"2025-01-17T16:20:31.606726881Z"}
{"file":"/var/log/pods/loki_loki-chunks-cache-1_efec6f73-8c2c-46c8-bb07-4c03cd3561a9/memcached/0.log","kubernetes":{"container_id":"containerd://65be5c8aa5e706248b0150adb1a67e71f471f474215b2369f184fb55a7f81a6e","container_image":"memcached:1.6.32-alpine","container_image_id":"docker.io/library/memcached@sha256:0a27d9d5084fc5781dc614a0223010c65bbab748038603ea04031fbcdcd97185","container_name":"memcached","namespace_labels":{"kubernetes.io/metadata.name":"loki"},"node_labels":{},"pod_ip":"172.16.165.161","pod_ips":["172.16.165.161"],"pod_labels":{"app.kubernetes.io/component":"memcached-chunks-cache","app.kubernetes.io/instance":"loki","app.kubernetes.io/name":"loki","controller-revision-hash":"loki-chunks-cache-6d95f8d4cb","name":"memcached-chunks-cache","statefulset.kubernetes.io/pod-name":"loki-chunks-cache-1"},"pod_name":"loki-chunks-cache-1","pod_namespace":"loki","pod_node_name":"my_node_name","pod_owner":"StatefulSet/loki-chunks-cache","pod_uid":"efec6f73-8c2c-46c8-bb07-4c03cd3561a9"},"message":">138 END","source_type":"kubernetes_logs","stream":"stderr","timestamp":"2025-01-17T16:20:31.606729580Z"}

Failed log event

{"file":"/var/log/pods/logsender-fluentd-dqqgt_b1e0f28e-a1bf-4807-92d3-0149ddd1372b/fluentd/0.log","kubernetes":{"container_id":"containerd://d84ea604c2cc92def411adac4b796c4ab54c4ae9be0c6d31edd5a83b9199aebc","container_image":"fluentd-kubernetes-daemonset-line:0.0.8","container_image_id":"fluentd-kubernetes-daemonset-line@sha256:ecd57ee76e99b87301549c833ec5e11e86b33976b96888013ca41ee383bde601","container_name":"fluentd","namespace_labels":{},"node_labels":{},"pod_annotations":{"restart-hash":"731fd5e2f6366d23a3e0cf2429b4f95b"},"pod_ip":"172.16.165.54","pod_ips":["172.16.165.54"],"pod_labels":{"app":"logsender-fluentd","controller-revision-hash":"5944f684bb","pod-template-generation":"8"},"pod_name":"logsender-fluentd-dqqgt","pod_namespace":"my_namespace","pod_node_name":"my_node_name","pod_owner":"DaemonSet/logsender-fluentd","pod_uid":"b1e0f28e-a1bf-4807-92d3-0149ddd1372b"},"message":"2025-01-17 16:27:05 +0000 [warn]: #0 no patterns matched tag=\"k8sns.loki\"","source_type":"kubernetes_logs","stream":"stdout","timestamp":"2025-01-17T16:27:05.615941151Z"}
{"file":"/var/log/pods/logsender-fluentd-dqqgt_b1e0f28e-a1bf-4807-92d3-0149ddd1372b/fluentd/0.log","kubernetes":{"container_id":"containerd://d84ea604c2cc92def411adac4b796c4ab54c4ae9be0c6d31edd5a83b9199aebc","container_image":"fluentd-kubernetes-daemonset-line:0.0.8","container_image_id":"fluentd-kubernetes-daemonset-line@sha256:ecd57ee76e99b87301549c833ec5e11e86b33976b96888013ca41ee383bde601","container_name":"fluentd","namespace_labels":{},"node_labels":{},"pod_annotations":{"restart-hash":"731fd5e2f6366d23a3e0cf2429b4f95b"},"pod_ip":"172.16.165.54","pod_ips":["172.16.165.54"],"pod_labels":{"app":"logsender-fluentd","controller-revision-hash":"5944f684bb","pod-template-generation":"8"},"pod_name":"logsender-fluentd-dqqgt","pod_namespace":"my_namespace","pod_node_name":"my_node_name","pod_owner":"DaemonSet/logsender-fluentd","pod_uid":"b1e0f28e-a1bf-4807-92d3-0149ddd1372b"},"message":"2025-01-17 16:27:11 +0000 [info]: #0 Timeout flush: k8s.var.log.containers.logsender-fluentd-for-admin-cv985_my_namespace_fluentd-45e373168d574e11b66206103f03e7add92ae5b3f4d80a0529b411fff1fde191.log:","source_type":"kubernetes_logs","stream":"stdout","timestamp":"2025-01-17T16:27:11.514884165Z"}
{"file":"/var/log/pods/logsender-fluentd-dqqgt_b1e0f28e-a1bf-4807-92d3-0149ddd1372b/fluentd/0.log","kubernetes":{"container_id":"containerd://d84ea604c2cc92def411adac4b796c4ab54c4ae9be0c6d31edd5a83b9199aebc","container_image":"fluentd-kubernetes-daemonset-line:0.0.8","container_image_id":"fluentd-kubernetes-daemonset-line@sha256:ecd57ee76e99b87301549c833ec5e11e86b33976b96888013ca41ee383bde601","container_name":"fluentd","namespace_labels":{},"node_labels":{},"pod_annotations":{"restart-hash":"731fd5e2f6366d23a3e0cf2429b4f95b"},"pod_ip":"172.16.165.54","pod_ips":["172.16.165.54"],"pod_labels":{"app":"logsender-fluentd","controller-revision-hash":"5944f684bb","pod-template-generation":"8"},"pod_name":"logsender-fluentd-dqqgt","pod_namespace":"my_namespace","pod_node_name":"my_node_name","pod_owner":"DaemonSet/logsender-fluentd","pod_uid":"b1e0f28e-a1bf-4807-92d3-0149ddd1372b"},"message":"2025-01-17 16:27:12 +0000 [info]: #0 Timeout flush: k8s.var.log.containers.exporters-node-cvsfd_my_namespace_node-collector-36248918727b55ac5abbfda8c5139f0962954b8a962eeff49aa0dc57ef190770.log:","source_type":"kubernetes_logs","stream":"stdout","timestamp":"2025-01-17T16:27:12.514872008Z"}

Notice that it does detect some files, but it only picks up very few logs instead of ALL logs.
I am still trying to figure out the difference between these logs, and I will update this issue once I find out.

Our initial Vector version was 0.42.0; I also tried 0.43.1 and it works fine.

Configuration

data_dir: "${VECTOR_DATA_DIR}"
api:
  enabled: true
  graphql: true
  playground: true
  address: "0.0.0.0:8686"
sources:
  kubernetes_logs:
    type: kubernetes_logs

sinks:
  blackhole:
    type: blackhole
    inputs:
    - kubernetes_logs

Version

0.44.0-distroless-libc

Debug Output

2025-01-17T16:25:25.930178Z DEBUG vector::app: Internal log rate limit configured. internal_log_rate_secs=10
2025-01-17T16:25:25.930361Z  INFO vector::app: Log level is enabled. level="debug"
2025-01-17T16:25:25.930544Z DEBUG vector::app: messaged="Building runtime." worker_threads=1
2025-01-17T16:25:25.931396Z  INFO vector::app: Loading configs. paths=["/etc/vector"]
2025-01-17T16:25:25.933854Z DEBUG vector::config::loading: No secret placeholder found, skipping secret resolution.
2025-01-17T16:25:25.935866Z DEBUG vector::topology::builder: Building new source. component=kubernetes_logs
2025-01-17T16:25:25.936477Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}: vector::sources::kubernetes_logs: Obtained Kubernetes Node name to collect logs for (self). self_node_name="my_node_name"
2025-01-17T16:25:25.950884Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}: vector::sources::kubernetes_logs: Including matching files. ret=["**/*"]
2025-01-17T16:25:25.950918Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}: vector::sources::kubernetes_logs: Excluding matching files. ret=["**/*.gz", "**/*.tmp"]
2025-01-17T16:25:25.950997Z DEBUG vector::topology::builder: Building new sink. component=blackhole
2025-01-17T16:25:25.951568Z  INFO vector::topology::running: Running healthchecks.
2025-01-17T16:25:25.951607Z DEBUG vector::topology::running: Connecting changed/added component(s).
2025-01-17T16:25:25.951640Z DEBUG vector::topology::running: Configuring outputs for source. component=kubernetes_logs
2025-01-17T16:25:25.951710Z DEBUG vector::topology::running: Configuring output for component. component=kubernetes_logs output_id=None
2025-01-17T16:25:25.951728Z DEBUG vector::topology::running: Connecting inputs for sink. component=blackhole
2025-01-17T16:25:25.951761Z DEBUG vector::topology::running: Adding component input to fanout. component=blackhole fanout_id=kubernetes_logs
2025-01-17T16:25:25.951794Z  INFO vector::topology::builder: Healthcheck passed.
2025-01-17T16:25:25.951857Z DEBUG vector::topology::running: Spawning new source. key=kubernetes_logs
2025-01-17T16:25:25.951964Z DEBUG source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}: vector::topology::builder: Source pump supervisor starting.
2025-01-17T16:25:25.951973Z  INFO vector: Vector has started. debug="false" version="0.44.0" arch="x86_64" revision="3cdc7c3 2025-01-13 21:26:04.735691656"
2025-01-17T16:25:25.952014Z DEBUG source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}: vector::topology::builder: Source pump starting.
2025-01-17T16:25:25.952035Z DEBUG source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}: vector::topology::builder: Source starting.
2025-01-17T16:25:25.953103Z DEBUG tower::buffer::worker: service.ready=true processing request
2025-01-17T16:25:25.953325Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/nodes?&fieldSelector=metadata.name%3Dmy_node_name&limit=500 otel.name="list" otel.kind="client"}: kube_client::client::builder: requesting
2025-01-17T16:25:25.953545Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/nodes?&fieldSelector=metadata.name%3Dmy_node_name&limit=500 otel.name="list" otel.kind="client"}: hyper_util::client::legacy::connect::http: connecting to 172.30.0.1:443
2025-01-17T16:25:25.953738Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: file_source::checkpointer: Loaded checkpoint data.
2025-01-17T16:25:25.953915Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::topology::builder: Sink starting.
2025-01-17T16:25:25.954251Z DEBUG tower::buffer::worker: service.ready=true processing request
2025-01-17T16:25:25.954285Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/pods?&fieldSelector=spec.nodeName%3Dmy_node_name&labelSelector=vector.dev%2Fexclude%21%3Dtrue&limit=500 otel.name="list" otel.kind="client"}: kube_client::client::builder: requesting
2025-01-17T16:25:25.954315Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/pods?&fieldSelector=spec.nodeName%3Dmy_node_name&labelSelector=vector.dev%2Fexclude%21%3Dtrue&limit=500 otel.name="list" otel.kind="client"}: hyper_util::client::legacy::connect::http: connecting to 172.30.0.1:443
2025-01-17T16:25:25.954486Z DEBUG tower::buffer::worker: service.ready=true processing request
2025-01-17T16:25:25.954504Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/namespaces?&labelSelector=vector.dev%2Fexclude%21%3Dtrue&limit=500 otel.name="list" otel.kind="client"}: kube_client::client::builder: requesting
2025-01-17T16:25:25.954519Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/namespaces?&labelSelector=vector.dev%2Fexclude%21%3Dtrue&limit=500 otel.name="list" otel.kind="client"}: hyper_util::client::legacy::connect::http: connecting to 172.30.0.1:443
2025-01-17T16:25:25.954827Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.075795747539674
2025-01-17T16:25:25.954893Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/nodes?&fieldSelector=metadata.name%3Dmy_node_name&limit=500 otel.name="list" otel.kind="client"}: hyper_util::client::legacy::connect::http: connected to 172.30.0.1:443
2025-01-17T16:25:25.955198Z  INFO vector::internal_events::api: API server running. address=0.0.0.0:8686 playground=http://0.0.0.0:8686/playground graphql=http://0.0.0.0:8686/graphql
2025-01-17T16:25:25.955613Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/pods?&fieldSelector=spec.nodeName%3Dmy_node_name&labelSelector=vector.dev%2Fexclude%21%3Dtrue&limit=500 otel.name="list" otel.kind="client"}: hyper_util::client::legacy::connect::http: connected to 172.30.0.1:443
2025-01-17T16:25:25.956033Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/namespaces?&labelSelector=vector.dev%2Fexclude%21%3Dtrue&limit=500 otel.name="list" otel.kind="client"}: hyper_util::client::legacy::connect::http: connected to 172.30.0.1:443
2025-01-17T16:25:25.964193Z DEBUG hyper_util::client::legacy::pool: pooling idle connection for ("https", 172.30.0.1)
2025-01-17T16:25:25.965070Z DEBUG tower::buffer::worker: service.ready=true processing request
2025-01-17T16:25:25.965121Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/nodes?&watch=true&timeoutSeconds=290&fieldSelector=metadata.name%3Dmy_node_name&allowWatchBookmarks=true&resourceVersion=4307522316 otel.name="watch" otel.kind="client"}: kube_client::client::builder: requesting
2025-01-17T16:25:25.965137Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/nodes?&watch=true&timeoutSeconds=290&fieldSelector=metadata.name%3Dmy_node_name&allowWatchBookmarks=true&resourceVersion=4307522316 otel.name="watch" otel.kind="client"}: hyper_util::client::legacy::pool: reuse idle connection for ("https", 172.30.0.1)
2025-01-17T16:25:25.968857Z DEBUG hyper_util::client::legacy::pool: pooling idle connection for ("https", 172.30.0.1)
2025-01-17T16:25:25.969656Z DEBUG tower::buffer::worker: service.ready=true processing request
2025-01-17T16:25:25.969710Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/namespaces?&watch=true&timeoutSeconds=290&labelSelector=vector.dev%2Fexclude%21%3Dtrue&allowWatchBookmarks=true&resourceVersion=4307522316 otel.name="watch" otel.kind="client"}: kube_client::client::builder: requesting
2025-01-17T16:25:25.969728Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/namespaces?&watch=true&timeoutSeconds=290&labelSelector=vector.dev%2Fexclude%21%3Dtrue&allowWatchBookmarks=true&resourceVersion=4307522316 otel.name="watch" otel.kind="client"}: hyper_util::client::legacy::pool: reuse idle connection for ("https", 172.30.0.1)
2025-01-17T16:25:26.044238Z DEBUG hyper_util::client::legacy::pool: pooling idle connection for ("https", 172.30.0.1)
2025-01-17T16:25:26.046553Z DEBUG tower::buffer::worker: service.ready=true processing request
2025-01-17T16:25:26.046627Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/pods?&watch=true&timeoutSeconds=290&fieldSelector=spec.nodeName%3Dmy_node_name&labelSelector=vector.dev%2Fexclude%21%3Dtrue&allowWatchBookmarks=true&resourceVersion=4307522316 otel.name="watch" otel.kind="client"}: kube_client::client::builder: requesting
2025-01-17T16:25:26.046675Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/pods?&watch=true&timeoutSeconds=290&fieldSelector=spec.nodeName%3Dmy_node_name&labelSelector=vector.dev%2Fexclude%21%3Dtrue&allowWatchBookmarks=true&resourceVersion=4307522316 otel.name="watch" otel.kind="client"}: hyper_util::client::legacy::pool: reuse idle connection for ("https", 172.30.0.1)
2025-01-17T16:25:30.955535Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.007579599950353075
2025-01-17T16:25:35.955310Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.0007580334383308525
2025-01-17T16:25:40.955671Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.00007582494227452057
2025-01-17T16:25:45.955268Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.00000760337591003513
2025-01-17T16:25:50.954944Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.0000007866192893211556
2025-01-17T16:25:55.955432Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.00000010511934688859802
2025-01-17T16:26:00.955375Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.00000003355224414918023
2025-01-17T16:26:05.955718Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.00000004241248394149393
2025-01-17T16:26:10.339291Z DEBUG hyper::proto::h1::io: parsed 4 headers
2025-01-17T16:26:10.339338Z DEBUG hyper::proto::h1::conn: incoming body is empty
2025-01-17T16:26:10.339506Z DEBUG http-request{method=GET path=/health}: vector::internal_events::http: Received HTTP request. internal_log_rate_limit=true
2025-01-17T16:26:10.339964Z DEBUG hyper::proto::h1::io: flushed 119 bytes
2025-01-17T16:26:10.340091Z DEBUG hyper::proto::h1::io: parsed 4 headers
2025-01-17T16:26:10.340096Z DEBUG hyper::proto::h1::conn: incoming body is empty
2025-01-17T16:26:10.340110Z DEBUG http-request{method=GET path=/health}: vector::internal_events::http: Internal log [Received HTTP request.] is being suppressed to avoid flooding.
2025-01-17T16:26:10.340157Z DEBUG hyper::proto::h1::io: flushed 119 bytes
2025-01-17T16:26:10.954840Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.00000003124598434156066
2025-01-17T16:26:15.955087Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.000000030123260340297374
2025-01-17T16:26:20.955168Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.00000005089157050316978
2025-01-17T16:26:25.955476Z DEBUG sink{component_kind="sink" component_id=blackhole component_type=blackhole}: vector::utilization: utilization=0.000000032087486122697736
2025-01-17T16:26:25.958817Z DEBUG vector::internal_events::file::source: Files checkpointed. count=88 duration_ms=1
2025-01-17T16:26:27.446715Z DEBUG source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: file_source::file_server: event_throughput=0.000/sec bytes_throughput=0.000/sec ratios={"discovery": 1.9618614e-7, "other": 1.9410345e-5, "sending": 1.4118052e-5, "sleeping": 0.99996626}
2025-01-17T16:26:27.447508Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_exporters-docker-exporter-nllxr_b4fe19b2-a44c-4f6e-be66-03ea92f42351/docker-exporter/0.log file_position=793
2025-01-17T16:26:27.447614Z  WARN source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Currently ignoring file too small to fingerprint. file=/var/log/pods/my_namespace_img-warm-rv2br_8ec04668-1d3b-4927-acc0-c822cdd09a41/pause/0.log
2025-01-17T16:26:27.447679Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_img-warm-rv2br_8ec04668-1d3b-4927-acc0-c822cdd09a41/prepull/0.log file_position=46
2025-01-17T16:26:27.447698Z  WARN source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Currently ignoring file too small to fingerprint. file=/var/log/pods/my_namespace_cleanup-vip-ds-hgl24_178a04c3-ba01-459c-bcaa-7cb7c8e3cfe7/cleanup-vip-ds/0.log
2025-01-17T16:26:27.447729Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_logsender-fluentd-dqqgt_b1e0f28e-a1bf-4807-92d3-0149ddd1372b/fluentd/0.log file_position=4517486
2025-01-17T16:26:27.447771Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_logsender-fluentd-dqqgt_b1e0f28e-a1bf-4807-92d3-0149ddd1372b/fluentd/0.log.20250112-014207 file_position=10486229
2025-01-17T16:26:27.447814Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_logsender-fluentd-for-admin-cv985_2469686b-72fd-488b-a86d-064b04486b63/fluentd/0.log file_position=1390927
2025-01-17T16:26:27.447848Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_logsender-fluentd-for-admin-cv985_2469686b-72fd-488b-a86d-064b04486b63/fluentd/0.log.20250115-181910 file_position=10485924
2025-01-17T16:26:27.447885Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_system-admin-config-gqtdh_5652d531-6b9b-4642-9f7b-d63357db946c/sys-config/0.log file_position=3891989
2025-01-17T16:26:27.447918Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_persistent-volume-csi-cinder-nodeplugin-xg5fr_c1832c1f-10ed-4523-a47d-c1245a902c69/liveness-probe/0.log file_position=77692
2025-01-17T16:26:27.447944Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_persistent-volume-csi-cinder-nodeplugin-xg5fr_c1832c1f-10ed-4523-a47d-c1245a902c69/persistent-volume-cinder-csi-plugin/43.log file_position=2724
2025-01-17T16:26:27.447978Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_persistent-volume-csi-cinder-nodeplugin-xg5fr_c1832c1f-10ed-4523-a47d-c1245a902c69/persistent-volume-cinder-csi-plugin/44.log file_position=27572
2025-01-17T16:26:27.448001Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_persistent-volume-csi-cinder-nodeplugin-xg5fr_c1832c1f-10ed-4523-a47d-c1245a902c69/persistent-volume-node-driver-registrar/0.log file_position=2459
2025-01-17T16:26:27.448033Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_exporters-node-cvsfd_212d1af6-9b7b-4422-a71d-63d9f9998f90/node-collector/0.log file_position=297637
2025-01-17T16:26:27.448060Z  INFO source{component_kind="source" component_id=kubernetes_logs component_type=kubernetes_logs}:file_server: vector::internal_events::file::source: Resuming to watch file. file=/var/log/pods/my_namespace_exporters-node-cvsfd_212d1af6-9b7b-4422-a71d-63d9f9998f90/node-exporter/0.log file_position=8579

Example Data

No response

Additional Context

Kubernetes version: 1.24
Worker node info:

my_node_name     Ready    worker         318d    v1.24.1   10.192.104.162   <none>        CentOS Linux 7 (Core)              5.4.206-200.el7.x86_64           containerd://1.6.28

helm chart version: 0.37.0

References

I have studied the changes between 0.43.1 and 0.44.0, and I noticed that PR #21905 updated the kubernetes_logs source. I checked out v0.44.0, removed that commit from the git history, rebuilt the binary and Docker image, and deployed it to our cluster. With that commit removed, it works. As a result, I highly suspect a breaking change was introduced in #21905.

My git repo: https://github.com/titaneric/vector/tree/feat/vector-debug


Updated:

I checked out cd87af9152 (the commit before #21905) and verified it works, and I checked out 92d869a59c (the commit of #21905) and it only emits the logs I posted in this issue.

@titaneric titaneric added the type: bug A code related bug. label Jan 17, 2025
@pront pront added the source: kubernetes_logs Anything `kubernetes_logs` source related label Jan 17, 2025
@pront
Member

pront commented Jan 17, 2025

Hi @titaneric, do you suspect that this PR #21905 introduced a regression?

@pront pront added the meta: regression This issue represents a regression label Jan 17, 2025
@titaneric
Contributor Author

Yes, I have updated the issue; I found that #21905 did change the behaviour in my environment.

@titaneric
Contributor Author

titaneric commented Jan 18, 2025

I found the root cause of this issue. I noticed that Vector calls the Kubernetes API server with the following requests:

2025-01-17T16:25:25.955613Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/pods?&fieldSelector=spec.nodeName%3Dmy_node_name&labelSelector=vector.dev%2Fexclude%21%3Dtrue&limit=500 otel.name="list" otel.kind="client"}: hyper_util::client::legacy::connect::http: connected to 172.30.0.1:443
2025-01-17T16:25:25.956033Z DEBUG HTTP{http.method=GET http.url=https://172.30.0.1/api/v1/namespaces?&labelSelector=vector.dev%2Fexclude%21%3Dtrue&limit=500 otel.name="list" otel.kind="client"}: hyper_util::client::legacy::connect::http: connected to 172.30.0.1:443
2025-01-17T16:25:25.964193Z DEBUG hyper_util::client::legacy::pool: pooling idle connection for ("https", 172.30.0.1)

Notice that it queries the pods with a vector.dev/exclude!=true filter, so I ran the equivalent kubectl query: kubectl get pods -A -l 'vector.dev/exclude!=true' --show-labels (quoting the selector so the shell does not interpret the !). The results show exactly the pods that Vector detects on the nodes, but the selector fails to match many of the rest of my pods.
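As a sanity check (my own addition, not part of the original report), the percent-encoded labelSelector in the debug-log request URL can be decoded to confirm which filter the source actually sends:

```python
from urllib.parse import unquote

# labelSelector value copied from the request URL in the debug output above
encoded = "vector.dev%2Fexclude%21%3Dtrue"
print(unquote(encoded))  # vector.dev/exclude!=true
```

This confirms the watch and list requests carry exactly the selector that the kubectl command above reproduces.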

my-namespace   logsender-fluentd-g9zxv                                          1/1     Running     0               290d    app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-gn4sp                                          1/1     Running     0               18d     app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-gqh5n                                          1/1     Running     0               4d23h   app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-hpbxx                                          1/1     Running     0               290d    app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-hvr4f                                          1/1     Running     0               19d     app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-j2q56                                          1/1     Running     0               4d22h   app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-j52vp                                          1/1     Running     0               290d    app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-j7ncc                                          1/1     Running     0               290d    app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-jbdwg                                          1/1     Running     0               254d    app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-jhbq4                                          1/1     Running     0               290d    app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-k4fw4                                          1/1     Running     0               290d    app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-k5sjd                                          1/1     Running     0               4d23h   app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-kz8lw                                          1/1     Running     0               4d22h   app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-lmvkp                                          1/1     Running     0               4d22h   app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-m6smg                                          1/1     Running     0               290d    app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-mrpkk                                          1/1     Running     0               27h     app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-n25km                                          1/1     Running     0               275d    app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8
my-namespace   logsender-fluentd-n72z8                                          1/1     Running     0               290d    app=logsender-fluentd,controller-revision-hash=5944f684bb,pod-template-generation=8

I think the query filter's results are NOT what we expect.
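For context on why this result looks wrong: per the Kubernetes documentation on equality-based label selectors, key!=value matches objects that either lack the key entirely or carry a different value, so pods with no vector.dev/exclude label should still match. A minimal sketch of that documented semantics (illustrative only, not Vector's or the apiserver's actual code):

```python
def matches_not_equals(labels: dict, key: str, value: str) -> bool:
    """Equality-based selector `key!=value` as documented for Kubernetes:
    matches objects that lack the key, or that carry a different value."""
    return labels.get(key) != value

# A pod with no vector.dev/exclude label should match the selector...
print(matches_not_equals({"app": "logsender-fluentd"}, "vector.dev/exclude", "true"))   # True
# ...while an explicitly excluded pod should not.
print(matches_not_equals({"vector.dev/exclude": "true"}, "vector.dev/exclude", "true"))  # False
```

Under that semantics, the unlabeled logsender-fluentd pods listed above should have matched the server-side filter, which supports the conclusion that the filtering behaviour changed.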
