Describe the bug
After network issues within our clusters, we found that ingesters were not able to rejoin the ring, so we had to remove them manually.
There are multiple issues describing this behavior, e.g. #8615 and #14847.
The suggested fix is to enable the autoforget_unhealthy flag in the ingester config by default, as there seem to be no downsides to it.
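For reference, a minimal sketch of what enabling the flag by hand might look like in a plain Loki configuration file. It assumes the option sits directly under the `ingester` block, as the wording above suggests. With LokiStack the configuration is generated by the operator, so this fragment only illustrates the setting and is not a drop-in fix.

```yaml
# Sketch only: placement under `ingester` is an assumption based on the issue text.
ingester:
  # Remove ingesters from the ring once they have been unhealthy past the timeout,
  # instead of requiring a manual "forget".
  autoforget_unhealthy: true
```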
Expected behavior
Unhealthy ingesters leave the ring after a timeout.
Environment:
Kubernetes 1.27
Can you provide more information on how to reproduce the issue?
I have recently tried to reproduce a very similar report, but for me the ingesters instantly became healthy again after the network issues were removed.
@xperimental OK, so it is not exactly what happened to our clusters, but I was able to reproduce similar behavior. It turns out the problem is scaling the ingester replica count down while experiencing network issues. You can use the following steps to reproduce it:
Create a LokiStack with:
template:
  ingester:
    replicas: 3
Disable networking for the ingesters. I used the following NetworkPolicy, as we use the Calico CNI:
Change the ingester replica count to 2 or 1 and wait until the pods are deleted.
Remove the network policy. The deleted ingesters are now stuck in the UNHEALTHY state. They can be fixed by scaling back up to 3 or 2, so that a pod with the same name as the unhealthy ingester appears. However, these ingesters will not become healthy or be forgotten on their own.
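The NetworkPolicy used in the second step is not shown above. As a stand-in, here is a minimal sketch of a policy that blocks all ingress and egress for the ingester pods. It is not the policy used in the reproduction: the name, the namespace, and the `app.kubernetes.io/component: ingester` label are assumptions and need to be adjusted to match the actual pods.

```yaml
# Hypothetical deny-all policy; name, namespace and pod labels are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-ingester-traffic
  namespace: loki
spec:
  # Select the ingester pods (label is an assumption, check with `kubectl get pods --show-labels`).
  podSelector:
    matchLabels:
      app.kubernetes.io/component: ingester
  # Listing both policy types without any ingress/egress rules denies all traffic
  # in both directions for the selected pods.
  policyTypes:
    - Ingress
    - Egress
```

Deleting the policy again restores connectivity, which corresponds to the last step of the reproduction.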