Data Lag #572
Comments
If your relay queues are increasing, the most likely cause is that the downstream caches can't keep up. How are they doing on CPU? How is I/O behaving?
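Here is a toy model of why a growing relay queue points at the downstream side. It is not carbon's code, and the rates are hypothetical. Each second a fixed number of datapoints arrive, and the destination can absorb at most a fixed number. When the drain rate is below the arrival rate, the backlog grows every second.

```python
def simulate_queue(arrival_per_s: float, drain_per_s: float, seconds: int) -> list[float]:
    """Return the backlog (datapoints still waiting) at the end of each simulated second."""
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog += arrival_per_s                 # new datapoints reach the relay
        backlog -= min(backlog, drain_per_s)     # downstream absorbs what it can
        history.append(backlog)
    return history


# Downstream keeps up: the backlog stays empty.
print(simulate_queue(10_000, 12_000, 5))  # [0.0, 0.0, 0.0, 0.0, 0.0]
# Downstream is 2k points/s too slow: the backlog grows every second.
print(simulate_queue(10_000, 8_000, 5))   # [2000.0, 4000.0, 6000.0, 8000.0, 10000.0]
```

Only the difference between the two rates matters here, which is why checking CPU and I/O on the caches is the first step.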
Note that there's nothing inherently wrong with having lots of queues. As you can see in obfuscurity/synthesize#12 (comment), that system had over 600k queues (unique metric keys) in memory. What matters is that the number of datapoints in memory doesn't bloat over time; the metrics should eventually be flushed to disk (batch writes are efficient, represented by the …). TL;DR: it's all relative. If your box is performing fine, then the number of queues and datapoints in memory is irrelevant. P.S. I just noticed that your …
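To make the queues-versus-datapoints distinction concrete, here is a deliberately simplified, hypothetical cache (not carbon's implementation): one queue per unique metric key, each holding pending datapoints. The queue count can be large and stable while the datapoint count is what has to stay bounded. Flushing a whole queue at once is the "batch write" idea; picking the largest queue first is just one illustrative policy.

```python
from collections import defaultdict


class ToyCache:
    """Hypothetical carbon-style cache: one queue of (timestamp, value) pairs per metric key."""

    def __init__(self) -> None:
        self.queues: dict[str, list[tuple[int, float]]] = defaultdict(list)

    def store(self, metric: str, timestamp: int, value: float) -> None:
        self.queues[metric].append((timestamp, value))

    def num_queues(self) -> int:
        # Number of unique metric keys currently held in memory.
        return len(self.queues)

    def num_datapoints(self) -> int:
        # Total pending datapoints across all queues; this is what must not grow forever.
        return sum(len(points) for points in self.queues.values())

    def flush_largest(self) -> tuple[str, list[tuple[int, float]]]:
        """Remove the metric with the most pending points so it can be written in one batch."""
        metric = max(self.queues, key=lambda m: len(self.queues[m]))
        return metric, self.queues.pop(metric)


cache = ToyCache()
for ts in range(0, 60, 10):
    cache.store("servers.web1.cpu", ts, 0.5)
cache.store("servers.web2.cpu", 0, 0.7)
print(cache.num_queues(), cache.num_datapoints())  # 2 7
metric, batch = cache.flush_largest()
print(metric, len(batch))                          # servers.web1.cpu 6
```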
How do I know whether the number of datapoints in memory is bloating? Is the percentage of used memory enough to tell?
You can track the … metric. Setting …
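One way to answer "is it bloating?" is to pull the cache-size series as JSON from the Graphite render API (`format=json` returns a list of `{"target": ..., "datapoints": [[value, timestamp], ...]}`) and look at its trend. This is a sketch under assumptions. The target `carbon.agents.*.cache.size` and the host `http://graphite.example.com` are placeholders, not necessarily the metric suggested above. The fetch itself is left out, and the least-squares slope is just one reasonable trend measure.

```python
import json
from urllib.parse import urlencode


def render_url(base: str, target: str, frm: str = "-6h") -> str:
    """Build a Graphite render API URL that returns the series as JSON."""
    return f"{base}/render?" + urlencode({"target": target, "from": frm, "format": "json"})


def slope_per_hour(datapoints: list) -> float:
    """Least-squares slope of [value, timestamp] pairs, in value units per hour; None values are skipped."""
    pts = [(ts, v) for v, ts in datapoints if v is not None]
    if len(pts) < 2:
        return 0.0
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    return num / den * 3600 if den else 0.0


# Hypothetical placeholder target and host; fetch this URL with any HTTP client.
print(render_url("http://graphite.example.com", "carbon.agents.*.cache.size"))

sample = json.loads('[{"target": "x", "datapoints": [[1000, 0], [1100, 60], [null, 120], [1300, 180]]}]')
print(round(slope_per_hour(sample[0]["datapoints"])))  # 6000
```

A slope that stays clearly positive over many hours suggests datapoints are accumulating faster than they are flushed. A series that rises and falls around a stable level is normal.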
Hi @obfuscurity, I tried the metric you suggested, but its value shows up as negative. Should I use the perSecond() function to display this datapoint properly, or is something wrong with my cluster?
There have been some reports of the …
@obfuscurity, sorry, I don't quite follow #551. The issue raised there is about locking, and I don't see anything about fixing this problem. Am I reading it wrong?
You asked about the metric showing as negative. I was explaining that it's a bug and there's a potential fix.
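On the perSecond() question: as documented, Graphite's perSecond() is meant for ever-increasing counters. It takes the delta between consecutive points, divides it by the series step, and drops negative deltas as counter resets. The function below is a simplified approximation of that behavior (it ignores the optional `maxValue` argument), not Graphite's source. It shows why perSecond() would not "fix" a gauge such as a cache size. You would get its rate of growth, with decreases blanked out, rather than the size itself.

```python
from typing import Optional


def per_second(values: list[Optional[float]], step: int) -> list[Optional[float]]:
    """Approximate Graphite's perSecond(): non-negative delta / step, None otherwise."""
    out: list[Optional[float]] = []
    prev: Optional[float] = None
    for val in values:
        if prev is None or val is None:
            out.append(None)                 # first point or a gap: no delta available
        else:
            delta = val - prev
            out.append(delta / step if delta >= 0 else None)  # negative delta = reset, dropped
        prev = val
    return out


# A cumulative counter with a reset between 120 and 30, sampled every 60 s.
print(per_second([0, 60, 120, 30, 90], 60))  # [None, 1.0, 1.0, None, 1.0]
# A gauge (e.g. a cache size): the shrinking step disappears entirely.
print(per_second([500, 300, 400], 60))
```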
Hi all, I'm having some difficulty understanding the root cause of the data lag in my Graphite cluster. Sometimes new data doesn't show up for a minute or so. I have already upgraded the instances in my cluster, so I don't think there is any hardware bottleneck at the moment.
Moreover, looking at my relay metrics, the number of queues is quite high, around 6k.
What should I do to decrease the number of queues? Is that related to the data lag I'm seeing?
I tried to find documentation explaining each of the carbon metrics, but I couldn't find any. Does anybody have a reference for these metrics?
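As a rough way to connect queued data with a lag of about a minute, Little's law says the average time an item waits is about the number of items waiting divided by the throughput. Applied to a pipeline stage such as a relay queue, the expected delay is roughly the datapoints queued there divided by the rate at which that stage drains them. This is a back-of-the-envelope estimate under that assumption, not a figure carbon reports, and the numbers below are hypothetical.

```python
def estimated_lag_seconds(queued_points: float, drain_points_per_s: float) -> float:
    """Little's law estimate: average wait ~= backlog / drain rate."""
    if drain_points_per_s <= 0:
        raise ValueError("drain rate must be positive")
    return queued_points / drain_points_per_s


# Hypothetical: 600k datapoints queued, drained at 10k points per second.
print(estimated_lag_seconds(600_000, 10_000))  # 60.0
```

Note that 6k queues alone does not determine the lag. What matters is how many datapoints sit in them and how fast they drain.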