Can't get a continuous tracking (part2) #7

Open
bousqi opened this issue Oct 7, 2016 · 12 comments
@bousqi
Contributor

bousqi commented Oct 7, 2016

It appears that a bug is still present in Freebox statistics tracking.
This time I don't have any clue about the cause (no exception or error message).

Here is the behavior: after a certain period of time (from 4 days up to 1 week), some statistics get stuck. Munin always gets the same value, while if I connect to the Freebox server, the values are different.
It appears that restarting the Freebox fixes the problem, but it is not really linked to the box itself (as the values reported on the internal web server are updated).

Here are some graphs where you can see areas where the values are stuck:

freebox_xdsl-month
freebox_switch1-month
freebox_temp-month
freebox_traffic-month

Values are updated once the box has been restarted.
This issue only concerns temperature, xdsl, traffic and switch.

Manually running the plugins gives results, but not the correct ones.

$ sudo munin-run freebox-temp
cpum.value 63.0
cpub.value 62.0
hdd.value 39.0
sw.value 52.0

@quentin-st quentin-st added the bug label Oct 7, 2016
@quentin-st
Owner

I just checked my own stats: I'm unfortunately not experiencing this issue:

image

First: are you using the latest script version? A git pull should ensure you are.

Which version of Python are you using?

The graphs you're mentioning use the /rrd endpoint. It is the only endpoint where we need to specify the date_start and date_end parameters, which are computed here: main.py#257
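To make the role of date_start and date_end concrete, here is a minimal sketch of how a request body for that endpoint could be assembled. The `db` and `fields` keys and the helper name `rrd_request_body` are assumptions for illustration (they are not taken from main.py); the field names are the ones printed by the freebox-temp plugin earlier in this thread. No network call is made.

```python
import datetime
import json
import time


def rrd_request_body(db, fields, date_start, date_end):
    """Build a hypothetical request body for the rrd endpoint.

    date_start and date_end are naive local datetimes; they are sent
    as integer Unix timestamps.
    """
    return {
        "db": db,                # assumed key: which statistics database to read
        "fields": list(fields),  # assumed key: which series to return
        "date_start": int(time.mktime(date_start.timetuple())),
        "date_end": int(time.mktime(date_end.timetuple())),
    }


body = rrd_request_body(
    "temp",
    ["cpum", "cpub", "hdd", "sw"],
    datetime.datetime(2016, 10, 7, 14, 5),
    datetime.datetime(2016, 10, 7, 14, 10),
)
print(json.dumps(body, sort_keys=True))
```

Because the timestamps are derived from the local clock, a clock that is off would shift the whole window, which is why the system time is worth checking.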

Could your system clock be out of sync from time to time? I had an issue where my Raspberry Pi's time wasn't correct because of wrong NTP settings. I think that during these periods, date_start_timestamp and date_end_timestamp aren't computed correctly.

For sure (and that's too bad for us), since the plugin isn't misbehaving (at least it thinks so), there's no error log anywhere.

@bousqi
Contributor Author

bousqi commented Oct 7, 2016

I'm running the head of master from your git repository.
I thought Python 3 was my default interpreter, but in fact my system is using 2.7.9. I checked with both 3.4.2 and 2.7.9 and the results are the same (stuck values).
Your NTP remark is interesting. My Raspberry Pi's clock seems to be correct: same date on 3 different systems (NTP synchronized). Maybe the Freebox has a clock drift (it would explain why a Freebox reboot fixes it, and why the drift increases over time). I'm considering it, but I don't know how to verify this.
Any suggestions?

@bousqi
Contributor Author

bousqi commented Oct 7, 2016

Funny thing, the graphs are also lost on the Freebox side...
So it might not be an issue with getting the values, but rather the plugin crashing the tracking on the Freebox.

fbox_temp

I did not realize until now because I was only checking the temperature on the first page, where the values are OK:

fbox_temp2

Freebox server version is 3.3.3 (up to date).

@quentin-st
Owner

Sure! Just create a test.py file with the following content:

import datetime

now = datetime.datetime.now()  # math.ceil(time.time())
now = now.replace(second=0, microsecond=0)
date_end = now.replace(minute=now.minute - now.minute % 5)  # Round to lowest 5 minutes
date_start = now - datetime.timedelta(minutes=5)  # Remove 5 minutes from date_end

print(date_end)
print(date_start)

chmod & run it (with the same Python version as the one munin uses, to be sure), and check whether the dates are correct.

@bousqi
Contributor Author

bousqi commented Oct 7, 2016

rpi-stable:/usr/local/src/munin-freebox (master) $ date
Fri Oct 7 14:12:37 CEST 2016
rpi-stable:/usr/local/src/munin-freebox (master) $ ./date.py
2016-10-07 14:10:00
2016-10-07 14:07:00

@quentin-st
Owner

quentin-st commented Oct 7, 2016

About your temperature screenshot: that's really weird indeed. Maybe repeated API calls break data storage on the Freebox side. Then, our script isn't able to correctly read these values (or the API returns a stable value while there is none).

Could you try to disable our script for now, and check whether Freebox OS's stats go back to normal?

About your script output: the dates seem to be OK

Edit: these dates are not OK actually, you should have this:
2016-10-07 14:10:00
2016-10-07 14:05:00
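
The gap comes from subtracting the 5 minutes from the unrounded `now` instead of from `date_end`. A minimal sketch of the corrected computation, assuming the intent is a 5-minute window ending at the last 5-minute mark (the function name `five_minute_window` is hypothetical and this is not necessarily the exact change committed to main.py):

```python
import datetime


def five_minute_window(now):
    """Return (date_start, date_end) for the last complete 5-minute slot."""
    now = now.replace(second=0, microsecond=0)
    # Round down to the previous multiple of 5 minutes.
    date_end = now.replace(minute=now.minute - now.minute % 5)
    # Subtract from date_end, not from now, so the window is always 5 minutes.
    date_start = date_end - datetime.timedelta(minutes=5)
    return date_start, date_end


# Same instant as the `date` output posted above.
start, end = five_minute_window(datetime.datetime(2016, 10, 7, 14, 12, 37))
print(end)    # 2016-10-07 14:10:00
print(start)  # 2016-10-07 14:05:00
```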

@bousqi
Contributor Author

bousqi commented Oct 7, 2016

All Freebox plugins have been removed, and munin-node restarted.
I'll wait a few minutes/hours to check whether the Freebox graphs come back.

@quentin-st
Owner

Alright, I'm fixing the dates issue - which isn't really one as long as the script is run when the minutes component of the current time is a multiple of 5

@bousqi
Contributor Author

bousqi commented Oct 7, 2016

How many plugins are enabled on your munin server?
Is the Freebox a classic one or an optical one? Latest firmware?

quentin-st added a commit that referenced this issue Oct 7, 2016
…he current time is not a multiple of 5

Issue discovered in issue #7
@quentin-st
Owner

44 plugins, Freebox Revolution, latest firmware

(tip: I'm using Material-Freebox-OS to spare my eyes when browsing Freebox OS)

@bousqi
Contributor Author

bousqi commented Oct 7, 2016

Freebox Server (r2)?

So far the graphs are still dead on the Freebox. I'll reboot it later.
I guess I'll have to add the plugins back one by one to check which one crashes the tracking on the box.

About the rrd queries: is it possible that Munin makes too many concurrent queries to the rrd API? Would it be possible to process them sequentially rather than in parallel?
What would be your approach to identify the origin of this problem?

@quentin-st
Owner

So the stats on the Freebox are still crashed. We may predict that everything will be OK after a reboot.

I've never heard of too many concurrent queries being a problem with the rrd API, even with Freebox Stats...
I'm not really sure whether munin calls each plugin sequentially or in parallel. However, since munin is responsible for this logic and we cannot override it, we have no room to maneuver here.
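
If concurrency did turn out to matter, one option that stays inside the plugin would be to serialize the API calls with an advisory file lock, whatever order munin uses to start the plugins. This is only a sketch under assumptions: the lock path and the `query_freebox` placeholder are hypothetical, `fcntl` is Unix-only, and every plugin process would need write access to the lock file.

```python
import contextlib
import fcntl
import os
import tempfile

# Hypothetical lock location shared by all freebox plugins.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "munin-freebox.lock")


@contextlib.contextmanager
def api_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock for the duration of the block."""
    with open(path, "w") as handle:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def query_freebox():
    # Placeholder standing in for the real API call made by a plugin.
    return "ok"


with api_lock():
    result = query_freebox()
```

With this in place, two plugins started at the same moment would query the Freebox one after the other instead of simultaneously.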

We don't have access to any Freebox OS log either, so our only solution seems to be opening an issue on Freebox OS's bug tracker.
