[BUG] whisper-fetch.py --drop incorrect timestamps #305
Comments
TBH, I think a proper implementation should not drop the point but replace it with 0 or the previous value instead. But then we should not call this function "drop", should we?
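A minimal sketch of the "replace instead of drop" idea, in case it helps the discussion. `fill_nulls` and its `mode`/`default` parameters are hypothetical names used only for illustration; nothing like this is claimed to exist in whisper. Because the list keeps its length, each value stays at the position that matches its timestamp.

```python
def fill_nulls(values, mode="previous", default=0.0):
    """Replace None entries instead of removing them.

    Hypothetical helper, not part of whisper. The output has the same
    length as the input, so positions still line up with timestamps.
    """
    filled = []
    last = default  # used until the first real value has been seen
    for v in values:
        if v is None:
            # "previous" repeats the last real value, anything else uses default
            filled.append(last if mode == "previous" else default)
        else:
            filled.append(v)
            last = v
    return filled


print(fill_nulls([None, 1.0, None, 3.0]))               # forward fill
print(fill_nulls([None, 1.0, None, 3.0], mode="zero"))  # fill with default
```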
I don't see where it says this is intended. Returning the wrong timestamp does not make any sense to me, can't imagine why someone would want this.
This cost me a day at work, because I ran For now I'm changing my code to just call
Possible fix (completely untested): #306
Another issue that I ran into is that here you're using the local time of the machine that I'm running the data processing on, which gave incorrect results in my case: Lines 86 to 89 in 8d21c56
I'm now using this, which I think should be correct:

    import pandas as pd
    import whisper

    def read_whisper(path):
        (fromTime, untilTime, step), val = whisper.fetch(path, fromTime=0, archiveToSelect=None)
        # Interpret the epoch seconds explicitly instead of relying on the machine's local time
        fromTimeStamp = pd.Timestamp(fromTime, unit="s", tz="Europe/Berlin")
        index = pd.date_range(
            start=fromTimeStamp,
            freq="H",  # hard-coded hourly frequency; the returned step is not used
            periods=len(val),
        )
        data = {"val": val}
        return pd.DataFrame(data, index=index)
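For anyone without pandas at hand, the same idea can be sketched with the standard library only. This assumes `from_time` is in epoch seconds and `step` is seconds per point, as in the `(fromTime, untilTime, step)` tuple above; `point_times` is a hypothetical helper name. Each timestamp is converted from epoch seconds with an explicit time zone, so the result does not depend on the local time of the machine running the code.

```python
from datetime import datetime, timedelta, timezone


def point_times(from_time, step, count, tz=timezone.utc):
    """Return one timezone-aware datetime per point (hypothetical helper).

    Every point is converted from its own epoch value, so the spacing is
    taken from `step` rather than from a hard-coded frequency.
    """
    return [
        datetime.fromtimestamp(from_time + i * step, tz=tz)
        for i in range(count)
    ]


# Three hourly points starting at the epoch, in UTC and at a fixed +01:00 offset
utc_times = point_times(0, 3600, 3)
plus_one = point_times(0, 3600, 3, tz=timezone(timedelta(hours=1)))
print(utc_times[0].isoformat(), plus_one[0].isoformat())
```

Both lists describe the same instants; only the displayed offset differs.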
I was getting incorrect data with I see correct data with Wrote this, which seems to work fine:

    import numpy as np
    import pandas as pd
    import whisper

    def read_whisper_archive(path: str, archive_id: int) -> pd.DataFrame:
        """Whisper data read direct implementation with Numpy and Pandas"""
        infos = whisper.info(path)
        if archive_id < 0 or archive_id >= len(infos["archives"]):
            raise ValueError(f"Invalid archive_id = {archive_id}")
        # One point on disk: big-endian uint32 timestamp followed by a big-endian float64 value
        dtype = np.dtype([
            ("time", ">u4"),
            ("val", ">f8")
        ])
        archive = infos["archives"][archive_id]
        # count limits the read to this archive; without it np.fromfile
        # reads on into any archives stored after it in the file
        data = np.fromfile(path, dtype=dtype, count=archive["points"], offset=archive["offset"])
        # Drop slots whose timestamp is 0 (assumed to be never-written points)
        data = data[np.nonzero(data["time"])]
        # The astype is needed to avoid this error later on
        # ValueError: Big-endian buffer not supported on little-endian compiler
        df = pd.DataFrame(
            data={"val": data["val"].astype(float)},
            index=pd.to_datetime(data["time"], unit="s")
        )
        df = df.sort_index()
        return df

This should be much faster and more memory-efficient than the current

Is the processing correct, i.e. is it guaranteed that non-filled values have time=0? And is the sorting by time at the end needed, or is this already the case in the file? The description at https://graphite.readthedocs.io/en/stable/whisper.html unfortunately doesn't explain where values are filled within the archive.

Do you think it could make sense to add a function like this to this repo? The NumPy and pandas imports could be delayed to the function, i.e. they would be an optional dependency.
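To make the two open questions easier to test without NumPy, here is a small standard-library sketch of the same parsing step. It works on the raw bytes of one archive and rests on the same assumptions as the function above: each point is a big-endian uint32 timestamp plus a float64 value, and empty slots carry timestamp 0. It also sorts by time, on the assumption that the order of points in the file may not be chronological. `parse_archive` is a hypothetical name.

```python
import struct

POINT_FORMAT = "!Ld"  # big-endian uint32 timestamp + float64 value
POINT_SIZE = struct.calcsize(POINT_FORMAT)  # 12 bytes, no padding


def parse_archive(raw):
    """Decode raw archive bytes into a time-sorted list of (timestamp, value).

    Hypothetical helper. Slots with timestamp 0 are treated as empty,
    which is an assumption and not something the docs confirm.
    """
    points = [p for p in struct.iter_unpack(POINT_FORMAT, raw) if p[0] != 0]
    points.sort(key=lambda p: p[0])
    return points


# Synthetic archive: a newer point, an empty slot, then an older point
raw = (
    struct.pack(POINT_FORMAT, 200, 2.0)
    + struct.pack(POINT_FORMAT, 0, 0.0)
    + struct.pack(POINT_FORMAT, 100, 1.0)
)
print(parse_archive(raw))
```

Running this against the bytes of a real archive and comparing with the pandas result would be one way to check whether the zero-timestamp and sorting assumptions hold for a given file.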
Hi @cdeil, |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Just in case someone finds this old thread and is looking for a Whisper file Pandas reader, check this out: Of course, any feedback or contribution would be welcome. Specifically I'm not sure if the It seems to work for my files, but the WhisperDB docs at https://graphite.readthedocs.io/en/latest/whisper.html unfortunately don't say how the file is initialised or where the points are inserted. |
I think there's a bug here:
whisper/bin/whisper-fetch.py
Lines 67 to 69 in 8d21c56
I was using
whisper-fetch.py --drop nulls
and got incorrect timestamps. When dropping values, the timestamps also need to be adjusted to match, no?
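To illustrate what "adjusted to match" could mean, here is a minimal sketch that pairs every value with its timestamp before filtering, so a dropped value takes its timestamp with it. This is not the code from whisper-fetch.py; `drop_points` and the `keep` predicate are hypothetical names, and `start` and `step` are assumed to be epoch seconds and seconds per point.

```python
def drop_points(start, step, values, keep):
    """Pair each value with its timestamp first, then filter the pairs.

    Hypothetical helper. Filtering the values alone and numbering the
    survivors afterwards would shift every later point to an earlier time.
    """
    timestamps = range(start, start + step * len(values), step)
    return [(t, v) for t, v in zip(timestamps, values) if keep(v)]


def not_null(v):
    return v is not None


# The middle point is null; the last point must keep timestamp 1120, not 1060
print(drop_points(1000, 60, [1.0, None, 3.0], not_null))
```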