I suggested 32-bit numbers to reduce row size in the data table and to speed up queries and transmission. We need to make sure we're OK committing to this before launch, though, as it will be annoying to change later if we change our minds.
Particulars to consider:
Can we be sure we won't lose data when converting to 32-bit floats? It may be worth roundtripping data from a kvalobs dump to sanity-check this.
Integer TSIDs limit us to 2.1 million TSIDs; have a real good think about whether there is any risk of us approaching that.
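For the float question, a roundtrip check could look roughly like the sketch below. It assumes the values have already been parsed out of the kvalobs dump into Python floats (the dump format isn't described here, so parsing is left out), and that "no data loss" means the value is unchanged at the number of decimals actually stored. The helper names `to_f32`, `survives_f32`, and `find_lossy` are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    # Pack as IEEE 754 single precision (round to nearest), then unpack back to a double.
    # Raises OverflowError for values outside the float32 range.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def survives_f32(x: float, decimals: int) -> bool:
    # True if x, compared at `decimals` places, is unchanged after a float32 roundtrip.
    return round(to_f32(x), decimals) == round(x, decimals)


def find_lossy(values, decimals: int):
    # Collect every value that would change when stored as a 32-bit float.
    return [v for v in values if not survives_f32(v, decimals)]


# Hypothetical sample values, not taken from kvalobs.
print(find_lossy([0.1, 12.3, 16777217.0], decimals=1))
```

A float32 has a 24-bit significand (roughly 7 significant decimal digits), so 0.1 is not stored exactly but still reads back as 0.1 at one decimal, while 16777217 (2^24 + 1) cannot be represented and comes back as 16777216.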
I looked a bit into this after our discussion on how to store the QC provenance, and I found this.
Apparently, due to alignment padding, we are already wasting 4 bytes per row in the data table, and moving to all 8-byte data types would only add 4 more bytes.
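To see how padding can eat the savings from narrower types, here is a rough sketch of the layout rule, assuming the data table lives in PostgreSQL (as the bit-string overhead figures suggest): each fixed-width column starts at an offset rounded up to its alignment, and the row data is padded to an 8-byte boundary (MAXALIGN on typical 64-bit builds). The tuple header is ignored, and the column list is hypothetical, so the numbers will not match the real table's.

```python
def align_up(offset: int, align: int) -> int:
    # Round offset up to the next multiple of align.
    return (offset + align - 1) // align * align


def row_data_size(columns, maxalign: int = 8):
    """Return (padded_size, padding_bytes) for fixed-width columns stored in order.

    Each column is (name, size_in_bytes, alignment_in_bytes).
    """
    offset = 0
    raw = 0
    for _name, size, align in columns:
        offset = align_up(offset, align)  # padding inserted before the column
        offset += size
        raw += size
    padded = align_up(offset, maxalign)  # trailing padding to MAXALIGN
    return padded, padded - raw


# Hypothetical layouts: INT4/FLOAT4 vs INT8/FLOAT8 around an 8-byte timestamp.
narrow = [("tsid", 4, 4), ("obstime", 8, 8), ("value", 4, 4)]
wide = [("tsid", 8, 8), ("obstime", 8, 8), ("value", 8, 8)]
print(row_data_size(narrow), row_data_size(wide))
```

In this made-up layout both versions occupy 24 bytes: the 32-bit one just spends 8 of them on padding.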
For the provenance, I had suggested using bitstrings, but they have an overhead of 5-8 bytes. Bool columns might be more space-efficient in this case?
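A quick way to compare the two options, as a sketch: the PostgreSQL docs give bit strings 1 byte per 8 bits plus 5 or 8 bytes of overhead, and a boolean takes 1 byte. This ignores alignment and NULL handling, and the function names are hypothetical.

```python
import math


def bool_columns_bytes(n_flags: int) -> int:
    # One byte per boolean column (alignment and NULL bitmap ignored).
    return n_flags


def bitstring_bytes(n_flags: int, overhead: int = 5) -> int:
    # One byte per group of 8 bits plus fixed overhead (5 bytes for short strings).
    return math.ceil(n_flags / 8) + overhead


for n in (3, 6, 7):
    print(n, bool_columns_bytes(n), bitstring_bytes(n))
```

Under these assumptions, bool columns are no larger than a short bit string for up to 6 flags; beyond that the bit string wins on raw size.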
Regarding the number of time series, we are already at 500k, so it's probably worth moving to 64-bit IDs.
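For scale, a signed 32-bit integer tops out at 2,147,483,647 (about 2.1 billion). The sketch below compares that and the signed 64-bit limit against the ~500k figure above. If TSIDs come from a PostgreSQL sequence (an assumption here), values consumed by rolled-back inserts or deleted series are not reused, so the ID space can be used up faster than the live series count suggests.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit value (e.g. PostgreSQL INTEGER)
INT64_MAX = 2**63 - 1  # largest signed 64-bit value (e.g. PostgreSQL BIGINT)


def headroom(current_ids: int, limit: int) -> float:
    # How many times the current ID count fits under the limit.
    return limit / current_ids


current = 500_000  # approximate number of time series mentioned above
print(f"int32: {INT32_MAX:,} max, {headroom(current, INT32_MAX):,.0f}x headroom")
# int32: 2,147,483,647 max, 4,295x headroom
print(f"int64: {INT64_MAX:,} max, {headroom(current, INT64_MAX):.2e}x headroom")
```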
> Apparently, due to alignment padding, we are already wasting 4 bytes per row in the data table, and moving to all 8-byte data types would only add 4 more bytes.
Nice detective work! This makes a very compelling case for 64 bit on both.
> For the provenance, I had suggested using bitstrings, but they have an overhead of 5-8 bytes. Bool columns might be more space-efficient in this case?
I’m not sure what you mean here; we planned to put provenance in a separate table. Unless you mean the end-user flags, in which case, yes, I think bool columns are the way to go.
> Regarding the number of time series, we are already at 500k
Just noticed I made a typo in my original comment: it should be billion, not million. Regardless, it’s a moot point in light of your findings above.
> I’m not sure what you mean here; we planned to put provenance in a separate table. Unless you mean the end-user flags, in which case, yes, I think bool columns are the way to go.
Yeah, sorry, I am mixing stuff up. I was already thinking about the provenance table, but we should probably discuss it in another issue.
intarga changed the title from "Revisit decision of 32 bit numbers vs 64 bit for TSIDs and values" to "Switch from using 32 bit to 64 bit numbers for tsid and value in the data table" on Jan 9, 2025.