[question] any suggestion about compressing serialized data? #10
Comments
Do you get good results with gzip? I have no experience compressing roaring bitmaps with generic codecs... assuredly, the impact will be data specific... Related: Compressing JSON: gzip vs zstd |
Thanks for the fast reply! I wrote a test (Java/Go) that populates a RoaringBitmap with random uint32 values, serializes it, and also compresses it with gzip, but it turns out the data is barely compressed.
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;
import java.util.zip.GZIPOutputStream;
import org.roaringbitmap.RoaringBitmap;

// size, filepath and compressedFilepath are defined elsewhere in the test
// populate with random data
Random r = new Random();
RoaringBitmap rbm = new RoaringBitmap();
for (int i = 0; i < size; i++) {
    // keep the low 32 bits; the int cast reinterprets them as a uint32 value
    long rValue = r.nextLong() & 0xffffffffL;
    int casted = (int) rValue;
    rbm.add(casted);
}
// dump to disk
ByteBuffer buffer = ByteBuffer.allocate(rbm.serializedSizeInBytes());
rbm.serialize(buffer);
Path path = Paths.get(filepath);
Files.write(path, buffer.array());
// compress and dump (FileOutputStream creates or truncates the file itself)
Path cpath = Paths.get(compressedFilepath);
try (GZIPOutputStream gos = new GZIPOutputStream(new FileOutputStream(cpath.toFile()))) {
    gos.write(buffer.array());
}
Here is the result:
I am trying zstd now. |
It seems that generic compression is not a good fit for RoaringBitmap.
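To make the comparison repeatable without writing files, the gzip step can be measured in memory. The following is only a sketch: the class name `GzipRatio` and the helper `gzippedSize` are made up for illustration, and the two inputs are stand-ins (random bytes and a short repeating pattern), not real serialized Roaring bitmaps.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

final class GzipRatio {
    /** Returns the number of bytes produced by gzip-compressing data in memory. */
    static int gzippedSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(out)) {
            gos.write(data);
        } catch (IOException e) {
            // not expected for an in-memory stream; rethrow unchecked
            throw new UncheckedIOException(e);
        }
        // the stream is closed here, so the gzip trailer has been written
        return out.size();
    }

    public static void main(String[] args) {
        // stand-in for high-entropy input: uniformly random bytes
        byte[] random = new byte[1 << 20];
        new Random(42).nextBytes(random);

        // stand-in for low-entropy input: a 16-byte pattern repeated
        byte[] repetitive = new byte[1 << 20];
        for (int i = 0; i < repetitive.length; i++) {
            repetitive[i] = (byte) (i % 16);
        }

        System.out.printf("random:     %d -> %d bytes%n", random.length, gzippedSize(random));
        System.out.printf("repetitive: %d -> %d bytes%n", repetitive.length, gzippedSize(repetitive));
    }
}
```

Passing `buffer.array()` from the test above to `gzippedSize` would give the same number as the size of the compressed file, assuming the same default gzip settings.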
|
Interestingly, it looks like lz4 makes things worse in your test! |
Roaring bitmaps are already a form of compression, so the entropy of the serialized data should already be rather close to the maximum (you can make an entropy graph for example using |
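One rough way to check the entropy claim without any particular tool is to compute the byte-level Shannon entropy of the serialized array. This is a sketch under assumptions: the class name `ByteEntropy` is hypothetical, and a byte histogram is only a zero-order estimate that ignores longer-range structure, so a value near 8 bits per byte suggests (but does not prove) that a generic codec has little left to remove.

```java
import java.util.Random;

final class ByteEntropy {
    /** Shannon entropy of the byte-value histogram, in bits per byte (0.0 to 8.0). */
    static double shannonEntropy(byte[] data) {
        if (data.length == 0) {
            return 0.0;
        }
        long[] counts = new long[256];
        for (byte b : data) {
            counts[b & 0xff]++; // mask so negative bytes index 128..255
        }
        double entropy = 0.0;
        for (long c : counts) {
            if (c == 0) {
                continue; // a value that never occurs contributes nothing
            }
            double p = (double) c / data.length;
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }

    public static void main(String[] args) {
        // stand-in inputs; replace with the serialized bitmap bytes to test real data
        byte[] random = new byte[1 << 20];
        new Random(42).nextBytes(random);
        byte[] constant = new byte[1 << 20];

        System.out.printf("random:   %.3f bits/byte%n", shannonEntropy(random));
        System.out.printf("constant: %.3f bits/byte%n", shannonEntropy(constant));
    }
}
```

Calling `shannonEntropy(buffer.array())` on the serialized bitmap from the test earlier in the thread would show how close that data is to the 8 bits per byte ceiling.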
Suppose I have multiple clients implemented in different languages (Go, Java, ...), and all of them need to download a RoaringBitmap from a server over HTTP.
When the bitmap gets large (about 10 million values) and the serialized binary data comes to about 20 MB, I think compressing it before sending may save a lot of transmission time.
I am trying gzip. Any suggestions about compressing?
Thanks in advance.