-
Notifications
You must be signed in to change notification settings - Fork 557
p10 > p50, p90 > p99 for a simple digest #138
Comments
Merging digest is generally better and the forthcoming release will make it
substantially so.
This is still a bit surprising. The digest should have three centroids and
estimates should always make sense.
Which version do you have? There was a monotonicity bug a long time ago.
The only current serious bug in the avl digest has to do with massive
repeats. This problem might disappear by simply moving to the current
release.
On Jul 14, 2017 1:07 AM, "rodrigofarnhamsc" <[email protected]> wrote:
TDigest tdigest = new TDigest(100);
tdigest.add(0.18615591526031494);
tdigest.add(0.4241943657398224);
tdigest.add(0.8813006281852722);
System.out.println("p10: " + tdigest.quantile(0.1));
System.out.println("p50: " + tdigest.quantile(0.5));
System.out.println("p90: " + tdigest.quantile(0.9));
System.out.println("p95: " + tdigest.quantile(0.95));
System.out.println("p99: " + tdigest.quantile(0.99));
The output doesn't look right:
p10: 0.35278283059597015
p50: 0.30517514050006866
p90: 1.018432506918907
p95: 0.9498665675520899
p99: 0.8950138160586358
- p10 > p50
- p90 > p95 > p99
Should I use https://github.com/tdunning/t-digest/blob/master/src/main/
java/com/tdunning/math/stats/MergingDigest.java instead?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#138>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAPSehiUTm-kpOrMHVRJwobqeo94f5zjks5sNyG1gaJpZM4OX9C5>
.
|
Hi @tdunning, thanks for your prompt reply. I'll look into switching to MergingDigest, but it might be tricky. I'm currently persisting TDigest bytes into a database and retrieving it upon querying. |
So luckily it seems that TDigest is byte compatible with AVLTreeDigest. I'm using the following hack to migrate in a "backwards compatible" way:
|
Can you attach a couple of your old format digests so that I can build some
tests to guarantee future compatibility? We are revamping serialization
substantially just now.
I will also add a test based on your previous problem.
…On Fri, Jul 14, 2017 at 5:16 PM, rodrigofarnhamsc ***@***.***> wrote:
So luckily it seems that TDigest is byte compatible with AVLTreeDigest.
I'm using the following hack to migrate in a "backwards compatible" way:
try {
ByteBuffer bb = ByteBuffer.wrap(bytes);
switch (mode) {
case AVL_DIGEST:
return AVLTreeDigest.fromBytes(bb);
case MERGING_DIGEST:
return MergingDigest.fromBytes(bb);
case UNKNOWN:
default:
// Old serialization lacking trailing byte (TDigest from stream-lib, byte compatible with AVLTreeDigest).
return AVLTreeDigest.fromBytes(bb);
}
} catch (Exception e) {
// Old serialization lacking trailing byte (TDigest from stream-lib, byte compatible with AVLTreeDigest).
return AVLTreeDigest.fromBytes(ByteBuffer.wrap(bytes));
}
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#138 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAPSepPirDAZ3CtL3GGOXtOfyw5-mc5Lks5sOATzgaJpZM4OX9C5>
.
|
I just added TDigestTest.testThreePointExample and it looks better than what you have. I think you have a really old version. Here is what I get with trunk:
|
My initial results were from The From inspecting |
For complete clarity, this is a code excerpt showing the problem, as well as the encoded TDigest:
Output:
|
Yeah.
That will print the representation. But it doesn't have the bits from one
of your data structures.
Can you paste in a few of the base-64 values?
I can then put them in as test cases in TDigest (the original)
…On Fri, Jul 14, 2017 at 6:03 PM, rodrigofarnhamsc ***@***.***> wrote:
For complete clarity, this is a code excerpt showing the problem, as well
as the encoded TDigest:
com.clearspring.analytics.stream.quantile.TDigest tdigest = new com.clearspring.analytics.stream.quantile.TDigest(100);
tdigest.add(0.18615591526031494);
tdigest.add(0.4241943657398224);
tdigest.add(0.8813006281852722);
ByteBuffer bb = ByteBuffer.allocate(tdigest.byteSize());
tdigest.asSmallBytes(bb);
byte[] enc = Base64.getEncoder().encode(Arrays.copyOf(bb.array(), bb.position()));
System.out.println("p10: " + tdigest.quantile(0.1));
System.out.println("p50: " + tdigest.quantile(0.5));
System.out.println("p90: " + tdigest.quantile(0.9));
System.out.println("p95: " + tdigest.quantile(0.95));
System.out.println("p99: " + tdigest.quantile(0.99));
System.out.println();
System.out.println(new String(enc));
System.out.println();
tdigest = TDigest.fromBytes(ByteBuffer.wrap(Base64.getDecoder().decode(enc)));
System.out.println("p10: " + tdigest.quantile(0.1));
System.out.println("p50: " + tdigest.quantile(0.5));
System.out.println("p90: " + tdigest.quantile(0.9));
System.out.println("p95: " + tdigest.quantile(0.95));
System.out.println("p99: " + tdigest.quantile(0.99));
com.clearspring.analytics.stream.quantile.TDigest tdigest = new com.clearspring.analytics.stream.quantile.TDigest(100);
tdigest.add(0.18615591526031494);
tdigest.add(0.4241943657398224);
tdigest.add(0.8813006281852722);
ByteBuffer bb = ByteBuffer.allocate(tdigest.byteSize());
tdigest.asSmallBytes(bb);
byte[] enc = Base64.getEncoder().encode(Arrays.copyOf(bb.array(), bb.position()));
System.out.println("p10: " + tdigest.quantile(0.1));
System.out.println("p50: " + tdigest.quantile(0.5));
System.out.println("p90: " + tdigest.quantile(0.9));
System.out.println("p95: " + tdigest.quantile(0.95));
System.out.println("p99: " + tdigest.quantile(0.99));
System.out.println();
System.out.println(new String(enc));
System.out.println();
tdigest = TDigest.fromBytes(ByteBuffer.wrap(Base64.getDecoder().decode(enc)));
System.out.println("p10: " + tdigest.quantile(0.1));
System.out.println("p50: " + tdigest.quantile(0.5));
System.out.println("p90: " + tdigest.quantile(0.9));
System.out.println("p95: " + tdigest.quantile(0.95));
System.out.println("p99: " + tdigest.quantile(0.99));
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#138 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAPSehsBvHpoaFyhDmycYsEBpvDurM8yks5sOA_TgaJpZM4OX9C5>
.
|
(As you can see these are all pretty small for now, still ramping up traffic). |
The output doesn't look right:
Should I use https://github.com/tdunning/t-digest/blob/master/src/main/java/com/tdunning/math/stats/MergingDigest.java instead?
The text was updated successfully, but these errors were encountered: