Do not require extra copy of dimensional goals aggregated data when evaluating without dimensions #74

ondraz · 2024-06-04T15:04:16Z

E.g. when evaluating aggregated data of the test-multi-dimension experiment in test_multi_dimension, we require another copy of the aggregated data without dimensional columns (see here).

It would be nice to just "group by" the dimensional data, without needing an extra copy of the aggregated data without dimensions in the agg goals dataframe.

current data:

test-multi-dimension		a	test_unit_type	global	exposure			1000	1000	1000	1000	1000
test-multi-dimension		b	test_unit_type	global	exposure			1001	1001	1001	1001	1001
test-multi-dimension		a	test_unit_type	unit	view	button-1	p-1	200	200	200	200	200
test-multi-dimension		b	test_unit_type	unit	view	button-1	p-1	220	220	220	220	220
test-multi-dimension		a	test_unit_type	unit	view	button-1		100	100	100	100	100
test-multi-dimension		b	test_unit_type	unit	view	button-1		180	180	180	180	180
test-multi-dimension		a	test_unit_type	unit	view			300	300	300	300	300
test-multi-dimension		b	test_unit_type	unit	view			400	400	400	400	400

data format requested in this issue:

test-multi-dimension		a	test_unit_type	global	exposure			1000	1000	1000	1000	1000
test-multi-dimension		b	test_unit_type	global	exposure			1001	1001	1001	1001	1001
test-multi-dimension		a	test_unit_type	unit	view	button-1	p-1	200	200	200	200	200
test-multi-dimension		b	test_unit_type	unit	view	button-1	p-1	220	220	220	220	220
test-multi-dimension		a	test_unit_type	unit	view	button-1		100	100	100	100	100
test-multi-dimension		b	test_unit_type	unit	view	button-1		180	180	180	180	180


jancervenka · 2024-06-11T11:01:13Z

@ondraz Hi Ondro, I think doing this might be tricky because the dataframe would then need to contain all possible combinations of element and product dimension values for the view goal (for example, rows for button-2 and p-2). Otherwise, the group by would produce aggregations with missing data.

ondraz · 2024-06-12T06:55:13Z

If we just aggregate (sum) these 4 lines of agg. goal data,

test-multi-dimension		a	test_unit_type	unit	view	button-1	p-1	200	200	200	200	200
test-multi-dimension		b	test_unit_type	unit	view	button-1	p-1	220	220	220	220	220
test-multi-dimension		a	test_unit_type	unit	view	button-1		100	100	100	100	100
test-multi-dimension		b	test_unit_type	unit	view	button-1		180	180	180	180	180

we get exactly what we already have in the extra two lines with no dim values:

test-multi-dimension		a	test_unit_type	unit	view			300	300	300	300	300
test-multi-dimension		b	test_unit_type	unit	view			400	400	400	400	400

so:

goal count(test_unit_type.unit.view) - we can sum the four lines above per variant
goal count(test_unit_type.unit.view(element=button-1)) - we filter the four lines above by dimension value and sum them per variant
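A minimal sketch of the two rules above, in plain Python rather than the project's actual dataframe code. The row layout (`variant`, `goal`, `element`, `product`, `count`) and the function `count_goal` are hypothetical; an empty string stands for a missing dimension value, and only the first value column of the quoted rows is used.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class GoalRow:
    # Hypothetical, simplified layout of one aggregated goal row.
    variant: str
    goal: str
    element: str  # "" means no dimension value
    product: str  # "" means no dimension value
    count: int


# The four dimensional `view` rows quoted above (first value column only).
ROWS: List[GoalRow] = [
    GoalRow("a", "view", "button-1", "p-1", 200),
    GoalRow("b", "view", "button-1", "p-1", 220),
    GoalRow("a", "view", "button-1", "", 100),
    GoalRow("b", "view", "button-1", "", 180),
]


def count_goal(rows: List[GoalRow], goal: str, element: Optional[str] = None) -> Dict[str, int]:
    """Sum `count` per variant for `goal`, optionally filtered by element.

    This is the "group by" proposal: totals are derived from the dimensional
    rows instead of being read from extra rows with empty dimension values.
    """
    totals: Dict[str, int] = {}
    for row in rows:
        if row.goal != goal:
            continue
        if element is not None and row.element != element:
            continue
        totals[row.variant] = totals.get(row.variant, 0) + row.count
    return totals


# count(test_unit_type.unit.view): sum all four rows per variant.
print(count_goal(ROWS, "view"))
# count(test_unit_type.unit.view(element=button-1)): filter, then sum.
print(count_goal(ROWS, "view", element="button-1"))
```

For this particular data both calls reproduce the 300 and 400 carried by the extra dimension-less rows. The sketch silently assumes the dimensional rows are complete and do not overlap.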

There's probably some reason we did it this way, requiring those extra two lines with empty dimension values, but I don't recall it.

jancervenka · 2024-06-12T15:13:34Z

You're right that it works in this case but I don't think it would work in general.

Let's say there are 200 views with element = button-2 in the data that the DAO is selecting from.
But because there is no goal count(test_unit_type.unit.view(element=button-2)) in the experiment metrics, the button-2 views will not show up in the dataframe.
Summing the dataframe rows with element = button-1 to produce count(test_unit_type.unit.view) will give us incorrect results because the sum will be missing the 200 button-2 views.
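The failure mode can be sketched in plain Python, simplified to a single dimension (element) and a single variant. The raw view events, the `select_goals` helper standing in for the DAO, and the split into 300 button-1 and 200 button-2 views are hypothetical; the point is only that a goal absent from the metrics is never selected, so summing the selected rows undercounts.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical raw view events in the source table: (variant, element).
# Variant "a" has 300 button-1 views and 200 button-2 views.
RAW_VIEWS: List[Tuple[str, str]] = [("a", "button-1")] * 300 + [("a", "button-2")] * 200


def select_goals(raw: List[Tuple[str, str]], elements: Set[str]) -> Dict[Tuple[str, str], int]:
    """Hypothetical DAO stand-in: aggregate only the elements used in the metrics."""
    return dict(Counter((variant, element) for variant, element in raw if element in elements))


# Only count(test_unit_type.unit.view(element=button-1)) is in the metrics.
selected = select_goals(RAW_VIEWS, {"button-1"})

# Summing the selected dimensional rows to get count(test_unit_type.unit.view)...
derived_total = sum(count for (variant, _), count in selected.items() if variant == "a")
# ...versus the true dimension-less total that the extra row would carry.
true_total = sum(1 for variant, _ in RAW_VIEWS if variant == "a")

print(derived_total, true_total)  # 300 500
```

The extra rows with empty dimension values avoid this because they are aggregated over all views, not only over the dimension values the metrics happen to mention.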

jancervenka · 2024-06-12T15:20:37Z

Also looking at the test-multi-dimension data, they kind of don't make sense. 😄

test-multi-dimension		a	test_unit_type	unit	view	button-1	p-1	200	200	200	200	200
test-multi-dimension		b	test_unit_type	unit	view	button-1	p-1	220	220	220	220	220
test-multi-dimension		a	test_unit_type	unit	view	button-1		100	100	100	100	100
test-multi-dimension		b	test_unit_type	unit	view	button-1		180	180	180	180	180

For example, the third row contains all button-1 views from all products, so its count (100) shouldn't be lower than the count in the first row (200), which represents button-1 views from the p-1 product only. The first row's views should be a subset of the third row's views.
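A minimal sanity check for this invariant, assuming the same hypothetical simplified row layout `(variant, element, product, count)` with an empty string meaning "any value": a row with a narrower dimension filter must never have a higher count than a row of the same variant with a broader filter that contains it. The helpers `is_subset` and `violations` are illustrative only, not part of the project.

```python
from typing import List, Tuple

# (variant, element, product, count); "" means the dimension is unrestricted.
Row = Tuple[str, str, str, int]

ROWS: List[Row] = [
    ("a", "button-1", "p-1", 200),
    ("b", "button-1", "p-1", 220),
    ("a", "button-1", "", 100),
    ("b", "button-1", "", 180),
]


def is_subset(narrow: Row, broad: Row) -> bool:
    """True if every view matched by `narrow` is also matched by `broad`."""
    if narrow[0] != broad[0]:
        return False
    # Each dimension of `broad` must be unrestricted or equal to `narrow`'s value.
    return all(b == "" or b == n for n, b in zip(narrow[1:3], broad[1:3]))


def violations(rows: List[Row]) -> List[Tuple[Row, Row]]:
    """Pairs (narrow, broad) where the narrower row has a higher count."""
    return [
        (rows[i], rows[j])
        for i in range(len(rows))
        for j in range(len(rows))
        if i != j and is_subset(rows[i], rows[j]) and rows[i][3] > rows[j][3]
    ]


for narrow, broad in violations(ROWS):
    print(f"{narrow} should be a subset of {broad}")
```

Run against the quoted rows, it flags both variants, which matches the observation above; it only checks test-data consistency and says nothing about whether goal selection works.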

I was just testing that the goal selection works correctly and didn't think about the specific values.
