Do not require extra copy of dimensional goals aggregated data when evaluating without dimensions #74

Open
ondraz opened this issue Jun 4, 2024 · 4 comments

@ondraz
Contributor

ondraz commented Jun 4, 2024

E.g. when evaluating the aggregated data of the test-multi-dimension experiment in test_multi_dimension, we require another copy of the aggregated data without dimensional columns, see here.

It would be nice to just "group by" the dimensional data, without needing extra aggregated rows without dimensions in the agg goals dataframe.

current data:

test-multi-dimension		a	test_unit_type	global	exposure			1000	1000	1000	1000	1000
test-multi-dimension		b	test_unit_type	global	exposure			1001	1001	1001	1001	1001
test-multi-dimension		a	test_unit_type	unit	view	button-1	p-1	200	200	200	200	200
test-multi-dimension		b	test_unit_type	unit	view	button-1	p-1	220	220	220	220	220
test-multi-dimension		a	test_unit_type	unit	view	button-1		100	100	100	100	100
test-multi-dimension		b	test_unit_type	unit	view	button-1		180	180	180	180	180
test-multi-dimension		a	test_unit_type	unit	view			300	300	300	300	300
test-multi-dimension		b	test_unit_type	unit	view			400	400	400	400	400

data format requested in this issue:

test-multi-dimension		a	test_unit_type	global	exposure			1000	1000	1000	1000	1000
test-multi-dimension		b	test_unit_type	global	exposure			1001	1001	1001	1001	1001
test-multi-dimension		a	test_unit_type	unit	view	button-1	p-1	200	200	200	200	200
test-multi-dimension		b	test_unit_type	unit	view	button-1	p-1	220	220	220	220	220
test-multi-dimension		a	test_unit_type	unit	view	button-1		100	100	100	100	100
test-multi-dimension		b	test_unit_type	unit	view	button-1		180	180	180	180	180
@jancervenka
Collaborator

@ondraz Hi Ondro, I think doing this might be tricky because the dataframe would then need to contain all possible combinations of element and product dimension values for the view goal (for example, rows for button-2 and p-2). Otherwise, the group by would produce aggregations with missing data.

@ondraz
Contributor Author

ondraz commented Jun 12, 2024

If we just aggregate (sum) these 4 lines of agg. goal data,

test-multi-dimension		a	test_unit_type	unit	view	button-1	p-1	200	200	200	200	200
test-multi-dimension		b	test_unit_type	unit	view	button-1	p-1	220	220	220	220	220
test-multi-dimension		a	test_unit_type	unit	view	button-1		100	100	100	100	100
test-multi-dimension		b	test_unit_type	unit	view	button-1		180	180	180	180	180

we get exactly what we already have in the extra two lines with no dim values:

test-multi-dimension		a	test_unit_type	unit	view			300	300	300	300	300
test-multi-dimension		b	test_unit_type	unit	view			400	400	400	400	400

so:

  1. goal count(test_unit_type.unit.view) - we can use the four lines above and just sum them
  2. goal count(test_unit_type.unit.view(element=button-1)) - we filter the four lines above by dim value and sum the values

There was probably some reason we did it this way, requiring those two extra lines with empty dim values, but I don't recall it.
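The two items above can be sketched in plain Python (stdlib only, to stay dependency-free; the real code works on a dataframe). The tuple layout and the helper name `sum_by_variant` are assumptions made for illustration, and item 2 is read literally: keep the rows whose element matches, then sum.

```python
from collections import defaultdict

# (variant, element, product, count) copied from the four rows above.
# An empty string stands for "no dimension value".
ROWS = [
    ("a", "button-1", "p-1", 200),
    ("b", "button-1", "p-1", 220),
    ("a", "button-1", "", 100),
    ("b", "button-1", "", 180),
]


def sum_by_variant(rows, element=None):
    """Sum counts per variant, optionally keeping only rows with the given element."""
    totals = defaultdict(int)
    for variant, row_element, _product, count in rows:
        if element is None or row_element == element:
            totals[variant] += count
    return dict(totals)


# Item 1: goal without dimensions, just sum everything per variant.
view_totals = sum_by_variant(ROWS)

# Item 2: goal with element=button-1, filter by dimension value first.
button_1_totals = sum_by_variant(ROWS, element="button-1")

print(view_totals)
print(button_1_totals)
```

With this data both calls give the same totals, because all four rows carry element=button-1.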

@jancervenka
Collaborator

You're right that it works in this case but I don't think it would work in general.

  1. Let's say there are 200 views with element = button-2 in the data that the DAO is selecting from.
  2. But because there is no goal count(test_unit_type.unit.view(element=button-2)) in the experiment metrics, the button-2 views will not show up in the data frame.
  3. Summing the dataframe rows with element = button-1 to produce count(test_unit_type.unit.view) will get us incorrect results because it will be missing the 200 button-2 views.
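A minimal sketch of the three steps above, in plain Python. The 200 button-2 views come from step 1; the 300 button-1 views, the `SOURCE_VIEWS` layout, and the `select_rows` helper are hypothetical stand-ins for what the DAO does, not its actual implementation.

```python
# Hypothetical source data for one variant: (element, view count).
SOURCE_VIEWS = [
    ("button-1", 300),
    ("button-2", 200),
]

# Assumption: the experiment metrics only reference element=button-1.
REQUESTED_ELEMENTS = {"button-1"}


def select_rows(source, requested_elements):
    """Keep only rows whose element is referenced by some goal in the metrics."""
    return [(element, count) for element, count in source if element in requested_elements]


dataframe_rows = select_rows(SOURCE_VIEWS, REQUESTED_ELEMENTS)

# Summing what made it into the dataframe...
summed_from_dataframe = sum(count for _element, count in dataframe_rows)

# ...versus the real number of views in the source.
true_total = sum(count for _element, count in SOURCE_VIEWS)

missing = true_total - summed_from_dataframe
print(summed_from_dataframe, true_total, missing)
```

The sum over the dataframe rows undercounts count(test_unit_type.unit.view) by exactly the button-2 views that no goal asked for.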

@jancervenka
Collaborator

jancervenka commented Jun 12, 2024

Also looking at the test-multi-dimension data, they kind of don't make sense. 😄

test-multi-dimension		a	test_unit_type	unit	view	button-1	p-1	200	200	200	200	200
test-multi-dimension		b	test_unit_type	unit	view	button-1	p-1	220	220	220	220	220
test-multi-dimension		a	test_unit_type	unit	view	button-1		100	100	100	100	100
test-multi-dimension		b	test_unit_type	unit	view	button-1		180	180	180	180	180

For example, the third row contains all button-1 views from all products, so its total count shouldn't be lower than the total count from the first row, which represents button-1 views from the p-1 product only. The first row's views should be a subset of the third row's views.
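The subset condition can be checked mechanically. This is only a sketch in plain Python using the test-multi-dimension counts; the tuple layout and the `find_subset_violations` helper are made up for illustration.

```python
# (variant, element, product, count); an empty product means "all products".
ROWS = [
    ("a", "button-1", "p-1", 200),
    ("b", "button-1", "p-1", 220),
    ("a", "button-1", "", 100),
    ("b", "button-1", "", 180),
]


def find_subset_violations(rows):
    """Return (variant, element, product) keys whose count exceeds the all-products row."""
    # Index the "all products" rows by (variant, element).
    all_products = {(v, e): c for v, e, p, c in rows if p == ""}
    violations = []
    for variant, element, product, count in rows:
        if product == "":
            continue
        parent = all_products.get((variant, element))
        # A product-specific count can never be larger than the all-products count.
        if parent is not None and count > parent:
            violations.append((variant, element, product))
    return violations


violations = find_subset_violations(ROWS)
print(violations)
```

Both variants are flagged here, since 200 > 100 for a and 220 > 180 for b.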

I was just testing that the goal selection works correctly and didn't think about the specific values.
