Disaggregated stats wrong total #191

Open
wants to merge 8 commits into
base: main
14 changes: 12 additions & 2 deletions src/formpack/schema/fields.py
@@ -224,8 +224,18 @@ def get_disaggregated_stats(self, metrics, top_splitters,
         not_provided = 0
         provided = 0
         for val, counter in metrics.items():
-            not_provided += counter.pop(None, 0)
-            provided += counter.pop('__submissions__', 0)
+            none_submissions = counter.pop(None, 0)
+            not_provided += none_submissions
+            # `counter[None]` corresponds to submissions with no values for current field (`self`)
+            # If `val` is None, some submissions don't have any values for the `splitted_by_field`.
+            # We should consider all these submissions as `not_provided`
+            if val is None:
+                not_provided += sum(counter.values())
+            else:
+                # `counter['__submissions__']` contains the count of all submissions,
Member

I'm having trouble figuring out where this happens. I found these:

if value is not None:
    counters['__submissions__'] += 1

counter.update(values)
counter['__submissions__'] += 1

...but they both increment counter['__submissions__'] only when the value is not None.

Contributor Author

@noliveleger noliveleger Feb 11, 2019

@jnm,

Let's say we have a form with two optional select-one questions:

Favorite coffee:

  • Nespresso
  • Keurig
  • Tim Horton

Type of coffee:

  • Regular
  • Espresso
  • Latte

Three users submit data (first case):
First one:
- Favorite coffee: Nespresso
- Type of coffee: Regular

Second one:
- Favorite coffee: Tim Horton
- Type of coffee: Latte

Third one:
- Favorite coffee: Keurig
- Type of coffee: No choices checked

Disaggregating the Favorite coffee stats by Type of coffee should return 3 Counters:

  • None => {'keurig': 1}
  • Latte => {'tim_horton': 1}
  • Regular => {'nespresso': 1}

Now a fourth user submits this (second case):
- Favorite coffee: No choices checked
- Type of coffee: Latte

It should still return the same 3 Counters, except that the Latte Counter becomes
{None: 1, 'tim_horton': 1}

That said, the if condition handles the first case and the else handles the second one.
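
For reference, the two cases above can be reproduced with a small standalone sketch. This is not formpack code: the `disaggregate` helper, the tuple layout, and the lowercase keys are hypothetical and chosen only for illustration. It also leaves out the `__submissions__` key, since how that key gets populated is exactly what is being discussed here.

```python
from collections import Counter, defaultdict

# Hypothetical data: (favorite_coffee, type_of_coffee) per submission.
# None stands for "No choices checked".
submissions = [
    ("nespresso", "regular"),   # first user
    ("tim_horton", "latte"),    # second user
    ("keurig", None),           # third user: no Type of coffee
    (None, "latte"),            # fourth user: no Favorite coffee
]


def disaggregate(rows):
    """Count Favorite coffee answers, grouped by Type of coffee.

    Illustrative helper only; it does not mirror formpack's internals.
    """
    metrics = defaultdict(Counter)
    for favorite, coffee_type in rows:
        metrics[coffee_type][favorite] += 1
    return dict(metrics)


first_case = disaggregate(submissions[:3])   # three users
second_case = disaggregate(submissions)      # four users

for key, counter in second_case.items():
    print(key, dict(counter))
```

In `first_case` the `None` group holds the Keurig answer (the `if val is None` branch), while in `second_case` the `latte` group gains a `None` entry (the `else` branch).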

+                # including those where no response was provided.
+                # We need to subtract all submissions with no response
+                provided += counter.pop('__submissions__', 0) - none_submissions

         return {
             'total_count': not_provided + provided,