MVP: Handle extra fields to export #278

Merged on Nov 11, 2022 (62 commits)

Commits
4f81008
extend `try_get_number()` method to handle `inf` and `nan`
joshuaberetta Oct 21, 2021
d4e398a
add tests
joshuaberetta Oct 21, 2021
9cee720
change how fields are combined across versions for repeat groups
joshuaberetta Oct 27, 2021
920c8ff
add test for fix
joshuaberetta Oct 27, 2021
f1a1ad2
change `field_section_name` to `section_field_name`
joshuaberetta Oct 27, 2021
0537c6a
update code comment
joshuaberetta Oct 27, 2021
8a6300c
make requested changes
joshuaberetta Oct 28, 2021
66adc80
make requested changes
joshuaberetta Oct 28, 2021
2352c61
add v2 fixture
joshuaberetta Oct 28, 2021
0dbb0d2
Merge pull request #276 from kobotoolbox/275-fix-repeat-group-versions
noliveleger Oct 29, 2021
ad78349
Merge pull request #274 from kobotoolbox/272-inf-and-nan
noliveleger Oct 29, 2021
6e238ef
clean slate with fixtures
joshuaberetta Nov 12, 2021
7273fdd
wip AnalysisForm class and basic passing test
joshuaberetta Nov 12, 2021
1e7233b
handle translations and choices, get all tests to pass again
joshuaberetta Nov 12, 2021
7d9f1b8
fix failing fixture test
joshuaberetta Nov 12, 2021
f4d396f
add support for `analysis_type` and `settings` field attributes
joshuaberetta Nov 13, 2021
4ff7a3f
minor edit
joshuaberetta Nov 13, 2021
9a66171
wip support for repeat groups
joshuaberetta Nov 19, 2021
978c9f3
fix typo
joshuaberetta Nov 19, 2021
98535f4
restructure fixtures with kpi model changes
joshuaberetta Nov 30, 2021
1883774
update tests with new syntax
joshuaberetta Nov 30, 2021
07a3cbf
return analysis form field labels to lists
joshuaberetta Nov 30, 2021
af46ed5
wip get tests passing again
joshuaberetta Nov 30, 2021
6ca0127
use defaultdict to refactor
joshuaberetta Nov 30, 2021
b22bf79
refactor extracting supplemental value for export
joshuaberetta Nov 30, 2021
3a7429f
handle labels better and refactor AnalysisForm class
joshuaberetta Nov 30, 2021
5624352
refactor and type annotate AnalysisForm class methods
joshuaberetta Dec 3, 2021
a8f60da
remove `[Any]` from annotations
joshuaberetta Dec 7, 2021
33bfed5
add more type annotations
joshuaberetta Dec 7, 2021
2cefe71
fix failing tests with missing import
joshuaberetta Dec 7, 2021
431ee61
clean up tests, remove `load_analysis_form_json()` from `build_fixtur…
joshuaberetta Dec 7, 2021
71d4c64
update autoreport test
joshuaberetta Dec 7, 2021
eb483c0
support updated transcript structure, amend tests, remove support fo…
joshuaberetta Jan 24, 2022
1cbd7a4
handle translations
joshuaberetta Jan 24, 2022
bec0601
wip integrating changes from kpi
joshuaberetta Feb 8, 2022
1e24c63
handle filtering exports and different language transcriptions for th…
joshuaberetta Feb 9, 2022
15dbb28
use the source label for prefix of transcript and translation field l…
joshuaberetta Feb 15, 2022
aee1e0d
apply requested changes to transcript export format
joshuaberetta Feb 16, 2022
2f290b1
update handling of transcript export formatting
joshuaberetta Feb 16, 2022
79c65fb
minor cleanup
joshuaberetta Feb 17, 2022
3cb1673
Merge branch 'master' into 277-additional-fields-exports
joshuaberetta Mar 22, 2022
484a0eb
black formatting
joshuaberetta Mar 22, 2022
e6c4abe
fix breaking additional fields tests
joshuaberetta Mar 22, 2022
0bcbdb3
pin pyxform version to 1.7.0
joshuaberetta Mar 22, 2022
a41861a
clean up, use constants
joshuaberetta Mar 22, 2022
7988071
Merge branch 'master' into 277-additional-fields-exports
joshuaberetta Apr 13, 2022
4f772e7
remove `include_analysis_fields` flag
joshuaberetta Apr 13, 2022
a48975c
remove pyxform version pin from setup.py
joshuaberetta Apr 13, 2022
53afa08
make requested changes
joshuaberetta Apr 13, 2022
289b523
use `filter_fields` for controlling export of additional analysis fields
joshuaberetta Apr 13, 2022
4ea0c16
mvp filtering for multi-language transcripts
joshuaberetta May 12, 2022
0569abe
change translations field name from `translation_<code>` to `translat…
joshuaberetta May 17, 2022
4da4ec9
remove duplicate inserting of analysis fields
joshuaberetta Jun 1, 2022
17d53b9
split out languages into individual fields in analysis_form_json for …
dorey Oct 3, 2022
ba1fd14
unique name and path for transcript languages in json fixture
dorey Oct 3, 2022
b5d18d6
update handling of transcript fields with new format and update fixtures
joshuaberetta Oct 4, 2022
b71c7c6
key in supplementalData is qpath
dorey Oct 10, 2022
95507a1
Merge branch 'master' into 277-additional-fields-exports
joshuaberetta Oct 11, 2022
9fcec12
Merge branch 'master' into 277-additional-fields-exports
joshuaberetta Oct 11, 2022
c87f93d
Merge branch '277-additional-fields-exports' into 277-additional-fiel…
joshuaberetta Oct 11, 2022
15b2d44
Update tests to match qpath in supplementalData
jnm Nov 11, 2022
62df5d9
Merge pull request #308 from kobotoolbox/277-additional-fields-export…
dorey Nov 11, 2022
10 changes: 10 additions & 0 deletions src/formpack/constants.py
@@ -154,3 +154,13 @@
'form_appearance',
'form_meta_edit',
]

# Analysis types
ANALYSIS_TYPE_CODING = 'coding'
ANALYSIS_TYPE_TRANSCRIPT = 'transcript'
ANALYSIS_TYPE_TRANSLATION = 'translation'
ANALYSIS_TYPES = [
ANALYSIS_TYPE_CODING,
ANALYSIS_TYPE_TRANSCRIPT,
ANALYSIS_TYPE_TRANSLATION,
]
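A minimal sketch of how these constants can gate incoming field definitions, mirroring the check this PR adds to `from_json_definition` in src/formpack/schema/fields.py. The helper name `resolve_analysis_type` is hypothetical and not part of formpack; the `coding` default follows the `definition.get('analysis_type', ANALYSIS_TYPE_CODING)` call in the diff.

```python
ANALYSIS_TYPE_CODING = 'coding'
ANALYSIS_TYPE_TRANSCRIPT = 'transcript'
ANALYSIS_TYPE_TRANSLATION = 'translation'
ANALYSIS_TYPES = [
    ANALYSIS_TYPE_CODING,
    ANALYSIS_TYPE_TRANSCRIPT,
    ANALYSIS_TYPE_TRANSLATION,
]


def resolve_analysis_type(definition: dict) -> str:
    """Hypothetical helper: return the analysis type, defaulting to 'coding'."""
    analysis_type = definition.get('analysis_type', ANALYSIS_TYPE_CODING)
    # Reject anything outside the known set, as the PR does
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f'Invalid analysis data type: {analysis_type}')
    return analysis_type
```

Keeping the allowed values in one list means a new analysis type only has to be registered in constants.py for the validation to accept it.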
8 changes: 7 additions & 1 deletion src/formpack/pack.py
@@ -3,9 +3,10 @@
import json
from collections import OrderedDict
from copy import deepcopy
from typing import Dict

from formpack.schema.fields import CopyField
-from .version import FormVersion
from .version import FormVersion, AnalysisForm
from .reporting import Export, AutoReport
from .utils.expand_content import expand_content
from .utils.replace_aliases import replace_aliases
@@ -52,6 +53,8 @@ def __init__(

self.asset_type = asset_type

self.analysis_form = None

self.load_all_versions(versions)

# FIXME: Find a safe way to use this. Wrapping with try/except isn't enough
@@ -176,6 +179,9 @@ def load_version(self, schema):

self.versions[form_version.id] = form_version

def extend_survey(self, analysis_form: Dict) -> None:
self.analysis_form = AnalysisForm(self, analysis_form)

def version_diff(self, vn1, vn2):
v1 = self.versions[vn1]
v2 = self.versions[vn2]
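A rough sketch of the flow this change enables: `extend_survey()` attaches an analysis form to the pack, and the export later asks it to add its fields. `PackStub` and `AnalysisFormStub` are hypothetical stand-ins, since the real `AnalysisForm` internals are not shown in this diff. The `additional_fields` key and the append-at-the-end behaviour are assumptions made only for illustration.

```python
from typing import Dict, List, Optional


class AnalysisFormStub:
    """Hypothetical stand-in for formpack's AnalysisForm."""

    def __init__(self, pack: 'PackStub', analysis_form: Dict) -> None:
        self.pack = pack
        # Assumption: the analysis form JSON lists its fields under
        # an `additional_fields` key
        self.extra_fields: List[str] = [
            f['name'] for f in analysis_form.get('additional_fields', [])
        ]

    def insert_analysis_fields(self, fields: List[str]) -> List[str]:
        # Assumption: fields are simply appended; the real method may
        # place them next to their source fields instead
        return fields + self.extra_fields


class PackStub:
    """Hypothetical stand-in for FormPack, reduced to the new attribute."""

    def __init__(self) -> None:
        self.analysis_form: Optional[AnalysisFormStub] = None

    def extend_survey(self, analysis_form: Dict) -> None:
        self.analysis_form = AnalysisFormStub(self, analysis_form)


def export_fields(pack: PackStub, fields: List[str]) -> List[str]:
    # Same guard the export uses: only extend when a form was attached
    if pack.analysis_form:
        fields = pack.analysis_form.insert_analysis_fields(fields)
    return fields
```

Because `analysis_form` starts as `None`, packs that never call `extend_survey()` export exactly the fields they did before.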
71 changes: 62 additions & 9 deletions src/formpack/reporting/export.py
@@ -4,16 +4,23 @@
import zipfile
from collections import defaultdict, OrderedDict
from inspect import isclass
-from typing import Iterator, Generator, Optional
from typing import (
Dict,
Generator,
Iterator,
Optional,
)

import xlsxwriter

from ..constants import (
ANALYSIS_TYPE_TRANSCRIPT,
ANALYSIS_TYPE_TRANSLATION,
GEO_QUESTION_TYPES,
TAG_COLUMNS_AND_SEPARATORS,
UNSPECIFIED_TRANSLATION,
)
-from ..schema import CopyField
from ..schema import CopyField, FormField
from ..submission import FormSubmission
from ..utils.exceptions import FormPackGeoJsonError
from ..utils.flatten_content import flatten_tag_list
@@ -60,9 +67,11 @@ def __init__(
:param tag_cols_for_header: list
:param filter_fields: list
:param xls_types_as_text: bool
:param include_media_url: bool
"""

self.formpack = formpack
self.analysis_form = formpack.analysis_form
self.lang = lang
self.group_sep = group_sep
self.title = title
@@ -81,6 +90,12 @@ def __init__(
tag_cols_for_header = []
self.tag_cols_for_header = tag_cols_for_header

_filter_fields = []
for item in self.filter_fields:
item = re.sub(r'^_supplementalDetails/', '', item)
_filter_fields.append(item)
self.filter_fields = _filter_fields

# If some fields need to be arbitrarily copied, add them
# to the first section
if copy_fields:
@@ -224,6 +239,9 @@ def get_fields_labels_tags_for_all_versions(

# Ensure that fields are filtered if they've been specified, otherwise
# carry on as usual
if self.analysis_form:
all_fields = self.analysis_form.insert_analysis_fields(all_fields)

if self.filter_fields:
all_fields = [
field
@@ -320,6 +338,7 @@ def format_one_submission(
submission,
current_section,
attachments=None,
supplemental_details=None,
):

# 'current_section' is the name of what will become sheets in xls.
@@ -382,17 +401,46 @@ def _get_attachment(val, field, attachments):
if re.match(fr'^.*/{_val}$', f['filename']) is not None
]

-def _get_value_from_entry(entry, field):
def _get_value_from_supplemental_details(
field: FormField, supplemental_details: Dict
) -> Optional[str]:
source, name = field.analysis_path
_sup_details = supplemental_details.get(source, {})

if not _sup_details:
return

# The names for translation and transcript fields are in the format
# of `translated_<language code>` which must be stripped to get the
# value from the supplemental details dict
if _name := re.match(r'^(translation|transcript)_', name):
name = _name.groups()[0]

val = _sup_details.get(name)
if val is None:
return ''

return val

def _get_value_from_entry(
entry: Dict, field: FormField, supplemental_details: Dict
) -> Optional[str]:
if field.analysis_question and supplemental_details:
return _get_value_from_supplemental_details(
field, supplemental_details
)

suffix = 'meta/' if field.data_type == 'audit' else ''
return entry.get(f'{suffix}{field.path}')

if self.analysis_form:
_fields = self.analysis_form.insert_analysis_fields(_fields)

# Ensure that fields are filtered if they've been specified, otherwise
# carry on as usual
if self.filter_fields:
_fields = tuple(
-field
-for field in current_section.fields.values()
-if field.path in self.filter_fields
field for field in _fields if field.path in self.filter_fields
)

# 'rows' will contain all the formatted entries for the current
@@ -423,13 +471,17 @@ def _get_value_from_entry(entry, field):
row.update(_empty_row)

attachments = entry.get('_attachments') or attachments
supplemental_details = (
entry.get('_supplementalDetails') or supplemental_details
)

for field in _fields:
# TODO: pass a context to fields so they can all format ?
if field.can_format:

# get submission value for this field
-val = _get_value_from_entry(entry, field)
val = _get_value_from_entry(
entry, field, supplemental_details
)
# get the attachment for this field
attachment = _get_attachment(val, field, attachments)
# get a mapping of {"col_name": "val", ...}
@@ -493,7 +545,8 @@ def _get_value_from_entry(entry, field):
chunk = self.format_one_submission(
entry[child_section.path],
child_section,
-attachments,
attachments=attachments,
supplemental_details=supplemental_details,
)
for key, value in iter(chunk.items()):
if key in chunks:
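The two lookups added in export.py can be sketched in isolation as follows. The function names are hypothetical; the regular expressions and the return values (`None` when the source question has no supplemental details, `''` when the key is missing) are taken from the diff above. The shape of the sample data in the usage note is an assumption based on the keys that `TextField.format()` reads.

```python
import re
from typing import Any, Dict, List, Tuple


def strip_supplemental_prefix(filter_fields: List[str]) -> List[str]:
    """Drop a leading `_supplementalDetails/` from each requested path."""
    return [
        re.sub(r'^_supplementalDetails/', '', item) for item in filter_fields
    ]


def get_supplemental_value(
    analysis_path: Tuple[str, str], supplemental_details: Dict
) -> Any:
    """Return the raw supplemental value for one analysis field."""
    source, name = analysis_path
    details = supplemental_details.get(source, {})
    if not details:
        return None
    # `transcript_en` -> `transcript`, `translation_fr` -> `translation`
    if match := re.match(r'^(translation|transcript)_', name):
        name = match.groups()[0]
    val = details.get(name)
    return '' if val is None else val
```

With a submission whose `_supplementalDetails` is `{'record_a_note': {'transcript': {'value': 'hello', 'languageCode': 'en'}}}`, the path `('record_a_note', 'transcript_en')` yields the whole transcript dict; picking the right language out of it is left to the field's `format()` method.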
129 changes: 111 additions & 18 deletions src/formpack/schema/fields.py
@@ -8,7 +8,13 @@
import statistics

from .datadef import FormDataDef, FormChoice
-from ..constants import UNSPECIFIED_TRANSLATION
from ..constants import (
ANALYSIS_TYPES,
ANALYSIS_TYPE_CODING,
ANALYSIS_TYPE_TRANSCRIPT,
ANALYSIS_TYPE_TRANSLATION,
UNSPECIFIED_TRANSLATION,
)
from ..utils import singlemode
from ..utils.ordered_collection import OrderedDefaultdict

@@ -35,6 +41,20 @@ def __init__(
self.section = section
self.can_format = can_format
self.tags = kwargs.get('tags', [])
self.analysis_question = False

source = kwargs.get('source')
if source is not None:
self.source = source
self.analysis_question = True
self.analysis_type = kwargs.get('analysis_type')
self.analysis_path = kwargs.get('analysis_path')
self.settings = kwargs.get('settings')
if self.analysis_type in [
ANALYSIS_TYPE_TRANSCRIPT,
ANALYSIS_TYPE_TRANSLATION,
]:
self.language = kwargs['language']

hierarchy = list(hierarchy) if hierarchy is not None else [None]
self.hierarchy = hierarchy + [self]
@@ -45,11 +65,15 @@ def __init__(
if has_stats is not None:
self.has_stats = has_stats
else:
-self.has_stats = data_type != 'note'
self.has_stats = data_type != 'note' and not self.analysis_question

# do not include the root section in the path
self.path = '/'.join(info.name for info in self.hierarchy[1:])

@property
def qpath(self):
return self.path.replace('/', '-')

def get_labels(
self,
lang=UNSPECIFIED_TRANSLATION,
@@ -139,7 +163,7 @@ def _get_label(
# even if `lang` can be None, we don't want the `label` to be None.
label = self.labels.get(lang, self.name)
# If `label` is None, no matches are found, so return `field` name.
-return self.name if label is None else label
return label or self.name

def __repr__(self):
args = (self.__class__.__name__, self.name, self.data_type)
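Note that the `_get_label` change is not purely cosmetic: `label or self.name` also falls back to the field name when the label is an empty string, whereas the earlier `is None` check returned the empty string unchanged. A small sketch of the difference, using hypothetical free functions in place of the method:

```python
from typing import Dict, Optional


def label_none_fallback(
    labels: Dict[Optional[str], Optional[str]], lang: Optional[str], name: str
) -> Optional[str]:
    """Previous behaviour: fall back to the name only when the label is None."""
    label = labels.get(lang, name)
    return name if label is None else label


def label_falsy_fallback(
    labels: Dict[Optional[str], Optional[str]], lang: Optional[str], name: str
) -> Optional[str]:
    """New behaviour: fall back to the name for any falsy label."""
    label = labels.get(lang, name)
    return label or name
```

For `labels = {'en': ''}` the first function returns `''` and the second returns the field name, so export headers can no longer end up blank because of an empty label.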
@@ -178,13 +202,22 @@ def from_json_definition(
labels = cls._extract_json_labels(definition, translations)
appearance = definition.get('appearance')
or_other = definition.get('_or_other', False)
source = definition.get('source')
analysis_type = definition.get('analysis_type', ANALYSIS_TYPE_CODING)
settings = definition.get('settings', {})
analysis_path = definition.get('path')
languages = definition.get('languages')
language = definition.get('language')

# normalize spaces
data_type = definition['type']

if ' ' in data_type:
raise ValueError('invalid data_type: %s' % data_type)

if analysis_type not in ANALYSIS_TYPES:
raise ValueError(f'Invalid analysis data type: {analysis_type}')

if data_type in ('select_one', 'select_multiple'):
choice_id = definition['select_from_list_name']
# pyxform#472 introduced dynamic list_names for select_one with the
@@ -246,6 +279,12 @@ def from_json_definition(
'section': section,
'choice': choice,
'or_other': or_other,
'source': source,
'analysis_type': analysis_type,
'settings': settings,
'analysis_path': analysis_path,
'language': language,
'languages': languages,
}

if data_type == 'select_multiple' and appearance == 'literacy':
@@ -424,21 +463,6 @@ def get_substats(


class TextField(ExtendedFormField):
-def get_stats(self, metrics, lang=UNSPECIFIED_TRANSLATION, limit=100):
-
-stats = super().get_stats(metrics, lang, limit)
-
-top = metrics.most_common(limit)
-total = stats['total_count']
-
-percentage = []
-for key, val in top:
-percentage.append((key, self._get_percentage(val, total)))
-
-stats.update({'frequency': top, 'percentage': percentage})
-
-return stats

def get_disaggregated_stats(
self, metrics, top_splitters, lang=UNSPECIFIED_TRANSLATION, limit=100
):
@@ -459,6 +483,75 @@ def sum_frequencies(element):

return stats

def get_labels(
self,
lang=UNSPECIFIED_TRANSLATION,
group_sep='/',
hierarchy_in_labels=False,
multiple_select='both',
*args,
**kwargs,
):
args = lang, group_sep, hierarchy_in_labels, multiple_select
if getattr(self, 'analysis_type', None) in [
ANALYSIS_TYPE_TRANSCRIPT,
ANALYSIS_TYPE_TRANSLATION,
]:
source_label = self.source_field._get_label(*args)
_type = 'translation' if self._is_translation else 'transcript'
return [f'{source_label} - {_type} ({self.language})']
return [self._get_label(*args)]

def get_stats(self, metrics, lang=UNSPECIFIED_TRANSLATION, limit=100):

stats = super().get_stats(metrics, lang, limit)

top = metrics.most_common(limit)
total = stats['total_count']

percentage = []
for key, val in top:
percentage.append((key, self._get_percentage(val, total)))

stats.update({'frequency': top, 'percentage': percentage})

return stats

@property
def _is_transcript(self):
return getattr(self, 'analysis_type', '') == ANALYSIS_TYPE_TRANSCRIPT

@property
def _is_translation(self):
return getattr(self, 'analysis_type', '') == ANALYSIS_TYPE_TRANSLATION

def format(
self,
val,
lang=UNSPECIFIED_TRANSLATION,
group_sep='/',
hierarchy_in_labels=False,
multiple_select='both',
xls_types_as_text=True,
*args,
**kwargs,
):
if val is None:
val = ''

if isinstance(val, dict):
if self._is_translation:
try:
val = val[self.language]['value']
except KeyError:
val = ''
elif self._is_transcript:
val = (
val['value'] if val['languageCode'] == self.language else ''
)

return {self.name: val}


class MediaField(TextField):
def get_labels(self, include_media_url=False, *args, **kwargs):
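To make the new `TextField.format()` and `get_labels()` branches easier to follow, here is a standalone sketch of the same logic. The free functions and their parameter names are hypothetical; the dict shapes (`{'value': ..., 'languageCode': ...}` for a transcript and `{<language>: {'value': ...}}` for translations) are inferred from the keys the diff accesses.

```python
from typing import Any, Dict


def format_analysis_value(
    val: Any, analysis_type: str, language: str, name: str
) -> Dict[str, Any]:
    """Reduce a supplemental value to the text for one language."""
    if val is None:
        val = ''
    if isinstance(val, dict):
        if analysis_type == 'translation':
            # Translations are keyed by language code
            try:
                val = val[language]['value']
            except KeyError:
                val = ''
        elif analysis_type == 'transcript':
            # A transcript exists in a single language only
            val = val['value'] if val['languageCode'] == language else ''
    return {name: val}


def analysis_label(source_label: str, analysis_type: str, language: str) -> str:
    """Build the export column label for a transcript or translation field."""
    return f'{source_label} - {analysis_type} ({language})'
```

A transcript recorded in English therefore fills the `en` transcript column and leaves any other transcript-language column empty, while each translation column reads its own language entry.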