Actions

Javier Otegui edited this page Dec 28, 2016 · 1 revision

Depending on the action parameter, there are different ways of handling duplicates.

report

With the report action, users receive a JSON-like document listing the duplicates found in their records. No new file is generated, and the original file is left unchanged. The document has the following structure:

{
    "email": "Email to send notification to",
    "fields": "Number of fields of the data set",
    "records": "Number of records parsed (with duplicates)",
    "warnings": "List of warning messages, if any",
    "file": "Link to the generated file. Only if 'action' is 'remove' or 'flag'",
    "strict_duplicates": {
        "count": "Number of rows that are exact copies of other rows",
        "ids": "List consisting of the IDs of the duplicate rows. Only if ID field is provided or can be determined",
        "index_pairs": "List consisting of the positions of duplicate record pairs. Only if 'count' > 0"
    },
    "partial_duplicates": {
        "count": "Number of rows that are partial copies of other rows",
        "ids": "List consisting of the IDs of the duplicate rows. Only if ID field is provided or can be determined",
        "index_pairs": "List consisting of the positions of duplicate record pairs. Only if 'count' > 0"
    },
    ... # More duplicate types as developed
}
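As a rough illustration of how a client might consume this report, the sketch below pulls the count out of every section whose key ends in "_duplicates". It assumes the report arrives as valid JSON with numeric counts; the function name summarize_report and the sample values are hypothetical and not part of the service.

```python
import json


def summarize_report(report_text):
    """Return {section_name: count} for every '*_duplicates' section."""
    report = json.loads(report_text)
    summary = {}
    for key, section in report.items():
        # Duplicate sections are the keys ending in '_duplicates';
        # the other keys (email, fields, records, ...) are metadata.
        if key.endswith("_duplicates") and isinstance(section, dict):
            summary[key] = int(section.get("count", 0))
    return summary


# Hypothetical report with made-up values, for illustration only.
sample = json.dumps({
    "email": "user@example.org",
    "fields": 3,
    "records": 5,
    "warnings": [],
    "strict_duplicates": {"count": 2, "index_pairs": [[0, 3], [1, 4]]},
    "partial_duplicates": {"count": 0},
})

print(summarize_report(sample))
```

Matching on the key suffix rather than on a fixed list means new duplicate types would be picked up without changing the client code.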

remove

Pretty self-explanatory. The resulting file will omit duplicate rows.
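A minimal sketch of what removal could look like for strict duplicates. It assumes that the first occurrence of a row is kept and later exact copies are dropped; this page does not say which copy survives, so treat that as an assumption. The function name remove_strict_duplicates is hypothetical.

```python
def remove_strict_duplicates(rows):
    """Return rows with exact repeats dropped, keeping the first occurrence."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row)  # rows are compared field by field
        if key in seen:
            continue  # exact copy of an earlier row: do not write it
        seen.add(key)
        kept.append(row)
    return kept


rows = [["1", "a"], ["2", "b"], ["1", "a"]]
print(remove_strict_duplicates(rows))
```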

flag

This is the default action, as of Aug-4. With this action, three new fields will be added to the response file:

  • isDuplicate is a boolean field indicating whether or not the row is a duplicate of another row
  • duplicateType is a controlled vocabulary indicating the type of duplicate: strict (full), partial, or any other type that might be added in the future.
  • duplicateOf is a list of all the other record IDs for which the current record is a duplicate. Even for strict duplicates, for the sake of consistency, it makes sense to make this field a list.

The current list of duplicate types is:

Duplicate type    | Code
------------------|-----
No duplicate      | 0
Strict duplicate  | 1
Partial duplicate | 2
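When reading a flagged file, the codes above can be mapped back to readable labels. The sketch below assumes the type is stored as the integer code from the table; the names DuplicateType and describe are hypothetical.

```python
from enum import IntEnum


class DuplicateType(IntEnum):
    # Codes taken from the table of duplicate types.
    NO_DUPLICATE = 0
    STRICT = 1
    PARTIAL = 2


def describe(code):
    """Map a duplicate type code to a label; unknown codes raise ValueError."""
    return DuplicateType(int(code)).name.replace("_", " ").lower()


print(describe(1))    # strict
print(describe("2"))  # partial
```

Raising on unknown codes makes it obvious when a file uses a duplicate type that was added after the client was written.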

Table of effects

Action | isDuplicate | Effect
-------|-------------|---------------------------------------
report | no          | Nothing
report | yes         | Add to list
remove | no          | Write row with no flags
remove | yes         | Don't write row
flag   | no          | Write row with [0, null, null]
flag   | yes         | Write row with [1, type, list_of_dupes]
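The table of effects can be read as a small per-row dispatch. The sketch below is one possible way to express it, not the service's actual code: handle_row is a hypothetical helper that returns the row to write (or None when nothing is written) and a boolean saying whether the row should be added to the report list.

```python
def handle_row(action, row, is_duplicate, dup_type=None, dup_of=None):
    """Return (row_to_write_or_None, add_to_report) per the table of effects."""
    if action == "report":
        # Never writes a row; duplicates are added to the report list.
        return None, is_duplicate
    if action == "remove":
        # Non-duplicates are written unchanged; duplicates are skipped.
        return (None if is_duplicate else list(row)), False
    if action == "flag":
        # Every row is written, with the three flag fields appended.
        flags = [1, dup_type, dup_of] if is_duplicate else [0, None, None]
        return list(row) + flags, False
    raise ValueError(f"unknown action: {action!r}")


print(handle_row("flag", ["a", "b"], True, 1, ["id7"]))
```

Here None stands in for the null values shown in the table.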