execute_json should set the ExifTool param -struct too #52

Open
nitmws opened this issue Jul 30, 2022 · 8 comments
Labels
documentation, enhancement

Comments

@nitmws

nitmws commented Jul 30, 2022

Currently, calling execute_json sets only -j as an ExifTool parameter; it does not set the -struct parameter. That is dangerous for metadata properties with structured values and multiple values.

Example: the IPTC Photo Metadata Standard defines a property Location Shown in the Image which has a structure of City, State/Province, Country Name, Country Code, Sublocation and more. And it may have multiple values = multiple structures.

Using the -j (JSON) parameter without the -struct parameter returns such a result:

"XMP:LocationShownCity": ["City (Location shown2) (ref2021.1)"],
"XMP:LocationShownCountryCode": ["ABC","ABC"],
"XMP:LocationShownCountryName": ["CountryName (Location shown1) (ref2021.1)","CountryName (Location shown2) (ref2021.1)"],
"XMP:LocationShownSublocation": ["Sublocation (Location shown1) (ref2021.1)"],

Using the -j (JSON) parameter WITH the -struct parameter returns such a result:

"XMP:LocationShown": [{
    "CountryCode": "ABC",
    "CountryName": "CountryName (Location shown1) (ref2021.1)",
    "Sublocation": "Sublocation (Location shown1) (ref2021.1)"
  },{
    "City": "City (Location shown2) (ref2021.1)",
    "CountryCode": "ABC",
    "CountryName": "CountryName (Location shown2) (ref2021.1)"
  }],

The essential difference: in the result without -struct, XMP:LocationShownCity and XMP:LocationShownSublocation each have only a single value in the array, but nobody knows whether it is the City name or Sublocation name of the first location or of the second location. By contrast, XMP:LocationShown has a JSON object for each location: the first object has no City but a Sublocation, the second object has a City but no Sublocation. Which location has what structured data is crystal clear.

(Note: the results above are taken from the IPTC Photo Metadata reference image, it has values telling to which property it belongs.)
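
The ambiguity can be reproduced in a few lines of Python. This is only a sketch: the tag values are copied from the two outputs quoted above, while rebuild_by_index is a hypothetical helper (not part of PyExifTool) that mimics what a consumer of the flattened output might naively do.

```python
# Tag values copied from the two ExifTool outputs quoted above.
flat = {
    "XMP:LocationShownCity": ["City (Location shown2) (ref2021.1)"],
    "XMP:LocationShownCountryCode": ["ABC", "ABC"],
    "XMP:LocationShownCountryName": [
        "CountryName (Location shown1) (ref2021.1)",
        "CountryName (Location shown2) (ref2021.1)",
    ],
    "XMP:LocationShownSublocation": ["Sublocation (Location shown1) (ref2021.1)"],
}
structured = {
    "XMP:LocationShown": [
        {
            "CountryCode": "ABC",
            "CountryName": "CountryName (Location shown1) (ref2021.1)",
            "Sublocation": "Sublocation (Location shown1) (ref2021.1)",
        },
        {
            "City": "City (Location shown2) (ref2021.1)",
            "CountryCode": "ABC",
            "CountryName": "CountryName (Location shown2) (ref2021.1)",
        },
    ]
}

PREFIX = "XMP:LocationShown"


def rebuild_by_index(flat_tags):
    """Hypothetical helper: assign the i-th value of each flattened tag to location i."""
    count = max(len(values) for values in flat_tags.values())
    locations = [{} for _ in range(count)]
    for tag, values in flat_tags.items():
        field = tag[len(PREFIX):]  # e.g. "City", "CountryCode"
        for i, value in enumerate(values):
            locations[i][field] = value
    return locations


guessed = rebuild_by_index(flat)
actual = structured[PREFIX]

# The naive rebuild attaches the City of location 2 to location 1.
print(guessed[0].get("City"))
print(actual[0].get("City"))
```

The index-based guess puts the only City value on the first location, whereas the structured output shows that it belongs to the second one.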

With this semantic issue as background I suggest to set the -struct parameter with the -j parameter in the execute_json method.

@sylikc
Owner

sylikc commented Aug 28, 2022

@nitmws interesting observation, and thanks for the detailed report. Can you attach a file with these EXIF tags so I can analyze further whether using exiftool (utility) with different parameters might return different results (I'm thinking of testing grouping and stuff)?

Do you think adding the -struct parameter could cause any undesired behavior?

Thanks

sylikc added the bug, documentation, enhancement, and question labels Aug 28, 2022
@nitmws
Author

nitmws commented Aug 28, 2022

@sylikc find attached some files:

Comparing the JSON files will show the big difference: the JSON of the *out_NOstruct file has only properties with a single value or an array of values, while the *-out.json file has properties whose value is another object (or an array of objects) with its own properties.

This difference will have an impact on processing the metadata because with the JSON of the *out_NOstruct file people have to collect a set of properties to generate the metadata of a Location Shown, and if there are multiple they have to assign the first value of an array to the first Location Shown and the second value of the array to the second Location Shown. Bad experience: the XMP:LocationShownCity property has only one value; to which of the two Locations should it be assigned?

Processing the metadata of the JSON file generated with the -struct parameter is much easier: the properties of the first Location Shown are the properties of the first object in the array of XMP's LocationShown and the second object in the array holds the properties of the second location, so it is crystal clear which of the two locations is missing the name of a city.
It is easier to process - but different.

@sylikc
Owner

sylikc commented Dec 30, 2022

it's taken me awhile to get back to this... but after thinking it over, while this could potentially be a script-breaking change for downstream users, it's just the better thing to do. Much like some of the ExifToolAlpha() changes I had made making it clearer what info comes from what file, this is a good change even if it does break downstream users (until they adapt their code)

@sylikc
Owner

sylikc commented Dec 30, 2022

After reading through the documentation it's still debatable whether to add the flag or not... the current design of PyExifTool makes minimal changes to the default Exiftool output.

There's already tons of grief from people trying to figure out the '-G' and '-n' flags which are specified by default... (only one issue opened in this repo, but the upstream and stackoverflow have a good number)

if specifying -struct lots of changes would be required of downstream users... (as per the exiftool documentation)

By default XMP structures are flattened into individual tags in the JSON output, but the original structure may be preserved with the -struct option (this also causes all list-type XMP tags to be output as JSON arrays, otherwise single-item lists would be output as simple strings).
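
One concrete consequence of the quoted passage is that the same list-type tag may arrive as a plain string in one mode and as an array in the other. A small sketch of how downstream code could tolerate both shapes follows; as_list is a hypothetical helper, not part of PyExifTool, and the tag values are made up for illustration.

```python
def as_list(value):
    """Hypothetical helper: normalize a tag value to a list.

    Accepts a missing tag (None), a single scalar, or a JSON array,
    so the caller can iterate without checking the type first.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Illustrative values only: a single-item list-type tag could be reported
# either as a string or as a one-element array, depending on the options used.
without_struct = {"XMP:Subject": "harbour"}
with_struct = {"XMP:Subject": ["harbour"]}

for tags in (without_struct, with_struct):
    for keyword in as_list(tags.get("XMP:Subject")):
        print(keyword)
```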

in my own use case, I already pass -struct in params, and while the '-G', '-n' can be removed... hard-coding result = self.execute("-j", "-struct", *params) would not allow -struct to be removed

if I replaced the default parameters with '-G', '-n', '-struct', there could be other unintended side effects (as per exiftool documentation)

-struct, --struct
Output structured XMP information instead of flattening to individual tags. This option works well when combined with the XML (-X) and JSON (-j) output formats. For other output formats, XMP structures and lists are serialized into the same format as when writing structured information (see https://exiftool.org/struct.html for details). When copying, structured tags are copied by default unless --struct is used to disable this feature (although flattened tags may still be copied by specifying them individually unless -struct is used). These options have no effect when assigning new values since both flattened and structured tags may always be used when writing.
(-struct flag disables structured tags copying)

I'm not sure what the right way to do this is... as a user of PyExifTool myself, I just specify -struct in my own calls using params="-struct"

sylikc removed the bug label Dec 30, 2022
@nitmws
Author

nitmws commented Jan 2, 2023

I understand your concerns, @sylikc , regarding backward compatibility.

We at IPTC are aware that many users of photo metadata prefer using simple properties without a structure. But the structured properties are becoming more important, e.g. to tell from which URL a licensed image can be bought, Google uses the Web URL sub-property of the structured Licensor property.

Therefore I suggest that PyExifTool help users work with structured properties safely and easily:

  • Add a function exiftool.ExifTool.execute_struct(self, *params): (or named .execute_safestruct(...) )
    • This function sets the parameters -j, -G, -n and -struct
    • It returns a list of dictionaries made from the JSON object(s) returned by ExifTool
    • (It may ignore parameters of the function call disabling -j, -n, -struct and any variant of -G)
  • The documentation of this function at https://sylikc.github.io/pyexiftool/reference/1-exiftool.html explains the features of this function:
    • Properties structured in the embedded XMP metadata are returned as structure and therefore sub-properties can be related to the wrapping property correctly. (-j, -struct parameter)
    • The simple and the structured properties, including sub-properties, are named properly (-G parameter)
    • The real value of properties with values from an enumeration is returned, not an Exiftool alias. (-n parameter)
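
A rough sketch of how the proposed function might behave is shown below. It is not an implementation from PyExifTool: struct_params and the free function execute_struct are hypothetical names, et stands for any object exposing execute_json(*params), and the sketch assumes that -j is added by execute_json and that -G and -n come from the default common_args. Only --struct and the -G variants are filtered here, since those are the overriding options named in this thread.

```python
import re

# Assumption: -G, -G1, -G0:1 and similar are the "variants of -G" to ignore.
# Tag requests such as -GPSLatitude do not match this pattern and are kept.
_GROUP_FLAG = re.compile(r"^-G[0-9:]*$")
_DISABLING_FLAGS = {"--struct"}


def struct_params(*params):
    """Hypothetical: drop caller params that would undo the safe defaults, then prepend -struct."""
    kept = [
        p for p in params
        if p not in _DISABLING_FLAGS and not _GROUP_FLAG.match(p)
    ]
    return ["-struct", *kept]


def execute_struct(et, *params):
    """Hypothetical: run execute_json on `et` with -struct enforced."""
    return et.execute_json(*struct_params(*params))


print(struct_params("-G1", "--struct", "-GPSLatitude", "photo.jpg"))
```

Silently dropping parameters is a design choice that could surprise callers; raising an error for a conflicting parameter would be an equally plausible reading of the proposal.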

@sylikc
Owner

sylikc commented Jan 5, 2023

I'm still thinking about your suggestion above... I am debating whether to add this helper function to ExifToolHelper or ExifToolAlpha. The whole description of it is an add-on to the base ExifTool functionality, and so as an extension, it wouldn't end up in the base class...

the -G and -n flags are in common_args by default, so any invocation of execute_json would set them.

It would be relatively trivial to write an execute_struct() into Helper or Alpha

def execute_struct(self, *params):
    return self.execute_json("-struct", *params)

Have you also considered suggesting to the upstream exiftool tool to have -struct the default for -j?

@nitmws
Author

nitmws commented Jan 5, 2023

To which class this function is added is up to you.

Can a parameter set by such a function be overridden by the *params? The -G parameter is a good starting point but for the IPTC properties a -G1 is recommended as it includes the XML/XMP namespace in the tag name. And this makes the proper naming of properties more safe.

Regarding combining -j and -struct I had a conversation with Phil Harvey but it is also backward compatibility stopping this idea.

@sylikc
Owner

sylikc commented Jan 5, 2023

Can a parameter set by such a function be overridden by the *params? The -G parameter is a good starting point but for the IPTC properties a -G1 is recommended as it includes the XML/XMP namespace in the tag name. And this makes the proper naming of properties more safe.

The -G parameter was set default by the original author of PyExifTool and it's set during init. It sets the common_args property, which is only writable before an invocation of run. Basically, common_args of exiftool is passed to any commands used. if common_args are set to ['-G1', '-n'] in the constructor, or set in the properties afterwards, it'll be included in all commands.

Setting -G1 in params if -G is in common_args doesn't work, I did a quick test

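import exiftool

filepath = "image.jpg"  # placeholder path, substitute a real file

# does not work: -G is still present in the default common_args
with exiftool.ExifTool() as et:
    print(et.execute_json('-G1', filepath))

# works: -G is no longer in common_args, so -G1 from params applies
with exiftool.ExifTool(common_args=['-n']) as et:
    print(et.execute_json('-G1', filepath))

# works: -G1 replaces -G in common_args via the constructor
with exiftool.ExifTool(common_args=['-G1', '-n']) as et:
    print(et.execute_json(filepath))

# works: common_args set through the property before the first command
et = exiftool.ExifToolHelper()
et.common_args = ['-G1', '-n']
print(et.execute_json(filepath))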

Regarding combining -j and -struct I had a conversation with Phil Harvey but it is also backward compatibility stopping this idea.

I see... yeah, backwards compatibility ties the hands to how much can change with projects which have so many dependencies... I took a leap of faith last year when I totally chopped up PyExifTool #13 . Luckily, aside from code refactors, the intended output didn't change, and it looks like adoption is good.

sylikc removed the question label Jan 12, 2023