Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MET-6139 Move rdf convertion classes from metis-framework. #14

Conversation

stzanakis
Copy link
Member

@stzanakis stzanakis commented Oct 8, 2024

@jeortizquan
This is a first step of the code migration.
In this first iteration Tests(if any) are not migrated, packaging organization not optimized, reported issues not resolved.
With the first review, we just agree on the direction and then we continue the implementation on this PR.

@jochen-vermeulen In the end the todo I mentioned was actually from you in the end. So perhaps you have a say in this one as well?

@stzanakis stzanakis requested a review from jeortizquan October 8, 2024 13:47
@jochen-vermeulen
Copy link
Contributor

@stzanakis Just wondering about separation of concerns and all that. I think when it comes to the metis-schema module, we should try to keep it focused only on functionality that is specific to the RDF schema and converting to/from it.

It seems that you're proposing to move media-processing specific code to the metis schema (e.g. extract the hasView, isShownAt, etc. from records). Reading back my TODO I believe my thoughts at the time was to transfer just the code that deserializes to a Document. The code specific to media processing could stay.

Specifically, in RdfDeserializer I marked the code that I think should stay in the media processing module. I think that RdfNamespaceContext, ResourceEntry, RdfResourceEntry, RdfXpathContstants, UrlType and perhaps other bits of code then also should stay in media processing?

Bits and pieces like XPathExpressionWrapper may perhaps stay and could then be used throughout the codebase? Or perhaps it should move to metis common instead?

What do you think?

Comment on lines +46 to +65
private static final String OEMBED_NAMESPACE = "https://oembed.com/";
private static final String XPATH_OEMBED_SERVICES =
SVCS_SERVICE + "[dcterms:conformsTo/@rdf:resource = \"" + OEMBED_NAMESPACE + "\"]";
private static final String XPATH_OEMBED_WEB_RESOURCES = EDM_WEBRESOURCE
+ "[svcs:has_service/@rdf:resource = " + XPATH_OEMBED_SERVICES + "/@rdf:about]";
private static final String XPATH_IS_OEMBED_RESOURCE_CONDITION = "[. = "
+ XPATH_OEMBED_WEB_RESOURCES + "/@rdf:about]";
private static final String OEMBED_XPATH_CONDITION_IS_SHOWN_BY =
EDM_IS_SHOWN_BY + XPATH_IS_OEMBED_RESOURCE_CONDITION;
private static final String OEMBED_XPATH_CONDITION_HAS_VIEW =
EDM_HAS_VIEW + XPATH_IS_OEMBED_RESOURCE_CONDITION;

private final XPathExpressionWrapper getObjectExpression = new XPathExpressionWrapper(xPath -> xPath.compile(EDM_OBJECT));
private final XPathExpressionWrapper getHasViewExpression = new XPathExpressionWrapper(xPath -> xPath.compile(EDM_HAS_VIEW));
private final XPathExpressionWrapper getIsShownAtExpression = new XPathExpressionWrapper(
xPath -> xPath.compile(EDM_IS_SHOWN_AT));
private final XPathExpressionWrapper getIsShownByExpression = new XPathExpressionWrapper(
xPath -> xPath.compile(EDM_IS_SHOWN_BY));
private final XPathExpressionWrapper getOEmbedExpression = new XPathExpressionWrapper(
xPath -> xPath.compile(OEMBED_XPATH_CONDITION_HAS_VIEW + " | " + OEMBED_XPATH_CONDITION_IS_SHOWN_BY));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Media Processing specific code

Comment on lines +138 to +229
public RdfResourceEntry getMainThumbnailResource(byte[] input) throws RdfDeserializationException {
return performDeserialization(input, this::getMainThumbnailResource);
}

public RdfResourceEntry getMainThumbnailResource(InputStream inputStream)
throws RdfDeserializationException {
return getMainThumbnailResource(deserializeToDocument(inputStream)).orElse(null);
}

public Optional<RdfResourceEntry> getMainThumbnailResource(Document document)
throws RdfDeserializationException {

// Get the entries of the required types.
final Map<String, ResourceInfo> resourceEntries = getResourceEntries(document,
Collections.singleton(UrlType.URL_TYPE_FOR_MAIN_THUMBNAIL_RESOURCE));

// If there is not exactly one, we return an empty optional.
if (resourceEntries.size() != 1) {
return Optional.empty();
}

// So there is exactly one. Convert and return.
return Optional.of(convertToResourceEntries(resourceEntries).get(0));
}

public List<RdfResourceEntry> convertToResourceEntries(
Map<String, ResourceInfo> urlWithTypes) {
return urlWithTypes.entrySet().stream().map(RdfDeserializer::convertToResourceEntry)
.toList();
}

private static RdfResourceEntry convertToResourceEntry(Map.Entry<String, ResourceInfo> entry) {
return new RdfResourceEntry(entry.getKey(), entry.getValue().urlTypes(),
entry.getValue().configuredForOembed());
}

/**
* Gets resource entries.
*
* @param document the document
* @param allowedUrlTypes the allowed url types
* @return the resource entries
* @throws RdfDeserializationException the rdf deserialization exception
*/
public Map<String, ResourceInfo> getResourceEntries(Document document,
Set<UrlType> allowedUrlTypes) throws RdfDeserializationException {

// Get the resources and their types.
final Map<String, Set<UrlType>> urls = new HashMap<>();
for (UrlType type : allowedUrlTypes) {
final Set<String> urlsForType = getUrls(document, type);
for (String url : urlsForType) {
urls.computeIfAbsent(url, k -> new HashSet<>()).add(type);
}
}

// For each resource, check whether they are configured for oEmbed.
final Map<String, ResourceInfo> result = HashMap.newHashMap(urls.size());
final Set<String> oEmbedUrls = getOEmbedUrls(document);
for (Entry<String, Set<UrlType>> entry : urls.entrySet()) {
boolean isConfiguredForOembed = oEmbedUrls.contains(entry.getKey());
result.put(entry.getKey(), new ResourceInfo(entry.getValue(), isConfiguredForOembed));
}

// Done
return result;
}

private Set<String> getUrls(Document document, UrlType type) throws RdfDeserializationException {

// Determine the right expression to apply.
final XPathExpressionWrapper expression =
switch (type) {
case OBJECT -> getObjectExpression;
case HAS_VIEW -> getHasViewExpression;
case IS_SHOWN_AT -> getIsShownAtExpression;
case IS_SHOWN_BY -> getIsShownByExpression;
};

// Evaluate the expression and convert the node list to a set of attribute values.
final NodeList nodes = expression.evaluate(document);
return IntStream.range(0, nodes.getLength()).mapToObj(nodes::item).map(Node::getNodeValue)
.collect(Collectors.toSet());
}

private Set<String> getOEmbedUrls(Document document) throws RdfDeserializationException {
final NodeList oEmbedNodes = getOEmbedExpression.evaluate(document);
return IntStream.range(0, oEmbedNodes.getLength())
.mapToObj(oEmbedNodes::item)
.map(Node::getNodeValue)
.collect(Collectors.toSet());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Media Processing specific code.

@jochen-vermeulen
Copy link
Contributor

And I think my TODO suggested something else. Namely, removing duplicate code between RdfConversionUtils already existing in Metis Schema on the one hand, and RdfDeserializer and RdfSerializer on the other. I marked two such cases, but I think there are many more opportunities for merging these three classes.

Comment on lines +114 to +122
public RDF deserialize(InputStream inputStream) throws RdfDeserializationException {
try {
final IUnmarshallingContext context = rdfBindingFactory.createUnmarshallingContext();
return (RDF) context.unmarshalDocument(inputStream, UTF8);
} catch (JiBXException e) {
throw new RdfDeserializationException(
"Something went wrong with converting to or from the RDF format.", e);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate with RdfConversionUtils

Comment on lines +75 to +86
public byte[] convertRdfToBytes(RDF rdf) throws RdfSerializationException {
try {
IMarshallingContext context = rdfBindingFactory.createMarshallingContext();
context.setIndent(INDENTATION_SPACE);
ByteArrayOutputStream out = new ByteArrayOutputStream();
context.marshalDocument(rdf, UTF8, null, out);
return out.toByteArray();
} catch (JiBXException e) {
throw new RdfSerializationException(
"Something went wrong with converting to or from the RDF format.", e);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate with RdfConversionUtils.

@stzanakis
Copy link
Member Author

@stzanakis Just wondering about separation of concerns and all that. I think when it comes to the metis-schema module, we should try to keep it focused only on functionality that is specific to the RDF schema and converting to/from it.

It seems that you're proposing to move media-processing specific code to the metis schema (e.g. extract the hasView, isShownAt, etc. from records). Reading back my TODO I believe my thoughts at the time was to transfer just the code that deserializes to a Document. The code specific to media processing could stay.

Specifically, in RdfDeserializer I marked the code that I think should stay in the media processing module. I think that RdfNamespaceContext, ResourceEntry, RdfResourceEntry, RdfXpathContstants, UrlType and perhaps other bits of code then also should stay in media processing?

Bits and pieces like XPathExpressionWrapper may perhaps stay and could then be used throughout the codebase? Or perhaps it should move to metis common instead?

What do you think?

Okay. As I mentioned earlier, the TODO comment was clear I just thought taking it a step further and move more code that could be potentially used by other projects. That being said I was a bit hesitant of this impactful change.
So that clears it I guess.
Thanks for having a look.
Closing this to rework on a smaller update and perhaps some renaming on the media library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants