Turning rdflib jsonld into a "full processor" (a.o. for schema.org compliance) #2692
Comments
I'm looking into implementing this using RDFLib/rdflib-jsonld#63 as a starting point. In that code, HTML parsing is attempted after JSON parsing fails, but I'm looking at choosing the parser based on the content type of the source instead. I see that the JSON-LD test suite in "test/jsonld/1.1" already provides a number of tests which just need to be enabled.
FYI: I'm occasionally working on this at https://github.com/wallberg-umd/rdflib/tree/issue-2692-embedded-jsonld-draft . I've added the basic functionality and enabled the existing tests. I'm now working on making the tests pass.
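A rough sketch of what choosing the parser by content type (rather than falling back to HTML after a JSON failure) might look like. The names `choose_parser` and `HTML_MEDIA_TYPES` are hypothetical and not part of rdflib; the two HTML media types are the ones named in the implementation summary in this thread.

```python
# Hypothetical helper, not rdflib API: decide how to read a source
# from its Content-Type value instead of trying JSON first.
HTML_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def choose_parser(content_type):
    """Return "html" for HTML-like media types, otherwise "json"."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or "").split(";")[0].strip().lower()
    return "html" if media_type in HTML_MEDIA_TYPES else "json"
```

Treating a missing or unknown content type as JSON is an assumption made here to keep the existing behaviour as the default.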
See https://w3c.github.io/json-ld-syntax/#embedding-json-ld-in-html-documents and https://www.w3.org/TR/json-ld11-api/#html-content-algorithms .

Implementation summary:

rdflib.plugins.parsers.jsonld.JsonLDParser.parse
* add docstring
* change parameter list from **kwargs to explicit list
* add optional extract_all_scripts parameter
* get the fragment identifier from source.getSystemId()
* add fragment_id and extract_all_scripts parameters to the call to source_to_json

rdflib.plugins.shared.jsonld.util.source_to_json
* add docstring
* add optional fragment_id and extract_all_scripts parameters
* change the return value to a tuple with the extracted JSON document and value of the HTML base element
* if source.content_type is "text/html" or "application/xhtml+xml" then parse source as HTML and extract the appropriate script element(s) and the HTML base element

Testing

test/jsonld/test_onedotone.py
* enable all existing html tests (except html/f004-in)
* if inputpath ends with ".html" (with optional fragment identifier) then invoke runner.do_test_html

For more information on the failing html/f004-in test, see https://lists.w3.org/Archives/Public/public-json-ld-wg/2024May/0000.html .

test/jsonld/runner.py
* add new do_test_html function

Note that the html test cases from the JSON-LD Test Suite combine testing for JSON-LD extraction from the HTML with testing for other algorithms (e.g. compact/flatten), which rdflib does not currently support. In order to test extraction only and ignore the compact/flatten algorithms, do_test_html performs a graph comparison using rdflib.compare.isomorphic, without serializing back to JSON.
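To illustrate the extraction step summarized above, here is a minimal standard-library sketch. It is not the rdflib implementation: `extract_jsonld` and `_JsonLdScriptCollector` are hypothetical names, and only the parameter names `fragment_id` and `extract_all_scripts` and the "(JSON document, HTML base)" return shape are taken from the summary. It also skips details of the JSON-LD HTML content algorithms such as comment unescaping.

```python
import json
from html.parser import HTMLParser


class _JsonLdScriptCollector(HTMLParser):
    """Collects JSON-LD script bodies and the first <base href> value."""

    def __init__(self):
        super().__init__()
        self.scripts = []    # (id attribute, raw text) per JSON-LD script
        self.base = None     # href of the first <base> element, if any
        self._current_id = None
        self._buffer = None  # a list while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and self.base is None:
            self.base = attrs.get("href")
        elif tag == "script":
            # Ignore media type parameters such as ";profile=...".
            media_type = (attrs.get("type") or "").split(";")[0].strip().lower()
            if media_type == "application/ld+json":
                self._current_id = attrs.get("id")
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.scripts.append((self._current_id, "".join(self._buffer)))
            self._buffer = None


def extract_jsonld(html, fragment_id=None, extract_all_scripts=False):
    """Return (JSON document, base href) extracted from an HTML string."""
    collector = _JsonLdScriptCollector()
    collector.feed(html)
    collector.close()
    if fragment_id:
        # A fragment identifier selects the script element with that id.
        for script_id, text in collector.scripts:
            if script_id == fragment_id:
                return json.loads(text), collector.base
        raise ValueError(f"no JSON-LD script element with id {fragment_id!r}")
    if not collector.scripts:
        raise ValueError("no JSON-LD script element found")
    if extract_all_scripts:
        # Merge every script into one list, flattening top-level arrays.
        documents = []
        for _, text in collector.scripts:
            parsed = json.loads(text)
            documents.extend(parsed if isinstance(parsed, list) else [parsed])
        return documents, collector.base
    # Default: only the first JSON-LD script element is used.
    return json.loads(collector.scripts[0][1]), collector.base
```

For example, `extract_jsonld(page, fragment_id="b")` would return only the script whose `id` is `b`, which corresponds to the case where the source URL carries a `#b` fragment.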
I've completed an initial implementation for this issue, see https://github.com/wallberg/rdflib/tree/issue-2692-embedded-jsonld . It contains one breaking change: when I can think of other ways to return the base without breaking the current return value:
I'd like to get some feedback on the preferred approach before submitting the PR. A note on the current status of validation:
Co-authored-by: Ashley Sommer
Co-authored-by: Nicholas Car
Cloned from RDFLib/rdflib-jsonld#62 :
The JSON-LD 1.1 draft spec mentions different levels of processing for JSON-LD: https://w3c.github.io/json-ld-syntax/#processor-levels
A pure processor can only parse JSON-LD expressed in JSON directly, but a full processor can also parse JSON-LD embedded in HTML.
It would be great if rdflib-jsonld would support this. It would make rdflib-jsonld a library that could be used for HTML documents following the schema.org guidelines for embedding (meta)data in HTML pages as described in their getting started guide https://schema.org/docs/gs.html.
Together with the RDFa & microdata parsers this can then work as a fully RDF based version of the Structured Data Testing tool from Google: https://search.google.com/structured-data/testing-tool.