Begin using `curies.Converter` in more places #397

cthoyt · 2023-07-22T23:37:32Z

Part of #363

This PR does the following:

- Adds a minimum version of `curies` that has the strict compress and expand functions
- Rewrites the SPARQL utils and RDF utils to use `curies` functionality
- Updates the custom `curie_from_uri` to use `curies` (will make a follow-up PR that replaces this completely)

This PR automatically instantiates a `curies.Converter` for a given SPARQL configuration and re-implements the compress/expand operations using the `curies` package.
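For readers unfamiliar with the operations named above, here is a minimal standard-library sketch of what compressing a URI into a CURIE and expanding a CURIE into a URI mean, including a "strict" variant that raises instead of returning `None`. It is not the `curies` implementation: the class name `TinyConverter`, its method bodies, and the example prefix map are assumptions made purely for illustration.

```python
from typing import Dict, List, Optional, Tuple


class TinyConverter:
    """Hypothetical stand-in that illustrates compress/expand; not the curies API."""

    def __init__(self, prefix_map: Dict[str, str]) -> None:
        self.prefix_map = dict(prefix_map)
        # Check longer URI prefixes first so the most specific match wins.
        self._uri_prefixes: List[Tuple[str, str]] = sorted(
            ((uri_prefix, prefix) for prefix, uri_prefix in self.prefix_map.items()),
            key=lambda pair: len(pair[0]),
            reverse=True,
        )

    def compress(self, uri: str) -> Optional[str]:
        """Return a CURIE for the URI, or None if no URI prefix matches."""
        for uri_prefix, prefix in self._uri_prefixes:
            if uri.startswith(uri_prefix):
                return f"{prefix}:{uri[len(uri_prefix):]}"
        return None

    def expand(self, curie: str) -> Optional[str]:
        """Return a URI for the CURIE, or None if its prefix is unknown."""
        prefix, sep, identifier = curie.partition(":")
        if not sep or prefix not in self.prefix_map:
            return None
        return self.prefix_map[prefix] + identifier

    def compress_strict(self, uri: str) -> str:
        """Like compress, but raise instead of returning None."""
        curie = self.compress(uri)
        if curie is None:
            raise ValueError(f"no URI prefix matches {uri!r}")
        return curie

    def expand_strict(self, curie: str) -> str:
        """Like expand, but raise instead of returning None."""
        uri = self.expand(curie)
        if uri is None:
            raise ValueError(f"unknown prefix in {curie!r}")
        return uri


# Example prefix map chosen for illustration only.
converter = TinyConverter(
    {
        "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
        "skos": "http://www.w3.org/2004/02/skos/core#",
    }
)
print(converter.compress("http://purl.obolibrary.org/obo/CHEBI_1234"))
print(converter.expand("skos:exactMatch"))
```

The strict variants are useful when a missing prefix should be treated as a data error rather than silently passed along.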

matentzn

Looks awesome!! THANK YOU!

@hrshdhgd I will leave it to you to deal with the PR - I left one or two comments here and there to check, but nothing critical.

src/sssom/cli.py

src/sssom/sparql_util.py

src/sssom/util.py

cthoyt · 2023-07-24T13:13:07Z

@matentzn @hrshdhgd it appears the issues in the build now are due to poetry. I will again give a gentle nudge to totally drop poetry, it is way more of an issue than it helps

hrshdhgd · 2023-07-24T15:00:15Z

I ran this PR locally and there are 4 errors as of now. First one is a CLI error which will be fixed automatically once the following are fixed.

Three pytest errors occurring as of now:

FAILED tests/test_parsers.py::TestParse::test_parse_alignment_minidom - curies.api.DuplicateURIPrefixes: Duplicate URI prefixes:
FAILED tests/test_parsers.py::TestParse::test_parse_obographs - curies.api.DuplicateURIPrefixes: Duplicate URI prefixes:
FAILED tests/test_parsers.py::TestParse::test_parse_sssom_rdf - ValueError: CURIE appeared where there should be a URI, and could not be standardized: oio:hasBroadSynonym

From what I can tell the first two errors are thrown by curies/api.py.

The third error is because the prefix oio: is not in the prefix_map.
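The third failure can be pictured with a small standard-library sketch: a value that should be a URI turns out to be a CURIE, and it can only be standardized when its prefix is in the prefix map. This is not the code in `sssom` or `curies`; the function name `standardize_to_uri` and the "starts with http" test for URIs are assumptions made for illustration.

```python
from typing import Mapping


def standardize_to_uri(value: str, prefix_map: Mapping[str, str]) -> str:
    """Return a URI, expanding the value if it is a CURIE with a known prefix.

    Hypothetical helper: treats anything starting with http(s):// as a URI.
    """
    if value.startswith(("http://", "https://")):
        return value
    prefix, sep, identifier = value.partition(":")
    if sep and prefix in prefix_map:
        return prefix_map[prefix] + identifier
    raise ValueError(
        "CURIE appeared where there should be a URI, "
        f"and could not be standardized: {value}"
    )


# Without an entry for "oio", the CURIE cannot be standardized.
incomplete_map = {"skos": "http://www.w3.org/2004/02/skos/core#"}
try:
    standardize_to_uri("oio:hasBroadSynonym", incomplete_map)
except ValueError as error:
    print(error)

# With "oio" added to the map, the same value expands cleanly.
complete_map = {**incomplete_map, "oio": "http://www.geneontology.org/formats/oboInOwl#"}
print(standardize_to_uri("oio:hasBroadSynonym", complete_map))
```

Under these assumptions, adding the missing prefix to the map is enough to make the value resolvable.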

hrshdhgd · 2023-07-24T15:11:42Z

> it appears the issues in the build now are due to poetry. I will again give a gentle nudge to totally drop poetry, it is way more of an issue than it helps

I agree, but it seems to me that only the GitHub environments are problematic. Locally, it works fine. Plus Chris insists on all our projects following a standard architecture, hence my hands are sort of tied 🙂.

cthoyt · 2023-07-24T15:24:47Z

@hrshdhgd thanks for following up on all of this. can you take care of looking into these tests that have inconsistent data? Great to see that we are implicitly now checking data integrity using curies!

src/sssom/context.py

src/sssom/sparql_util.py

src/sssom/context.py

tests/test_parsers.py

matentzn · 2023-07-25T14:15:38Z

tests/test_parsers.py

+        #     f"{self.obographs_file} has the wrong number of mappings.",
+        # )
+
+    @unittest.skip(reason="Not sure what is broken in this graph")


Sorry but no; what exactly is going on here?

looking into it

I've figured out what this was testing and written a more detailed explanation of what it was doing before and why we don't need it anymore.

The other option is we can make the OBO parser much less lenient, and then we will be throwing exceptions left and right (not recommended)

so yes, I have now added a detailed explanation of why this test is not needed. We should keep it skipped, or even better, remove it.

There are issues in mapping-commons/sssom-py#397 because the base schema does not have a valid prefix map - `dc` and `dcterms` are both used as prefixes for the same URI prefix. This happens because LinkML makes some inference about what needs to be there - there are some assorted uses throughout the schema definition with both DC and DCTERMS. This PR updates the schema to only use DCTERMS and puts explicit entries for DCTERMS in the prefix map to address this. It then regenerates the whole project.

This was possible thanks to mapping-commons/sssom#302
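The `DuplicateURIPrefixes` failures come down to the prefix map not being one-to-one. A minimal standard-library sketch of such a check follows. It is not how `curies` implements its validation; the function name `find_duplicate_uri_prefixes` is hypothetical, and the URI shown for `dc`/`dcterms` is an assumed value used only to reproduce the shape of the problem.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def find_duplicate_uri_prefixes(prefix_map: Mapping[str, str]) -> Dict[str, List[str]]:
    """Map each URI prefix claimed by more than one prefix to those prefixes."""
    by_uri_prefix: Dict[str, List[str]] = defaultdict(list)
    for prefix, uri_prefix in prefix_map.items():
        by_uri_prefix[uri_prefix].append(prefix)
    return {
        uri_prefix: sorted(prefixes)
        for uri_prefix, prefixes in by_uri_prefix.items()
        if len(prefixes) > 1
    }


# Assumed example: two prefixes pointing at the same URI prefix.
broken_map = {
    "dc": "http://purl.org/dc/terms/",
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
print(find_duplicate_uri_prefixes(broken_map))

# Keeping only one of the two prefixes makes the map one-to-one again.
fixed_map = {k: v for k, v in broken_map.items() if k != "dc"}
print(find_duplicate_uri_prefixes(fixed_map))
```

A map that passes this check can be inverted safely, which is what compressing URIs back into CURIEs relies on.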

.gitignore

cthoyt · 2023-07-25T19:47:17Z

@hrshdhgd it appears this PR is passing, even with the Poetry changes removed.

hrshdhgd · 2023-07-25T19:56:20Z

> @hrshdhgd it appears this PR is passing, even with the Poetry changes removed.

Yes, I had a quick solution that worked! It won't hold up for future PRs if we're using the latest versions of poetry.

hrshdhgd · 2023-07-25T20:05:52Z

I merged #399 to main. Now if you merge the main branch down to this one, all should be well. I'll let you do it since you're actively working on this branch. This will eliminate the conflict.

matentzn

I have convinced myself that deleting that broken obographs test was the right thing to do! Thanks for your work on this, I am happy now; the one comment is purely an FYI, the behaviour is desirable.

matentzn · 2023-07-27T12:56:39Z

tests/test_parsers.py

@@ -117,21 +118,10 @@ def test_parse_obographs(self):
            write_table(msdf, file)
        self.assertEqual(
            len(msdf.df),
-            9881,
+            8099,


@cthoyt I reinstated this test. Analysing the warnings, I noticed that the roughly 1,800 fewer parsed entities (which is good, since they are not in the prefix map) are due to biopragmatics/bioregistry#917.

I think this is entirely independent of this PR, so I will approve it; this is just an FYI.

matentzn · 2023-07-27T12:58:00Z

@hrshdhgd ready for your review and merge, I am happy with it!

hrshdhgd

Looks much cleaner than before! Thanks @cthoyt !

cthoyt added 7 commits July 22, 2023 18:19

Use curies in sparql_util.py

9ee0af2

This PR automatically instantiates a `curies.Converter` for a given SPARQL configuration and re-implements the compress/expand operations using the `curies` package

Update sparql_util.py

6bdf3b6

Additional cleanup of sparql endpoint

8ebea27

Additional updates to RDF

e3383a4

Fix bug where endpoint config is built up over time

e9f4747

Deprecate old compression function

da7ad44

Update pyproject.toml

483d10a

matentzn previously approved these changes Jul 24, 2023

Update util.py

7e19639

cthoyt dismissed matentzn’s stale review via 7e19639 July 24, 2023 12:45

cthoyt added 5 commits July 24, 2023 08:46

Update util.py

ec3452f

Add typing.deprecated to curie_from_uri

118017d

Update lock

1849d60

Add typing extensions

032be37

Update util.py

8868d08

hrshdhgd added 2 commits July 24, 2023 10:01

testing latest version

29c1802

anchor to 1.4.2 like other projects

fcc6a78

hrshdhgd added 2 commits July 24, 2023 10:13

using snok poetry from marketplace

2c75910

remove pip update

faeca94

hrshdhgd added 7 commits July 24, 2023 10:28

lock file updated

ebcab13

added --no-interaction

1cc5ea9

virtualenv causing the errors

13d9cad

anchor versions

ba21248

poetry == 1.4.2

651d278

remove poetry.lock from source control

6713ad0

remove poetry.lock from source control

f4f9106

cthoyt added 3 commits July 24, 2023 21:34

Update context.py

2610322

Update parsers.py

6abda6f

Update util.py

fe21fc7

matentzn reviewed Jul 25, 2023

src/sssom/context.py (outdated)

src/sssom/sparql_util.py (outdated)

cthoyt added 2 commits July 25, 2023 06:57

Clean DC and update tests

25d74ae

Update sparql_util.py

1c25c94

cthoyt mentioned this pull request Jul 25, 2023

Fix SSSOM Schema not having bijective prefix map mapping-commons/sssom#302

Merged

Update context.py

afe95a8

matentzn reviewed Jul 25, 2023

Add implicit prefix map validity checker

439f561

cthoyt added 3 commits July 25, 2023 10:55

Add text explanation.

f0cd478

Update test_parsers.py

a9b0f27

Remove DC cleanup

e24fec2

This was possible thanks to mapping-commons/sssom#302

cthoyt commented Jul 25, 2023

.gitignore (outdated)

removing poetry.lock from gitignore and committing the lock file from 1.3.2

e5afffb

Update .gitignore

3ddf061

Merge branch 'master' into improve-sparql-util

db44005

cthoyt requested a review from matentzn July 26, 2023 15:04

Remove test_broken_obographs test and reinstate the correct one

36180e1

matentzn approved these changes Jul 27, 2023

cthoyt mentioned this pull request Jul 27, 2023

Any missing prefixes that could be standardised? biopragmatics/bioregistry#917

Closed

cthoyt requested a review from hrshdhgd July 27, 2023 13:32

hrshdhgd approved these changes Jul 27, 2023

hrshdhgd merged commit f3be757 into master Jul 27, 2023
6 checks passed

hrshdhgd deleted the improve-sparql-util branch July 27, 2023 13:54
