Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected results with multiple optionals at the same level #21

Open
kirahowe opened this issue Mar 31, 2020 · 3 comments
Open

Unexpected results with multiple optionals at the same level #21

kirahowe opened this issue Mar 31, 2020 · 3 comments

Comments

@kirahowe
Copy link

It seems like having multiple optionals in a matcha query returns unexpected results. E.g. in this example (from an internal app), the query returns different results depending on which optional comes first:

> (matcha/construct-1 {:notation ?notation
                       :name ?name
                       :company-number ?company_number
                       :company-uri ?company_uri
                       :expiry-date ?expiry_date}
                      [[uri skos:notation ?notation]
                       [uri reg:holder ?holder]
                       [?holder foaf:name ?name]
                       (matcha/optional [[?holder org:identifier ?company_number]
                                         [?holder owl:sameAs ?company_uri]])
                       (matcha/optional [[uri reg:expiryDate ?expiry_date]])]
                      matcha-db)

> {:notation "CBDU195810",
   :name "1ST WASTE MANAGEMENT CONSULTANTS LTD",
   :company-number "03736942",
   :company-uri #object[java.net.URI 0x2cbb354 "http://business.data.gov.uk/id/company/03736942"],
   :expiry-date _0}

vs

> (matcha/construct-1 {:notation ?notation
                       :name ?name
                       :company-number ?company_number
                       :company-uri ?company_uri
                       :expiry-date ?expiry_date}
                      [[uri skos:notation ?notation]
                       [uri reg:holder ?holder]
                       [?holder foaf:name ?name]
                       (matcha/optional [[uri reg:expiryDate ?expiry_date]])
                       (matcha/optional [[?holder org:identifier ?company_number]
                                         [?holder owl:sameAs ?company_uri]])]
                      matcha-db)

> {:notation "CBDU195810",
   :name "1ST WASTE MANAGEMENT CONSULTANTS LTD",
   :company-number _0,
   :company-uri _1,
   :expiry-date #object[java.time.LocalDate 0x2802054f "2020-09-20"]}

Not sure what the problem is, but this result was surprising.

@RickMoynihan
Copy link
Member

Rewriting this as nested optionals should work (nested optionals is explicitly supported like in SPARQL):

(matcha/construct-1 {:notation ?notation
                       :name ?name
                       :company-number ?company_number
                       :company-uri ?company_uri
                       :expiry-date ?expiry_date}
                      [[uri skos:notation ?notation]
                       [uri reg:holder ?holder]
                       [?holder foaf:name ?name]
                       (matcha/optional [[?holder org:identifier ?company_number]
                                         [?holder owl:sameAs ?company_uri]
                                         (matcha/optional [[uri reg:expiryDate ?expiry_date]])])]
                      matcha-db))

It looks like there may be a problem with your initial query though, probably when we parse the query forms for optionals, and lift their variables out a level. We might not be expecting two or more optionals at the same level, and I think we should.

I suspect they might be stomping on each other; but will need to view the query macro expansions.

@danmidwood
Copy link

This is still a problem on 0.1.12

Some data

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix drl: <http://environment.data.gov.uk/asset-management/def/drl/> .
@prefix drl-aims: <http://environment.data.gov.uk/asset-management/def/drl/aims/> .
@prefix drl-edit: <http://environment.data.gov.uk/asset-management/def/drl/edit/> .
@prefix drl-edit-type: <http://environment.data.gov.uk/asset-management/def/drl/edit/type/> .
@prefix drl-id: <http://environment.data.gov.uk/asset-management/id/drl/> .


drl-id:Beacon-optional-fields a drl:Attribute;
  drl:attributeOf drl-id:Beacon;
  drl:delivery "fff";
  drl:format drl-id:format-8978f27f-d038-4b51-a49c-233628c1881e;
  drl:lastEdit drl-edit:f1de288f-3943-4b88-bc94-b76169462a8f;
  drl:mandatory false;
  drl:mapping "fff";
  dcterms:description "Optional Fields"@en;
  rdfs:label "Optional Fields"@en .

drl-edit:f1de288f-3943-4b88-bc94-b76169462a8f drl-edit:type drl-edit-type:created .

drl-id:format-8978f27f-d038-4b51-a49c-233628c1881e a drl:ReferenceFormat;
  drl:referenceCode "fff";
  drl:referenceType "fff";
  rdfs:label "Reference" .

drl-id:Beacon-picklist-attribute a drl:Attribute;
  drl:attributeOf drl-id:Beacon;
  drl:delivery "11";
  drl:format drl-id:AssetCategoryLand;
  drl:lastEdit drl-edit:c1183385-ea7a-45b5-a800-f6406267cd50;
  drl:mandatory false;
  drl:mapping "111";
  drl:purpose "11";
  dcterms:description "asdf"@en;
  rdfs:label "Picklist Attribute"@en .

drl-edit:c1183385-ea7a-45b5-a800-f6406267cd50 drl-edit:type drl-edit-type:created .

There are two Attributes in the data, and there are three optional things.

  • Mandatory present in both of these things
  • Purpose is present in just one
  • Edit date is not present in either

I've tried building with non-nested and nested optionals.

Not Nested

It looks like build use the first matching optional,

Putting mandatory first excludes the purpose on the item that should have it

(matcha/build [:id ?id]
              {:mandatory        ?mandatory
               :purpose          ?purpose
               :edit-date        ?editDate}
              [[?id rdf:a drldef:Attribute]
               [?id drldef:last-edit ?edit]
               [?edit drl-edit:type ?editType]
               (matcha/optional
                 [[?id drldef:mandatory ?mandatory]])
               (matcha/optional
                 [[?id drldef:purpose ?purpose]])
               (matcha/optional
                 [[?edit drl-edit:date ?editDate]])]
              db)

=> 
({:id
  #object[java.net.URI 0x2aee0e8b "http://environment.data.gov.uk/asset-management/id/drl/Beacon-picklist-attribute"],
  :mandatory false}
 {:id
  #object[java.net.URI 0x69843f6a "http://environment.data.gov.uk/asset-management/id/drl/Beacon-optional-fields"],
  :mandatory false})

Swapping purpose and mandatory returns the same item with it's purpose but without the mandatory

(matcha/build [:id ?id]
              {:mandatory        ?mandatory
               :purpose          ?purpose
               :edit-date        ?editDate}
              [[?id rdf:a drldef:Attribute]
               [?id drldef:last-edit ?edit]
               [?edit drl-edit:type ?editType]
               (matcha/optional
                 [[?id drldef:purpose ?purpose]])
               (matcha/optional
                 [[?id drldef:mandatory ?mandatory]])
               (matcha/optional
                 [[?edit drl-edit:date ?editDate]])]
              db)

=>
({:id
  #object[java.net.URI 0x2aee0e8b "http://environment.data.gov.uk/asset-management/id/drl/Beacon-picklist-attribute"],
  :purpose "11"}
 {:id
  #object[java.net.URI 0x69843f6a "http://environment.data.gov.uk/asset-management/id/drl/Beacon-optional-fields"],
  :mandatory false})

If I put the date optional first it matches the next one whichever it is following the same pattern as above.

Nested Optionals

Nested it seems to short circuit out when the first one in the vector does not match

(matcha/build [:id ?id]
              {:mandatory        ?mandatory
               :purpose          ?purpose
               :edit-date        ?editDate}
              [[?id rdf:a drldef:Attribute]
               [?id drldef:last-edit ?edit]
               [?edit drl-edit:type ?editType]
               (matcha/optional
                 [[?edit drl-edit:date ?editDate]
                  (matcha/optional
                    [[?id drldef:mandatory ?mandatory]
                     (matcha/optional
                       [[?id drldef:purpose ?purpose]])])])]
              db)

=> 
({:id
  #object[java.net.URI 0x2aee0e8b "http://environment.data.gov.uk/asset-management/id/drl/Beacon-picklist-attribute"]}
 {:id
  #object[java.net.URI 0x69843f6a "http://environment.data.gov.uk/asset-management/id/drl/Beacon-optional-fields"]})
(matcha/build [:id ?id]
              {:mandatory        ?mandatory
               :purpose          ?purpose
               :edit-date        ?editDate}
              [[?id rdf:a drldef:Attribute]
               [?id drldef:last-edit ?edit]
               [?edit drl-edit:type ?editType]
               (matcha/optional
                 [[?id drldef:purpose ?purpose]
                  (matcha/optional [[?id drldef:mandatory ?mandatory]])])]
              db)

=> 
({:id
  #object[java.net.URI 0x2aee0e8b "http://environment.data.gov.uk/asset-management/id/drl/Beacon-picklist-attribute"],
  :mandatory false,
  :purpose "11"}
 {:id
  #object[java.net.URI 0x69843f6a "http://environment.data.gov.uk/asset-management/id/drl/Beacon-optional-fields"]})

I tried nesting deeper and deeper and all sorts of different permutations without any success.

Got it!

Somewhat.

I am able to achieve to match correctly with a non-nested series of all permutation starting from the biggest and working down. This is unwieldy with my three optionals but works

(matcha/build [:id ?id]
              {:mandatory        ?mandatory
               :purpose          ?purpose
               :edit-date        ?editDate}
              [[?id rdf:a drldef:Attribute]
               [?id drldef:last-edit ?edit]
               [?edit drl-edit:type ?editType]
               (matcha/optional [[?id drldef:mandatory ?mandatory]
                                 [?id drldef:purpose ?purpose]
                                 [?edit drl-edit:date ?editDate]])

               (matcha/optional [[?id drldef:mandatory ?mandatory]
                                 [?id drldef:purpose ?purpose]])
               (matcha/optional [[?id drldef:mandatory ?mandatory]
                                 [?edit drl-edit:date ?editDate]])
               (matcha/optional [[?id drldef:purpose ?purpose]
                                 [?edit drl-edit:date ?editDate]])

               (matcha/optional [[?id drldef:mandatory ?mandatory]])
               (matcha/optional [[?id drldef:purpose ?purpose]])
               (matcha/optional [[?edit drl-edit:date ?editDate]])]
              db)

=> 
({:id
  #object[java.net.URI 0x2aee0e8b "http://environment.data.gov.uk/asset-management/id/drl/Beacon-picklist-attribute"],
  :mandatory false,
  :purpose "11"}
 {:id
  #object[java.net.URI 0x69843f6a "http://environment.data.gov.uk/asset-management/id/drl/Beacon-optional-fields"],
  :mandatory false})

@RickMoynihan
Copy link
Member

The final bit about permutations there is interesting. I think I know roughly what causes it, but it'll need some time to investigate how exactly to fix it.

Essentially I think the problem is that the optionals interfer with each other (making them not optional to each other).

For example:

(grafter.matcha.alpha/build [:id ?id] [[?id :p ?o] (grafter.matcha.alpha/optional [[?id :p2 ?o2]])])

Expands correctly to:

(->>
  (do
    (seq
      (clojure.core.logic.pldb/with-db
        (grafter.matcha.alpha/index-if-necessary
          db-or-idx__50404__auto__)
        (clojure.core.logic/run*
          [?id]
          (clojure.core.logic/fresh
            [?o ?o2]
            (grafter.matcha.alpha/triple ?id :p ?o)
            (clojure.core.logic/conda
              [(clojure.core.logic/conde
                 [(grafter.matcha.alpha/triple ?id :p2 ?o2)])]
              [clojure.core.logic/succeed]))))))
  (grafter.matcha.alpha/unify-solutions '[?id])
  (grafter.matcha.alpha/replace-vars-with-vals [:id '?id])
  (grafter.matcha.alpha/group-subjects)
  seq)

But this:

(grafter.matcha.alpha/construct [:id ?id]
   [[?id :p ?o] 
(grafter.matcha.alpha/optional [[?id :p2 ?o2]])
(grafter.matcha.alpha/optional [[?id :p3 ?o2]])])

Expands to this:

(->>
  (do
    (seq
      (clojure.core.logic.pldb/with-db
        (grafter.matcha.alpha/index-if-necessary
          db-or-idx__50404__auto__)
        (clojure.core.logic/run*
          [?id]
          (clojure.core.logic/fresh
            [?o ?o2]
            (grafter.matcha.alpha/triple ?id :p ?o)
            (clojure.core.logic/conda
              [(clojure.core.logic/conde
                 [(grafter.matcha.alpha/triple ?id :p2 ?o2)])]
              [(clojure.core.logic/conde
                 [(grafter.matcha.alpha/triple ?id :p3 ?o2)])]
              [clojure.core.logic/succeed]))))))
  (grafter.matcha.alpha/unify-solutions '[?id])
  (grafter.matcha.alpha/replace-vars-with-vals [:id '?id])
  (grafter.matcha.alpha/group-subjects)
  seq)

I think that fresh o2 should be lexically scoped over each clause of the conde. This is basically what I meant by the comment:

probably when we parse the query forms for optionals, and lift their variables out a level. We might not be expecting two or more optionals at the same level, and I think we should.
I suspect they might be stomping on each other;

So I suspect it needs to be something like:

(->>
  (do
    (seq
      (clojure.core.logic.pldb/with-db
        (grafter.matcha.alpha/index-if-necessary
          db-or-idx__50404__auto__)
        (clojure.core.logic/run*
          [?id]
          (clojure.core.logic/fresh
          [?o]
            (grafter.matcha.alpha/triple ?id :p ?o)
            (clojure.core.logic/conda
              [(clojure.core.logic/conde
              [(clojure.core.logic/fresh [?o2] (grafter.matcha.alpha/triple ?id :p2 ?o2))])]
              [(clojure.core.logic/fresh [?o2] (clojure.core.logic/conde [(grafter.matcha.alpha/triple ?id :p3 ?o2)]))]
              [clojure.core.logic/succeed]))))))
  (grafter.matcha.alpha/unify-solutions '[?id])
  (grafter.matcha.alpha/replace-vars-with-vals [:id '?id])
  (grafter.matcha.alpha/group-subjects)
  seq)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants