rdflib.Literal handling of timezone aware datetime objects #2014

Open
SupImDos opened this issue Jul 13, 2022 · 3 comments

Comments

@SupImDos

Hi,

Currently, timezone aware and unaware datetimes are both assigned the datatype xsd:dateTime by default if no datatype is specified.
(See: https://github.com/RDFLib/rdflib/blob/6.1.1/rdflib/term.py#L1579)

My question is, would it be more appropriate default behaviour for an unaware datetime to be assigned xsd:dateTime and aware datetimes to be assigned xsd:dateTimeStamp?

Example

import datetime
import rdflib

# Datetimes
unaware = datetime.datetime.now()
aware = datetime.datetime.now(datetime.timezone.utc)

# Literals
a = rdflib.Literal(unaware)
b = rdflib.Literal(aware)

# Current Behaviour
a
# rdflib.term.Literal('2022-07-13T14:29:05.282537', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#dateTime'))
b
# rdflib.term.Literal('2022-07-13T14:29:07.660420+00:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#dateTime'))

# Desired Behaviour
a
# rdflib.term.Literal('2022-07-13T14:29:05.282537', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#dateTime'))
b
# rdflib.term.Literal('2022-07-13T14:29:07.660420+00:00', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#dateTimeStamp'))
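
The desired rule can be sketched without touching rdflib, as a plain function that picks the datatype IRI from the datetime's awareness. This is only an illustration: `xsd_datatype_for` is a hypothetical helper name, not part of rdflib, and it relies on `utcoffset()` returning `None` for naive datetimes.

```python
from datetime import datetime, timezone

XSD_NS = "http://www.w3.org/2001/XMLSchema#"


def xsd_datatype_for(value: datetime) -> str:
    """Hypothetical helper: choose a datatype IRI for a datetime."""
    # utcoffset() returns None for naive datetimes, including the rare
    # case of a tzinfo object that reports no offset.
    if value.utcoffset() is not None:
        return XSD_NS + "dateTimeStamp"
    return XSD_NS + "dateTime"


naive = datetime(2022, 7, 13, 14, 29, 5)
aware = datetime(2022, 7, 13, 14, 29, 7, tzinfo=timezone.utc)

print(xsd_datatype_for(naive))
print(xsd_datatype_for(aware))
```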

Thanks!

@ghost

ghost commented Jul 13, 2022

would it be more appropriate default behaviour for an unaware datetime to be assigned xsd:dateTime and aware datetimes to be assigned xsd:dateTimeStamp?

Not an unreasonable question at all, but any implementation is going to be hampered by the fact that Python doesn't provide a datetimestamp object that would fit readily with the straightforward Python type -> XSD type mapping currently implemented in https://github.com/RDFLib/rdflib/blob/6.1.1/rdflib/term.py#L1579. As it stands, handling datetimestamp by default would require that approach to be (fairly comprehensively) modified to treat datetime + tz as a special case (yuk). TBH, I can't see that happening anytime soon, if at all, but if you're sufficiently motivated to contribute a PR, that would be welcome.

There's another factor tilting the balance in favour of having to explicitly specify a datetimestamp datatype for a Literal. As Andy Seaborne notes in response to an SO query on XSD.dateTimeStamp in SPARQL queries:

It will depend on whether engine supports that datatype. Some do, some don't. It is not required by the SPARQL spec. xsd:dateTime is required.

Having grepped the RDFLib SPARQL source for dateTime, I think I can safely say that the RDFLib SPARQL implementation doesn't support the XSD.dateTimeStamp datatype, so one would need to cast dateTimeStamp to dateTime¹; otherwise, queries could mysteriously return zero results.

¹ AndyS recommends strdt(str("2022-03-11T21:01:23Z"^^xsd:dateTimeStamp), xsd:dateTime)

I'll leave this as an issue for a little while before converting it to a discussion entry.

@SupImDos
Author

Thanks very much for the prompt reply!

Not an unreasonable question at all, but any implementation is going to be hampered by the fact that Python doesn't provide a datetimestamp object that would fit readily with the straightforward Python type -> XSD type mapping currently implemented in https://github.com/RDFLib/rdflib/blob/6.1.1/rdflib/term.py#L1579. As it stands, handling datetimestamp by default would require that approach to be (fairly comprehensively) modified to treat datetime + tz as a special case (yuk). TBH, I can't see that happening anytime soon, if at all, but if you're sufficiently motivated to contribute a PR, that would be welcome.

I agree, and I thought about this when looking through the source code of rdflib/term.py initially. One option I did come up with (which may not necessitate any refactoring) was creating a mock datetimestamp class for the isinstance(...) check as follows:

Example datetimestamp Class for isinstance(...) Checks

from datetime import datetime, timezone

# Mock timezone aware datetimestamp class for `isinstance(...)` checking
class datetimestampmeta(type):
    def __instancecheck__(self, instance):
        return isinstance(instance, datetime) and instance.tzinfo is not None

class datetimestamp(datetime, metaclass=datetimestampmeta):
    ...

# Datetimes
unaware = datetime.now()
aware = datetime.now(timezone.utc)

# Check
isinstance(unaware, datetimestamp)
# False
isinstance(aware, datetimestamp)
# True
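
One caveat with this approach: every aware datetime is still an instance of datetime, so a first-match isinstance(...) lookup only reaches the new rule if it is listed before the plain datetime rule. Below is a minimal sketch of that ordering concern, assuming a first-match lookup. `RULES` and `pick_datatype` are hypothetical stand-ins, not rdflib's actual tables or functions, and the mock class is repeated (with capitalised names) so the snippet runs on its own.

```python
from datetime import datetime, timezone

XSD_NS = "http://www.w3.org/2001/XMLSchema#"


class DateTimeStampMeta(type):
    def __instancecheck__(cls, instance):
        # True only for datetimes that carry a usable UTC offset.
        return isinstance(instance, datetime) and instance.utcoffset() is not None


class DateTimeStamp(datetime, metaclass=DateTimeStampMeta):
    """Marker type used only for isinstance(...) checks."""


# Hypothetical first-match rule table: the more specific type comes first.
RULES = [
    (DateTimeStamp, XSD_NS + "dateTimeStamp"),
    (datetime, XSD_NS + "dateTime"),
]


def pick_datatype(value, rules=RULES):
    """Return the IRI of the first rule whose type matches, else None."""
    for py_type, iri in rules:
        if isinstance(value, py_type):
            return iri
    return None


naive = datetime(2022, 7, 13, 14, 29, 5)
aware = datetime(2022, 7, 13, 14, 29, 7, tzinfo=timezone.utc)

print(pick_datatype(naive))
print(pick_datatype(aware))
# With the order reversed, the plain datetime rule shadows the new one.
print(pick_datatype(aware, rules=list(reversed(RULES))))
```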

There's another factor tilting the balance in favour of having to explicitly specify a datetimestamp datatype for a Literal. As Andy Seaborne notes in response to an SO query on XSD.dateTimeStamp in SPARQL queries:

It will depend on whether engine supports that datatype. Some do, some don't. It is not required by the SPARQL spec. xsd:dateTime is required.

Having grepped the RDFLib SPARQL source for dateTime, I think I can safely say that the RDFLib SPARQL implementation doesn't support the XSD.dateTimeStamp datatype, so one would need to cast dateTimeStamp to dateTime¹; otherwise, queries could mysteriously return zero results.

Thanks! This is the level of insight I was hoping for. It would appear that, at least for now, the current default behaviour is the safest option.

@ajnelson-nist
Contributor

I'd like to hop in¹ with another perspective favoring implementation rather than waiting.

Some applications are picky with xsd:dateTime vs xsd:dateTimeStamp.

On a purely mechanical matter, SHACL constraints (particularly sh:datatype) and OWL rdfs:range definitions care about which of the two is specified. They are different IRIs, so using one when the other is specified in the shapes graph or TBox raises a data error in SHACL and an inconsistency in OWL.

The mechanical matter could end up being significant to more users soon. The current Candidate Recommendation Draft of OWL-Time deprecates time:inXSDDateTime to instead favor time:inXSDDateTimeStamp, so some applications will be exercising xsd:dateTimeStamp more. This is known to be an issue for users who plan to use TIME and PROV-O.

On a design matter, there are situational reasons to use one over the other. Some timelining applications have to start from timezoneless data, such as FAT file systems' timestamps, or database columns where the timezone was forgotten and the database was then migrated to a new geographic region. In these cases, getting to the point where xsd:dateTimeStamp can be used for literals is the goal.
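
For timezoneless sources like these, the promotion step might look like the following sketch. It assumes the original offset has been established out of band; the UTC-05:00 value and the variable names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# A timestamp as recovered from a source that stores no timezone.
recovered = datetime(2023, 1, 1, 1, 23, 45)

# Assumption for illustration: the source system ran at UTC-05:00.
source_zone = timezone(timedelta(hours=-5))

# replace() attaches the offset without shifting the wall-clock time.
localized = recovered.replace(tzinfo=source_zone)

# astimezone() expresses the same instant in UTC.
normalized = localized.astimezone(timezone.utc)

print(localized.isoformat())   # 2023-01-01T01:23:45-05:00
print(normalized.isoformat())  # 2023-01-01T06:23:45+00:00
```

Either lexical form carries an explicit offset, which is what xsd:dateTimeStamp requires.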

On a bug matter, I just hit an unexpected behavior, which I think lines up with @gjhiggins's report on the current implementation status.

>>> import rdflib
>>> x = rdflib.Literal("2023-01-01T01:23:45", datatype=rdflib.XSD.dateTime)
>>> x
rdflib.term.Literal('2023-01-01T01:23:45', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#dateTime'))
>>> x.toPython()
datetime.datetime(2023, 1, 1, 1, 23, 45)
>>> y = rdflib.Literal("2023-01-01T01:23:45", datatype=rdflib.XSD.dateTimeStamp)
>>> y
rdflib.term.Literal('2023-01-01T01:23:45', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#dateTimeStamp'))
>>> y.toPython()
rdflib.term.Literal('2023-01-01T01:23:45', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#dateTimeStamp'))

Two issues:

  1. I thought the dateTimeStamp literal y would balk at receiving a timezoneless timestamp. This thread so far indicates it didn't because it's not implemented yet.
  2. I thought y.toPython() would yield a not-quite-right datetime.datetime object, but instead it seemed to just return itself.
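
For issue 1, the check that was expected can be sketched with the standard library alone. xsd:dateTimeStamp is xsd:dateTime with the timezone made mandatory, so a validator needs to insist on a trailing `Z` or `±hh:mm`. The regular expression below is a simplified approximation of the XSD lexical form (it does not range-check the fields), and `is_valid_datetimestamp_lexical` is a hypothetical name, not an rdflib function.

```python
import re

# Simplified pattern: date, "T", time, optional fractional seconds,
# then a REQUIRED timezone ("Z" or a signed hh:mm offset).
_DATETIMESTAMP_RE = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}"      # year-month-day
    r"T\d{2}:\d{2}:\d{2}"        # hours:minutes:seconds
    r"(?:\.\d+)?"                # optional fractional seconds
    r"(?:Z|[+-]\d{2}:\d{2})"     # mandatory timezone
)


def is_valid_datetimestamp_lexical(lexical: str) -> bool:
    """Hypothetical check: does the string look like an xsd:dateTimeStamp?"""
    return _DATETIMESTAMP_RE.fullmatch(lexical) is not None


print(is_valid_datetimestamp_lexical("2023-01-01T01:23:45"))        # False
print(is_valid_datetimestamp_lexical("2023-01-01T01:23:45Z"))       # True
print(is_valid_datetimestamp_lexical("2023-01-01T01:23:45+00:00"))  # True
```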

So, I think support for this datatype is worth pursuit.

Footnotes

  1. Disclaimer: Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.
