Additional hints in sitemaps to support efficient harvesting #200

Open
datadavev opened this issue Jan 30, 2022 · 0 comments
Some collections hold large numbers of records describing different kinds of things (e.g. Datasets, Awards, and People). Each record may have its own landing page, and each landing page may have an entry in the sitemap.

An indexer interested only in Datasets would need to inspect every entry advertised in the sitemap to find the Dataset entries, which is inefficient and a needless use of resources.

Sitemaps are extensible, so one option may be to provide a type hint inside each <url> element of the sitemap. For example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset 
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <url>
    <loc>https://arcticdata.io/catalog/view/doi%3A10.18739%2FA2ST7DZ2Q</loc>
    <lastmod>2021-12-07T12:15:05Z</lastmod>
    <rdf:type>http://schema.org/Dataset</rdf:type>
  </url>
</urlset>
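
On the consumer side, a harvester could use such a hint to skip entries it does not care about before fetching any landing page. The following is only a rough sketch of that idea using Python's standard library; the function name `locs_with_type` and the second (unhinted) entry under example.org are made up for illustration and are not part of any existing tool or sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used for lookups; the URIs match the example sitemap.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Sample input: the first entry is from the example above, the second is a
# hypothetical entry that carries no type hint.
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <url>
    <loc>https://arcticdata.io/catalog/view/doi%3A10.18739%2FA2ST7DZ2Q</loc>
    <lastmod>2021-12-07T12:15:05Z</lastmod>
    <rdf:type>http://schema.org/Dataset</rdf:type>
  </url>
  <url>
    <loc>https://example.org/people/1</loc>
    <lastmod>2021-12-07T12:15:05Z</lastmod>
  </url>
</urlset>
"""


def locs_with_type(sitemap_xml, wanted_type):
    """Return the <loc> values whose <url> entry has a matching rdf:type hint."""
    root = ET.fromstring(sitemap_xml)
    matches = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        # A <url> may carry zero, one, or several hints.
        hints = {(el.text or "").strip() for el in url.findall("rdf:type", NS)}
        if loc and wanted_type in hints:
            matches.append(loc)
    return matches


if __name__ == "__main__":
    for loc in locs_with_type(SITEMAP, "http://schema.org/Dataset"):
        print(loc)
```

Entries without a hint are simply skipped here. A more cautious consumer might instead fall back to fetching unhinted entries, since a missing hint says nothing about the type of the page.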

An obvious challenge is that a single landing page may express many types, so which one should the hint specify? That would be up to the provider: if a specific type is clearly the intended subject of the referenced <loc>, the provider can supply a hint, and consumers may use such hints.
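
On the provider side, this could amount to writing the hint only for entries that have a clear primary type and omitting it otherwise. A minimal sketch, assuming the provider already knows the intended type for each landing page; `build_sitemap` and the example.org entry are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Serialize sitemap elements without a prefix and rdf elements as rdf:...
ET.register_namespace("", SM)
ET.register_namespace("rdf", RDF)


def build_sitemap(entries):
    """Build a sitemap from (loc, lastmod, primary_type) tuples.

    primary_type is None when the provider has no single intended type
    for the page; in that case no hint is written for the entry.
    """
    urlset = ET.Element(f"{{{SM}}}urlset")
    for loc, lastmod, primary_type in entries:
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = loc
        ET.SubElement(url, f"{{{SM}}}lastmod").text = lastmod
        if primary_type:
            ET.SubElement(url, f"{{{RDF}}}type").text = primary_type
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("https://arcticdata.io/catalog/view/doi%3A10.18739%2FA2ST7DZ2Q",
         "2021-12-07T12:15:05Z", "http://schema.org/Dataset"),
        # Hypothetical page with no single intended type: no hint emitted.
        ("https://example.org/mixed/1", "2021-12-07T12:15:05Z", None),
    ]))
```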
