Aggregated TimeMap summary #97
Why JSON? Also, mind the verbiage with "memento count" vs. "URI-M count", per https://arxiv.org/abs/1703.03302
JSON is directly consumable by many visualization libraries in JS and other languages. As far as "memento count" is concerned, if "URI-M count" is preferred then we might need to update the
Here is a sample output draft:
```json
{
  "original_uri": "http://example.org/index.html",
  "total_mementos": 54,
  "archives": {
    "web.archive.org": {
      "count": 53,
      "first": {
        "datetime": "2002-10-16T10:13:37Z",
        "uri": "http://web.archive.org/web/20021016101337/http://example.org/index.html"
      },
      "last": {
        "datetime": "2016-04-10T22:12:45Z",
        "uri": "http://web.archive.org/web/20160410221245/http://example.org/index.html"
      }
    },
    "archive.is": {
      "count": 1,
      "first": {
        "datetime": "2013-09-16T08:37:01Z",
        "uri": "http://archive.is/20130916083701/http://example.org/index.html"
      },
      "last": {
        "datetime": "2013-09-16T08:37:01Z",
        "uri": "http://archive.is/20130916083701/http://example.org/index.html"
      }
    },
    "webarchive.org.uk": {
      "count": 0
    }
  },
  "periods": {
    "2002": {
      "10": 10,
      "12": 6
    },
    "2003": {
      "01": 1,
      "02": 3,
      "05": 2,
      "09": 1,
      "11": 4
    },
    "2005": {
      "02": 3,
      "04": 7,
      "05": 2,
      "08": 5
    },
    "2013": {
      "07": 1,
      "09": 3
    },
    "2016": {
      "02": 5,
      "04": 1
    }
  }
}
```
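For reference, the draft schema above maps onto a few Go structs. This is a minimal sketch, not MemGator's code: the type names (`Memento`, `ArchiveSummary`, `TimeMapSummary`) and the `encodeSummary` helper are hypothetical. Making `first` and `last` pointers with `omitempty` lets an archive with no mementos serialize as just `{"count": 0}`, as `webarchive.org.uk` does in the draft.

```go
package main

import "encoding/json"

// Memento is a hypothetical type for one capture (URI-M) and its
// Memento-Datetime, formatted as an RFC 3339 string as in the draft.
type Memento struct {
	Datetime string `json:"datetime"`
	URI      string `json:"uri"`
}

// ArchiveSummary mirrors one entry under "archives". First and Last are
// pointers so that an archive with zero mementos serializes as {"count":0}.
type ArchiveSummary struct {
	Count int      `json:"count"`
	First *Memento `json:"first,omitempty"`
	Last  *Memento `json:"last,omitempty"`
}

// TimeMapSummary mirrors the top-level object of the draft response.
// Periods maps a four-digit year to a map of two-digit month to count.
type TimeMapSummary struct {
	OriginalURI   string                    `json:"original_uri"`
	TotalMementos int                       `json:"total_mementos"`
	Archives      map[string]ArchiveSummary `json:"archives"`
	Periods       map[string]map[string]int `json:"periods"`
}

// encodeSummary serializes a summary. encoding/json writes map keys in
// sorted order, so the output is deterministic.
func encodeSummary(s TimeMapSummary) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}
```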
@ibnesayeed
Thanks @machawk1, the point is taken. We can perhaps make it more coherent across the board. However, the primary goal of this sample output was to communicate the intended implementation and to collect ideas about what other information could be provided to aid tools, and what tools could be built if such information were available.
@ibnesayeed Right. I just wanted to encourage consistency. The temporal breakdown you have will be really useful. What are your thoughts on having the same sort of breakdown on a per-archive basis (optionally, additionally, and/or in lieu of the inter-archive one)?
That is certainly doable if it proves useful for some applications/visualizations. However, it would increase the size of the response. One might also consider breaking down the data by archive within each monthly period. So, I think we should structure the response so that future extensions can add more fine-grained breakdown information without breaking the current structure. Additionally, breakdown on
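One way to keep future extensions from breaking clients is to make every new piece of data an additional, optional key. The sketch below assumes a Go client using `encoding/json`; the type names and `decodeV1` are hypothetical. It shows that a client built against the current fields still decodes a response that later adds per-archive `periods`, because `json.Unmarshal` skips object keys that have no matching struct field.

```go
package main

import "encoding/json"

// v1Archive is a hypothetical client-side view that only knows the
// per-archive fields present in the current draft.
type v1Archive struct {
	Count int `json:"count"`
}

// v1Summary is the matching hypothetical top-level view.
type v1Summary struct {
	TotalMementos int                  `json:"total_mementos"`
	Archives      map[string]v1Archive `json:"archives"`
}

// decodeV1 decodes a summary response into the v1 view. Keys the view does
// not declare (for example a newer per-archive "periods" object) are
// silently ignored by json.Unmarshal.
func decodeV1(data []byte) (v1Summary, error) {
	var s v1Summary
	err := json.Unmarshal(data, &s)
	return s, err
}
```

The converse does not hold: changing the type of an existing key, for example turning `count` from a number into an object, makes `json.Unmarshal` return an error for older clients, so finer-grained detail is safer as new keys.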
Adding
It sure does, but the number of items is capped to a max of
We could make a dedicated endpoint that accepts various parameters to let the client pick and choose what it wants. However, that would increase the complexity of the code (making it difficult to explain and maintain) and yield confusing API documentation. This is perhaps the perfect opportunity to introduce GraphQL in MemGator, but I would hold off on it, because it would require some serious planning to decide which other endpoints should go that route. For now, this endpoint should give a high-level summary of a TimeMap that is sufficient for various visualization and archival exploration applications. The choice of
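A single, parameter-free summary endpoint can be served by an ordinary `net/http` handler. This is a sketch under stated assumptions, not MemGator's actual routing: the `/summary/{URI-R}` route, the `countLookup` hook (standing in for MemGator's TimeMap aggregation), and `exampleRequest` are all hypothetical, and only per-archive counts are emitted to keep it short.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"strings"
)

// countLookup is a hypothetical hook returning memento counts per archive
// for a URI-R; a real service would derive these from the aggregated TimeMap.
type countLookup func(uriR string) map[string]int

type archiveCount struct {
	Count int `json:"count"`
}

type summaryResponse struct {
	OriginalURI   string                  `json:"original_uri"`
	TotalMementos int                     `json:"total_mementos"`
	Archives      map[string]archiveCount `json:"archives"`
}

// summaryHandler serves GET /summary/{URI-R} (hypothetical route) as JSON.
func summaryHandler(lookup countLookup) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		uriR := strings.TrimPrefix(r.URL.Path, "/summary/")
		// Reject requests outside the prefix or with an empty URI-R.
		if uriR == "" || uriR == r.URL.Path {
			http.Error(w, "missing URI-R", http.StatusBadRequest)
			return
		}
		resp := summaryResponse{OriginalURI: uriR, Archives: map[string]archiveCount{}}
		for archive, n := range lookup(uriR) {
			resp.Archives[archive] = archiveCount{Count: n}
			resp.TotalMementos += n
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(resp) // Encode appends a trailing newline
	}
}

// exampleRequest exercises a handler in-process and returns status and body.
func exampleRequest(h http.Handler, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}
```

If such a handler were mounted on Go's `http.ServeMux`, keep in mind that the mux redirects paths containing repeated slashes to a cleaned form, which would mangle an embedded `http://`; the sketch sidesteps this by invoking the handler directly.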
I made some headway on this issue; see the issue-97 branch. The current output on that branch yields something like:
```json
{
  "original_uri": "http://matkelly.com",
  "archives": {
    "web.archive.org":{
      "count": 208,
      "first":{
        "datetime": 20060514123511,
        "uri": "https://web.archive.org/web/20060514123511/http://www.matkelly.com:80/",
      }
      "last":{
        "datetime": 20240413142440,
        "uri": "https://web.archive.org/web/20240413142440/https://matkelly.com/",
      }
    },
    "archive.md":{
      "count": 18,
      "first":{
        "datetime": 20130618191814,
        "uri": "http://archive.md/20130618191814/http://matkelly.com/",
      }
      "last":{
        "datetime": 20210406203127,
        "uri": "http://archive.md/20210406203127/https://matkelly.com/",
      }
    },
    "wayback.archive-it.org":{
      "count": 3,
      "first":{
        "datetime": 20140210154006,
        "uri": "https://wayback.archive-it.org/all/20140210154006/http://matkelly.com/",
      }
      "last":{
        "datetime": 20160805024730,
        "uri": "https://wayback.archive-it.org/all/20160805024730/http://matkelly.com/",
      }
    },
    "arquivo.pt":{
      "count": 11,
      "first":{
        "datetime": 20200218230719,
        "uri": "https://arquivo.pt/wayback/20200218230719mp_/https://matkelly.com/",
      }
      "last":{
        "datetime": 20230121055854,
        "uri": "https://arquivo.pt/wayback/20230121055854mp_/http://matkelly.com/",
      }
    },
  "total_mementos": 240
```
The temporal breakdown still needs to be done, and there are likely some formatting issues and some code cleanup to do. Task:
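The branch output emits each `datetime` as a bare 14-digit number (e.g. `20060514123511`), whereas the earlier draft in this thread used ISO 8601 strings. Conveniently, Go's reference time written in that 14-digit form is itself the parse layout. A minimal sketch, assuming the timestamps are UTC as they appear in URI-Ms; `mementoLayout` and `toRFC3339` are hypothetical names.

```go
package main

import "time"

// mementoLayout is Go's reference time written as a 14-digit Memento
// timestamp (YYYYMMDDhhmmss), the form that appears in URI-Ms.
const mementoLayout = "20060102150405"

// toRFC3339 converts a timestamp such as "20060514123511" into an RFC 3339
// string like "2006-05-14T12:35:11Z". The input carries no zone, and
// time.Parse returns UTC in the absence of a zone indicator.
func toRFC3339(ts string) (string, error) {
	t, err := time.Parse(mementoLayout, ts)
	if err != nil {
		return "", err
	}
	return t.UTC().Format(time.RFC3339), nil
}
```

If the summary is currently assembled by hand, emitting it through `encoding/json` instead would also rule out the trailing and missing commas visible in that output.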
@ibnesayeed also suggested adding entries for archives that report zero mementos for the URI-R.
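Including archives that report zero mementos amounts to seeding the per-archive map with every archive that was queried before counting. A minimal sketch, assuming a hypothetical `capture` record and a `configured` list of archive hostnames; keying archives by the URI-M hostname is an assumption and may not match how MemGator identifies archives internally. Because 14-digit timestamps are fixed width, plain string comparison orders them chronologically, which is enough to track `first` and `last`.

```go
package main

import "net/url"

// capture is a hypothetical record of one URI-M and its 14-digit timestamp.
type capture struct {
	Timestamp string // YYYYMMDDhhmmss; fixed width, so string order is time order
	URIM      string
}

// archiveStats holds the per-archive summary; First and Last stay nil for
// archives that returned no mementos.
type archiveStats struct {
	Count int
	First *capture
	Last  *capture
}

// perArchive groups captures by URI-M hostname. Every host in configured
// (hypothetically, the archives that were queried) gets an entry up front,
// so archives with no mementos still appear with Count 0. First and Last
// point into the captures slice.
func perArchive(configured []string, captures []capture) (map[string]*archiveStats, error) {
	stats := make(map[string]*archiveStats, len(configured))
	for _, host := range configured {
		stats[host] = &archiveStats{}
	}
	for i := range captures {
		c := &captures[i]
		u, err := url.Parse(c.URIM)
		if err != nil {
			return nil, err
		}
		host := u.Hostname() // drops any port
		s, ok := stats[host]
		if !ok {
			s = &archiveStats{}
			stats[host] = s
		}
		s.Count++
		if s.First == nil || c.Timestamp < s.First.Timestamp {
			s.First = c
		}
		if s.Last == nil || c.Timestamp > s.Last.Timestamp {
			s.Last = c
		}
	}
	return stats, nil
}
```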
Also, change
We need an API endpoint that provides a summary of the aggregated TimeMap, preferably in JSON format. The summary could group memento counts by upstream archive and also include a nested distribution at the year and month levels.