Broken web links #318
Starting to follow this up a bit. Some questions:

I see there are 2 options of interest,
Dear Daniel @erget Thanks for following this up. Yes, we imported the mailman archive. Our copy is linked on the discussion page on the same line as, and in front of, the UCAR original. We can fix those missing links by pointing to the appropriate place in our copy, but unfortunately this will take a bit of work because our copy is not grouped into years. I don't know why the link to the CEDA cf-checker isn't working, but @RosalynHatcher could probably advise. I don't know why the CEDA editor isn't working either. It has worked in the past, although it's always rather slow to answer that query. Maybe Alison @japamment could comment? Best wishes for 2023, Jonathan
I agree that the weekly repetition of the same broken links has been a nuisance, and thanks for stopping it. But we mustn't forget to fix them! We should keep this issue open until it's done.
@JonathanGregory I agree in principle. I've now fixed what I think is everything in #320. What this does:

So I propose we merge that PR, close this issue plus the other one related to link checking, and open one to address the longer-term issues of migrating the mailman archive and figuring out what's up with the domains that are timing out. What do you think?
Never mind... Something's not right: the job isn't failing when I expect it to. I'll need some help to finalise this, and will request that in the requisite PR.
I've closed #330 because it reported the same errors as this one.
In issue 345, which is the most recent output of the cron job, @DocOtak wrote
I've been closing the new ones (like 345) every Monday morning, as a human cron daemon. I don't mind doing that, but equally I don't think it helps to have a new one every week until we've fixed the missing links identified by this edition.
@JonathanGregory @DocOtak I agree; actually, we could disable this until we get it fixed - we've made progress on it, but slowly ;) @DocOtak do you have the rights to disable the cron task, and could it be executed manually in that case?
As we have probably fixed all the recurrent broken links on the website, we don't need to disable the link-checker, as discussed in this issue. I will therefore close this issue, and we will see what the link-checker has to say when it next executes.
@erget @JonathanGregory and @sadielbartholomew I have come back to the link checker errors and problems. Sorry if I have missed some related issue that is already open. I will merge #320, but that doesn't fix all missing/timeout/vanished (broken) links. I'm preparing a PR to fix/improve the link checker, to silence some permanently broken links, and to skip old documents with wrong UTF-8 encoding where the link checker fails with an error.
Thanks for working on this, @cofinoa.
For reference I am copying here various comments from #486

Dear @cf-convention/info-mgmt team, this PR relates to a long-standing issue with the link checker, see: #318 #320 This is a first step to fix the issues with the link checker when PRs are made. The action is triggered when PRs are opened/re-opened and

Please take a moment to review and let me know if this fits. If so, I will continue with the PR to incorporate the link check of the site on a regular basis (i.e. a cron job every Monday), or we can just merge this PR and open a new PR for that.

PS: annotating the PR with a comment containing the link-check report is a challenge due to security issues with PRs from forks. If PRs are from the same repo (not forks), then PR and issue commenting is possible.

PS2: checking links to GitHub may raise an issue with the rate limit of GitHub HTTP requests.

Hi Antonio,
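For context, a PR-triggered link-check job along these lines might look like the following. This is only a sketch under assumptions: the action version, config path, and glob are illustrative, not the verbatim contents of #486.

```yaml
# Hypothetical workflow sketch: run the lychee link checker on pull requests.
name: Check links (PR)
on:
  pull_request:
    types: [opened, reopened, synchronize]
jobs:
  linkcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # (Site build step assumed here, so that generated _site/ HTML exists.)
      - uses: lycheeverse/lychee-action@v1
        with:
          args: --config lychee.toml --no-progress '_site/**/*.html'
```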
I was recently adding some minor changes to this file, and noticed that there is actually very little markdown and a lot of repetitive HTML links. I thought that it might be possible to generate this file dynamically during the build process, with something like a small [python] script looking through the relevant

@larsbarring I have made a new PR at #487 with your suggestion to refactor
Thanks for #487, Antonio. I don't fully understand this. Is the problem for the link-checker caused by a link to a markdown page from HTML, which is itself wrapped up as a markdown page? This seems rather convoluted. If the whole page is put in markdown instead, does that resolve it?
@larsbarring, as you suggested at PR #486, I have refactored it. Jekyll is quite limited at managing data and/or strings, and a Jekyll plugin would be needed to improve that, but my Ruby skills are also quite limited.

From @larsbarring #487 (comment)

The lists of links will be automatically generated from existing version subdirectories under the

Hence pinging @japamment, @efisher008
It's a problem to link to an .html page which is built from a .md page. Because the content is HTML, we cannot link to the .md page, as we do in other .md pages with MD content. I have rewritten the HTML content of
I have closed PR #487 because automatic generation of links to
There are some temporary issues with some links; for example, I have had to exclude: in
@cofinoa OK, I agree that having a closer look at the directory structure under
There was no new broken-links report this morning, I am very pleased to see. Thanks for suppressing it, @cofinoa! Shall we close this issue, or is it still a work in progress?
@JonathanGregory, it's still in progress. If it's OK, I would like to merge PR #486, which is an intermediate step, before solving this issue.
That's fine. Let's leave it open then. Thanks.
I have created 2 workflows/actions:
The exclusion rules are in the link checker's configuration. Currently, I have excluded the following URLs:

```toml
exclude = [
    # Data/cf-standard-names/
    "http://glossary.ametsoc.org/wiki",
    "https://www.unidata.ucar.edu/software/udunits/udunits-current/doc/udunits",
    "https://www.unidata.ucar.edu/software/udunits/udunits-2.2.28/udunits2.html",
    "https://www.sciencedirect.com/science/article/pii/0967063793901018",
    "https://www.ipcc.ch/ipccreports/tar/wg1/273.htm",
    "http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata",
    "http://gcmd.nasa.gov/Resources/valids",
    #
    "cfeditor.ceda.ac.uk", # standard_name_rule, vocabularies, discussion
    "https://mailman.cgd.ucar.edu/pipermail/cf-metadata", # discussion, governance
    "http://mmisw.org/ont", # faq (TIMEOUT)
    "https://mmisw.org/ont", # faq (TIMEOUT)
    "http://www.cgd.ucar.edu/cms/eaton/cf-metadata/clivar_article.pdf", # Data/cf-documents/cf-governance/cf2_whitepaper_final.html
    "http://www.cgd.ucar.edu/cms/eaton/cf-metadata/CF-current.html", # Data/cf-documents/requirements-recommendations
    "https://www.usbr.gov/lc/socal/reports/SMappend_C.pdf", # Data/area-type-table/**/build/area-type-table.html
    "https://cf-trac.llnl.gov/trac/", # 2018-Workshop, 2019-Workshop
    "http://mailman.cgd.ucar.edu/pipermail/cf-metadata", # 2019-Workshop
    "https://www.wonder.me", # 2021-Workshop
    "https://figshare.com/account/articles/24633939", # 2023-Workshop
    "https://figshare.com/account/articles/24633894", # 2023-Workshop
]
```

Some of the excluded URLs are spurious broken links, which are only temporarily broken. Others are permanently broken, and we need to decide what to do about them [1]. Also, I have excluded some paths from checking, mainly because they contain documents with invalid encoding, or many broken relative links (i.e. Trac tickets):

```toml
exclude_path = [
    "_site/Data/cf-standard-names/docs/guidelines.html",
    "_site/Data/cf-conventions/",
    "_site/Data/Trac-tickets/",
]
```

Regards

[1] we could link to a capture from the Wayback Machine:
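Lychee interprets each `exclude` entry as a regular expression matched against the full URL. A minimal Python sketch of that filtering logic (the `is_excluded` helper is hypothetical, not the checker's own code; the patterns are a subset of the list above):

```python
import re

# A few patterns copied from the exclude list above; lychee matches them as regexes.
EXCLUDE = [
    "http://glossary.ametsoc.org/wiki",
    "cfeditor.ceda.ac.uk",
    "https://mailman.cgd.ucar.edu/pipermail/cf-metadata",
]

def is_excluded(url: str, patterns=EXCLUDE) -> bool:
    """Return True if any exclude pattern matches somewhere in the URL."""
    return any(re.search(p, url) for p in patterns)

print(is_excluded("http://cfeditor.ceda.ac.uk/proposals/1?status=active"))  # True
print(is_excluded("https://cfconventions.org/faq.html"))                    # False
```

Because the patterns are regexes, a bare hostname such as `cfeditor.ceda.ac.uk` suppresses every URL containing it, regardless of scheme or path.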
I have improved the weekly cron workflow for the link checker. You can see a sample at issue #493
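A weekly schedule like the one described can be expressed with a `schedule` trigger. The snippet below is a sketch under assumptions: the cron expression and the manual-run trigger are illustrative, not the workflow's verbatim contents.

```yaml
# Hypothetical trigger block for a weekly link-check workflow.
on:
  schedule:
    - cron: '0 6 * * 1'   # every Monday at 06:00 UTC
  workflow_dispatch: {}    # allows the check to be re-run manually, as discussed above
```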
That's a very useful improvement. Thanks, Antonio.
@JonathanGregory et al., issue #493 with the broken-link report has been updated, and a new comment has been added to the issue for today's checker cron job. I have re-run the checker manually, and "new" errors appear while others disappear. The issue has been updated with the report for this "manual" check. IMO, there are 2 pending actions that we need to discuss:
It might be useful to add this to the next meeting of the Information Management Team @cf-convention/info-mgmt
I'm closing this to continue discussion at https://github.com/orgs/cf-convention/discussions/320 |
This issue was opened automatically but led to a discussion by humans
Errors were reported while checking the availability of links:
Issues found in 6 inputs. Find details below.
[faq.md]:
✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2009/047768.html | Failed: Network error: Not Found
⧖ [TIMEOUT] http://coastwatch.pfeg.noaa.gov/erddap/convert/units.html | Timeout
✗ [ERR] http://kitt.llnl.gov/trac/wiki/SatelliteData | Failed: Network error: dns error: no record found for name: kitt.llnl.gov.coi3uxiffnlergb4vem53tdisf.gx.internal.cloudapp.net. type: AAAA class: IN
✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2012/055875.html | Failed: Network error: Not Found
✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2010/053657.html | Failed: Network error: Not Found
✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2008/052705.html | Failed: Network error: Not Found
✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2010/048064.html | Failed: Network error: Not Found
✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2008/052334.html | Failed: Network error: Not Found
[standard_name_rules.md]:
⧖ [TIMEOUT] http://cfeditor.ceda.ac.uk/proposals/1?status=active&namefilter=&proposerfilter=&descfilter=&filter+and+display=filter | Timeout
[discussion.md]:
⧖ [TIMEOUT] http://cfeditor.ceda.ac.uk/proposals/1?status=active&namefilter=&proposerfilter=&descfilter=&filter+and+display=filter | Timeout
⧖ [TIMEOUT] http://cfeditor.ceda.ac.uk/proposals/1?status=inactive&namefilter=&proposerfilter=&descfilter=&filter+and+display=filter | Timeout
[software.md]:
✗ [ERR] http://wps-web1.ceda.ac.uk/submit/form?proc_id=CFChecker | Failed: Network error: dns error: no record found for name: wps-web1.ceda.ac.uk.coi3uxiffnlergb4vem53tdisf.gx.internal.cloudapp.net. type: AAAA class: IN
[vocabularies.md]:
⧖ [TIMEOUT] http://cfeditor.ceda.ac.uk/proposals/1?status=inactive&namefilter=&proposerfilter=&descfilter=&filter+and+display=filter | Timeout
⧖ [TIMEOUT] http://cfeditor.ceda.ac.uk/proposals/1?status=active&namefilter=&proposerfilter=&descfilter=&filter+and+display=filter | Timeout
[constitution.md]:
✗ [ERR] file:///github/workspace/(https:/github.com/cf-convention/cf-conventions/blob/master/CODE_OF_CONDUCT.md) | Failed: Cannot find file
🔍 350 Total ✅ 335 OK 🚫 9 Errors (HTTP:9|Timeouts:6)
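Reports like the one above can be triaged by tallying the bracketed status tags per line. A small sketch (the `tally` helper is hypothetical, not part of the checker; the sample lines are taken from the report above):

```python
import re
from collections import Counter

# Three sample lines from the link-check report above.
REPORT = """\
✗ [404] https://mailman.cgd.ucar.edu/pipermail/cf-metadata/2009/047768.html | Failed: Network error: Not Found
⧖ [TIMEOUT] http://coastwatch.pfeg.noaa.gov/erddap/convert/units.html | Timeout
✗ [ERR] http://kitt.llnl.gov/trac/wiki/SatelliteData | Failed: Network error: dns error
"""

def tally(report: str) -> Counter:
    """Count error lines by their bracketed status tag, e.g. 404, TIMEOUT, ERR."""
    return Counter(re.findall(r"\[(\w+)\]", report))

print(tally(REPORT))  # Counter({'404': 1, 'TIMEOUT': 1, 'ERR': 1})
```

Separating timeouts (often transient) from 404s and DNS errors (usually permanent) is what drives the decision between waiting, excluding, or repointing a link.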