[Bug] docs: Fix search engine ranking of manual pages (SEO) #4579
Comments
This is a great project and great ideas! I don't know if Python docs are the best example of SEO: when I'm looking for Python docs on a topic/function/class, it happens quite often that I don't find a link to the Python docs before the 2nd page, but probably all the results ahead of them try really hard with their SEO. That said, the plan doesn't seem wrong. Would having a version selector in the red box (not exhaustive, just like Python docs) help to provide an entry point between versions while still keeping the canonical refs? Does some guidance exist on how Read the Docs works for that? These sites seem to work well with that.
I am glad to see this proposal. Having access to links to manual pages that use the words "stable" and "devel", mirroring the current stable and development versions respectively, is also a huge bonus when writing tutorials and presentations. This makes the links and documents more durable/valid in time. At the same time keeping the links with version numbers enables the user to point to a specific version of a command if necessary.
It seems the new concept is slowly gaining traction: GRASS GIS 8.4 is the first hit again. https://www.google.com/search?q=grass+r.watershed It will take more time, though, to see an improved ranking.
I assume that it will take more days (or weeks) to also see the updated titles, i.e. all in caps. The unexpected lowercase still originates from the very old manual pages that previously ranked first.
I got automated feedback from the Google Search Console:
I suggest to modify the approach outlined above to:
Thus
Re: version selector in the old manual pages: why not - code contributions are welcome.
It is a PITA.
Sidenote: the missing acceptance of OSGeo/grass-addons#1215 blocks my unsubmitted SEO efforts in the cronjobs.
OK, I have injected the "canonical" into all old versions, restored the file timestamps accordingly and requested re-indexing of some pages in the Google Search Console. Slowly getting there: in my experience it will take 1-2 weeks to propagate to the user side of Google Search, i.e. by the end of Nov 2024.
The GRASS GIS manual pages of the different versions have been published for a long time with a difficult to understand concept of being invisible, redirected or shown, which also strongly affects the search engine ranking.
SEO: Without an indication of "canonical" URLs, different versions wipe each other out in search engines. Canonical tags help consolidate duplicate or similar content by specifying the preferred version of a page, ensuring search engines index and rank the desired URL while avoiding duplicate content issues.
This PR changes the cronjob scripts to
- inject "grass-stable" as the "canonical" into older manual pages under versioned URLs
- inject "grass-devel" as the "canonical" into the development manual pages under versioned URLs
This way, no "duplicate content" should occur from an SEO perspective. Also, `robots.txt` is updated to reactivate the manual pages of old GRASS GIS versions (which now contain "grass-stable" as the canonical).
Fixes OSGeo/grass#4579
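The injection step described in this PR could look roughly like the following. This is only a minimal Python sketch, not the actual cronjob code: the function name `inject_canonical` and the per-page file name argument are hypothetical, and only the `grass-stable` base URL is taken from the PR description.

```python
import re

# Base URL of the stable manual, as named in the PR description.
CANONICAL_BASE = "https://grass.osgeo.org/grass-stable/manuals/"


def inject_canonical(html: str, page: str, base: str = CANONICAL_BASE) -> str:
    """Return html with a canonical <link> inserted before </head>.

    Hypothetical helper: pages that already declare a canonical, or that
    have no </head>, are returned unchanged.
    """
    if re.search(r'<link[^>]*rel=["\']canonical["\']', html, flags=re.IGNORECASE):
        return html  # already has a canonical; do not add a second one
    tag = f'<link rel="canonical" href="{base}{page}">\n'
    # Insert the tag in front of the first closing head tag only.
    new_html, count = re.subn(
        r"</head>", lambda m: tag + m.group(0), html, count=1, flags=re.IGNORECASE
    )
    return new_html if count else html


if __name__ == "__main__":
    old_page = "<html><head><title>r.watershed</title></head><body></body></html>"
    print(inject_canonical(old_page, "r.watershed.html"))
```

Running the function twice on the same page leaves it unchanged the second time, which matters for a job that is re-run periodically.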
I have updated the PR description: now https://grass.osgeo.org/grass-stable/manuals/ is the overall main manual that the older and "devel" manual versions point to via "canonical". This way, no more duplicate content should occur. See also OSGeo/grass-addons#1241
Different kind of issue: In the "see more" link, it says that the website prevented Google from creating a description, but did not hide the page so that it would not appear in search results. https://support.google.com/webmasters/answer/7489871 The link points to a grass83 page.
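One possible cause of such a message is a `robots.txt` rule that still blocks the old versioned pages; this is an assumption, not something confirmed in this thread. A small Python sketch using the standard library `urllib.robotparser` can test a URL against a set of rules. The rules in the example are hypothetical and are not the real content of the GRASS GIS `robots.txt`.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Check whether the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical rules for illustration only.
    rules = "User-agent: *\nDisallow: /grass83/\n"
    for url in (
        "https://grass.osgeo.org/grass83/manuals/r.watershed.html",
        "https://grass.osgeo.org/grass-stable/manuals/r.watershed.html",
    ):
        print(url, is_allowed(rules, url))
```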
Describe the bug
The GRASS GIS manual pages of the different versions have been published for a long time with a difficult to understand concept of being invisible, redirected or shown, which also strongly affects the search engine ranking.
Note: issue posted here since the core manual pages are affected (while the cronjobs are maintained in the addon repository).
How Python publishes its man pages
The scope of a pull request in preparation is to partially adopt the Python manual pages concept, which looks like this (checked Oct 23, 2024):
<link rel="canonical" href="https://docs.python.org/3/index.html" />
<link rel="canonical" href="https://docs.python.org/3/index.html" />
<link rel="canonical" href="https://docs.python.org/3/index.html" />
<link rel="canonical" href="https://docs.python.org/3/index.html" />
<link rel="canonical" href="https://docs.python.org/3/index.html" />
<link rel="canonical" href="https://docs.python.org/3/index.html" />
<link rel="canonical" href="https://docs.python.org/3/index.html" />
A Google search for "Python documentation" returns as the first hit the "3.13.0 Documentation" with the URL https://docs.python.org/. Clicking on this takes the user to https://docs.python.org/3/, which is identical to https://docs.python.org/3.13/.
This means that the same documentation is served at two URLs: https://docs.python.org/3/ and https://docs.python.org/3.13/.
How can GRASS GIS publish its manual pages?
While the situation in the GRASS GIS project is a bit different, we can mimic the Python approach to some extent.
Current GRASS GIS version overview
I have started to implement modifications in the cronjobs locally to improve the terrible SEO situation and make more versions properly visible (overdue for a long time).
For a few days now, we have had the following approach deployed on the server for testing purposes (cronjob PR coming soon):
- (`main` branch) with <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">
- (`releasebranch_8_4` branch) - this is the overall main manual <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">
- <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">
- <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html"> and red box pointing to grass-stable (generated by cronjob with box URL and canonical version defined)
- <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html"> and red box pointing to grass-stable (old static pages)
- <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html"> and red box pointing to grass-stable (old static pages)
Sitemaps:
Observations:
Note that it may even take weeks for Google etc. to "learn" the improved structure. At present, I am feeding Google Search Console and Bing Webmaster Tools with the appropriate updates every few days.
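While waiting for the search engines, the canonical declared by a deployed page can be checked directly. The following is a minimal Python sketch using only the standard library `html.parser`; the names `CanonicalFinder` and `find_canonical` are hypothetical, and the sample input is the Python docs tag quoted above.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        if (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    page = (
        '<html><head><link rel="canonical" '
        'href="https://docs.python.org/3/index.html" /></head></html>'
    )
    print(find_canonical(page))
```

Feeding it the HTML of a versioned manual page should then show whether the page points to the `grass-stable` URL as intended.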
Additional context
TODO: