Axheladini/Link-Monitoring

LinkMonitoring / Python + Selenium

Automated tool for monitoring important links on any website or web app.
This tool is useful for high-traffic websites with important business processes and important content that is updated at least once a day. In my daily work it monitors over 500 important links, with an execution time of around 3 minutes. We have integrated it with Jenkins, so the whole monitoring process is automated.

The current version is available only for Windows machines. I plan to port it to Linux and Mac, but as usual it will take some time :)

This is my first tool built with Python. If you find issues or a way to optimise the code, feel free to comment and contribute.

In general, LinkMonitoring helps you monitor and verify that the important links of your website or web app are in their expected place and working as expected. In practice, it is a pre- or post-deployment tool that checks your important links automatically. Following the DevOps trend, and bearing in mind that manual testing is not very effective, we tend to automate as much as we can. LinkMonitoring checks each link twice: first it checks that the link is present at a given position (within the DOM) on a specific page, and then it checks the HTTP response status of the same link.

How it works

First, you create a dataset of the links you need to monitor. There is no user interface for importing links, so you add them as Python objects (see the Usage section). Using Selenium with headless Chrome, the tool opens each page that contains important links and finds each link with one of five functions: check_by_parent_id, check_by_text, check_by_tittle, check_by_class, check_by_link_id. The link it finds is compared with the one in the dataset, and then the tool checks the HTTP status code of the link. Once all links have been visited and checked, the tool generates a detailed HTML report with the results for each link.
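
The two checks described above can be sketched as a small pure function. This is only an illustration, not the tool's actual code: the name `evaluate_link`, the trailing-slash normalisation, and the rule that any status below 400 counts as healthy are all assumptions made for this example.

```python
from typing import Optional, Tuple


def evaluate_link(expected_url: str, found_url: Optional[str],
                  http_status: Optional[int]) -> Tuple[int, str]:
    """Return (status, details); status is 1 when both checks pass, else 0."""
    # Check 1: the link must be present at the expected position in the DOM.
    if found_url is None:
        return 0, "link not found on page"
    # Assumption: a trailing slash is not a meaningful difference.
    if found_url.rstrip("/") != expected_url.rstrip("/"):
        return 0, f"unexpected URL: {found_url}"
    # Check 2: the link target must answer with a non-error HTTP status.
    # Assumption: anything below 400 is treated as healthy.
    if http_status is None or http_status >= 400:
        return 0, f"bad HTTP status: {http_status}"
    return 1, "ok"


print(evaluate_link("https://example.org/a/", "https://example.org/a", 200))
print(evaluate_link("https://example.org/a", None, None))
```

Keeping the decision separate from the browser and network calls makes it easy to try out without Chrome or an internet connection.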

Usage

  1. If Python is not installed on your machine, download it from www.python.org/downloads/ and make sure you also install pip. On my local machine I am using Python 3.6. After installing Python, install these Python modules:
  • python -m pip install selenium
  • python -m pip install requests
  • python -m pip install urllib3
  • python -m pip install tldextract
  2. Selenium uses the Chrome browser in headless mode to open each page and find all links on it. Therefore, you will need to download chromedriver.exe for your Chrome browser version. Go to https://chromedriver.chromium.org/ and download the version that matches your Chrome browser. Copy the exe file to “C:\linkmonitoring\chromedriver.exe” (this path is important because the tool expects the file exactly there).

  3. Pull, clone or download the repository from GitHub (https://github.com/Axheladini/Link-Monitoring). The tool comes with pre-filled data (an important links dataset) from wikibooks.org pages. From cmd, change to your local repository and run the command:

         python index.py

         If Python is installed correctly, the tool will start to monitor and check all important links from the dataset. At the end, the detailed HTML report will show up.
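
Before the first run, it can help to confirm that the modules and the driver file from the steps above are in place. The following is a minimal sketch and not part of the repository: the function name `missing_prerequisites` and the message wording are hypothetical, while the module names and the driver path are taken from the steps above.

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List

REQUIRED_MODULES = ("selenium", "requests", "urllib3", "tldextract")
CHROMEDRIVER_PATH = r"C:\linkmonitoring\chromedriver.exe"


def missing_prerequisites(driver_path: str = CHROMEDRIVER_PATH,
                          modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return readable problems; an empty list means the setup looks ready."""
    problems = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not installed: {name}")
    if not Path(driver_path).is_file():
        problems.append(f"chromedriver not found: {driver_path}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

If the script prints nothing, the modules import and the driver file exists at the expected path; it does not verify that the driver version matches your Chrome version.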

Importing your important links ( the dataset )

This is the most important step: defining your dataset of important links. Object-oriented programming knowledge is an advantage here, but I will try to explain the whole process in detail so that any WebDev, DevOps or Webmaster can follow it.

Before going into details, please check the english.py and deutsch.py files in the websites directory; the whole dataset logic is within these files.

  • The whole dataset of important links is in the websites directory.
  • Do not update, change or modify the __init__.py and config.py files; the tool needs them to run.
  • I always create a separate dataset file for each language of the website. As you can see, inside the websites directory I have created english.py and deutsch.py. You can name these files as you wish, but you must follow one convention: use only letters and no special characters (like: *&^%$#@!~_+-?/).
  • Each dataset file should start with the header code that imports some modules. (Check the line that starts with DO NOT in the english.py file.)
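
The naming convention for dataset files can be checked with a short helper. This is a sketch under stated assumptions: "only letters" is read as ASCII letters followed by `.py`, and `is_valid_dataset_name` is a hypothetical name, not a function from the tool.

```python
import re

# Assumption: "only letters" means ASCII letters plus the .py extension.
DATASET_NAME = re.compile(r"[A-Za-z]+\.py")
# config.py passes the pattern but is one of the tool's own files.
RESERVED = {"config.py"}


def is_valid_dataset_name(filename: str) -> bool:
    """True when the name follows the letters-only convention."""
    return filename not in RESERVED and DATASET_NAME.fullmatch(filename) is not None


for candidate in ("english.py", "deutsch.py", "en_us.py", "config.py"):
    print(candidate, is_valid_dataset_name(candidate))
```
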
  1. Create the object for the language of the pages where your important links are. The first parameter is the name of the language and the second parameter is the two-letter ISO code of the language/country:

    lang = language.Language("English", "en")
  2. Create the first page with important links. The first parameter is the name of the page and the second parameter is the URL of the page:

    page_1 = language.Page("Home page", "https://en.wikibooks.org/wiki/Main_Page")
  3. Assign the page to the language:

    lang.push_page(page_1)
  4. This way you can add as many pages as you want for a given language or website.

  5. Define important links for a given page.

     link1 = language.Linku("Featured Books", "https://en.wikibooks.org/wiki/Wikibooks:Featured_books", "", "0", "0", "0", "check_by_parent_id", "n-Featured-books", "")

Repeat this for all important links that are present on page_1. * You will need to find the 2nd, 7th and 8th attributes in the source code of the page where your important links are.

     Attributes:
      Attribute 1 – Link name / Add the link name
      Attribute 2 – Link URL / Add the URL of the link (complete link including the https://)
      Attribute 3 – Link that will be added by the tool / Leave it empty
      Attribute 4 – HTTP status of the link, added by LinkMonitoring / Initial value 0
      Attribute 5 – Checked: 1 or 0 shows whether the link has been checked; LinkMonitoring updates the value / Initial value 0
      Attribute 6 – Status: 1 if there are no errors and 0 if there is an error / Initial value 0
      Attribute 7 – The function LinkMonitoring uses to find the URL of the link. Which value to add
      depends on how your links are constructed. Available values: check_by_parent_id, check_by_text, check_by_tittle,
      check_by_class, check_by_link_id

      1. check_by_parent_id

     DOM block example:

<li id="parent_id"><a href="https://www.somedomain/somepath/">Link text</a></li> 

     In this example, LinkMonitoring will find and test the link based on the parent id; here it is parent_id.

     2. check_by_text

     DOM block example:

<a href="https://www.somedomain/somepath/">Sample</a>

     In this example, LinkMonitoring will find and test the link based on the link text; here it is Sample.

     3. check_by_tittle

     DOM block example:

<a href="https://www.somedomain/somepath/" title="Some Link title">Link text</a>

     In this example, LinkMonitoring will find and test the link based on the title attribute; here it is Some Link title.

     4. check_by_class

     DOM block example:

<a href="https://www.somedomain/somepath/" class="class_name_one class_name_two">Link text</a>

     In this example, LinkMonitoring will find and test the link based on class name. Having in mind that CSS classes
     are not unique, I would suggest adding two classes when you use check_by_class function to find the link.

     5. check_by_link_id

     DOM block example:

<a href="https://www.somedomain/somepath/" id="link_main_id">Link text</a>

     In this example, LinkMonitoring will find and test the link based on element id. Current example: link_main_id

* Suggestion: If you control how the links are rendered on the page, I would suggest adding unique id names or unique title attributes, because these two functions find the link faster and more reliably.

      Attribute 8 – This attribute depends on Attribute 7. You will need to find this value in the source code based on the
     function you use in Attribute 7 (check the examples above)
      Attribute 9 – Leave this empty; LinkMonitoring uses this field to add details based on the link status.
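
To make the relationship between Attribute 7 and Attribute 8 concrete, the sketch below maps each function name to a locator that would select the same element in the DOM examples above. It is a guess at equivalent selectors, not the tool's implementation: `locator_for` is a hypothetical helper, and it assumes the Attribute 8 value contains no double quotes. The strings "css selector", "link text" and "id" are the values of Selenium's `By.CSS_SELECTOR`, `By.LINK_TEXT` and `By.ID` constants.

```python
from typing import Tuple


def locator_for(strategy: str, value: str) -> Tuple[str, str]:
    """Map an Attribute 7 name and Attribute 8 value to a (by, selector) pair."""
    if strategy == "check_by_parent_id":
        # Any <a> nested inside the element that carries the given id.
        return "css selector", f'[id="{value}"] a'
    if strategy == "check_by_text":
        # Exact match on the visible link text.
        return "link text", value
    if strategy == "check_by_tittle":
        return "css selector", f'a[title="{value}"]'
    if strategy == "check_by_class":
        # "one two" becomes a.one.two, so every listed class must be present.
        return "css selector", "a." + ".".join(value.split())
    if strategy == "check_by_link_id":
        return "id", value
    raise ValueError(f"unknown strategy: {strategy}")


print(locator_for("check_by_parent_id", "n-Featured-books"))
print(locator_for("check_by_class", "class_name_one class_name_two"))
```

With a Selenium driver at hand, such a pair could be passed on as `driver.find_element(*locator_for(strategy, value))`.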

  6. Connect each important link with its corresponding page:

    page_1.push_link(link1)
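
Since five of the nine attributes always start with the same initial values, a small helper can fill them in and catch typos in the function name. This is a convenience sketch, not part of LinkMonitoring: `link_args` is a hypothetical name, and the URL check assumes every link starts with http:// or https://.

```python
from typing import Tuple

STRATEGIES = ("check_by_parent_id", "check_by_text", "check_by_tittle",
              "check_by_class", "check_by_link_id")


def link_args(name: str, url: str, strategy: str, value: str) -> Tuple[str, ...]:
    """Build the nine positional attributes with the documented initial values."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"URL must include the scheme: {url}")
    # Attributes 3 and 9 stay empty; attributes 4 to 6 start at "0".
    return (name, url, "", "0", "0", "0", strategy, value, "")


args = link_args("Featured Books",
                 "https://en.wikibooks.org/wiki/Wikibooks:Featured_books",
                 "check_by_parent_id", "n-Featured-books")
print(args)
```

Inside a dataset file, the result could then be unpacked into the constructor: `link1 = language.Linku(*args)`.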

Complete code at this stage for english.py would look like this:

# The header code from english.py (see the line that starts with DO NOT) goes above this line

# Create the language object
lang = language.Language("English", "en")

# Create the object of the first page where important links are
page_1 = language.Page("Home page", "https://en.wikibooks.org/wiki/Main_Page")

# Push this page to the language object
lang.push_page(page_1)

# Create the first important link object
link1 = language.Linku("Featured Books", "https://en.wikibooks.org/wiki/Wikibooks:Featured_books", "", "0", "0", "0", "check_by_parent_id", "n-Featured-books", "")

# Assign the first important link object to the first page
page_1.push_link(link1)

  7. After you have added all pages and important links by repeating the sub-steps above, open cmd, change to your local repository and run the command: python index.py. The automated process will start. The execution time depends on the number of links. When the process ends, an HTML report will show up with details for each link. Do not forget to update the dataset when you update or remove an important link on your website.
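
For readers curious how per-link results can end up in an HTML report, here is a minimal sketch using only the standard library. It does not reproduce the report that LinkMonitoring generates: `render_report`, the row layout (name, URL, status, details) and the table markup are assumptions made for illustration.

```python
from html import escape
from typing import Iterable, Tuple

Row = Tuple[str, str, int, str]  # (name, url, status, details)


def render_report(rows: Iterable[Row]) -> str:
    """Render link results as a plain HTML table."""
    lines = ["<table>",
             "<tr><th>Link</th><th>URL</th><th>Result</th><th>Details</th></tr>"]
    for name, url, status, details in rows:
        label = "OK" if status == 1 else "ERROR"
        # Escape all text so that markup in names or details cannot break the page.
        lines.append("<tr><td>{}</td><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            escape(name), escape(url), label, escape(details)))
    lines.append("</table>")
    return "\n".join(lines)


report = render_report([
    ("Featured Books", "https://en.wikibooks.org/wiki/Wikibooks:Featured_books", 1, "ok"),
    ("Missing <link>", "https://example.org/gone", 0, "bad HTTP status: 404"),
])
print(report)
```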

Contributing

If you decide to try or use this tool in your daily work, I would really appreciate it if you used the Issues page to add your suggestions, experiences, review comments, etc. Also feel free to contribute improvements, new features, etc.

  1. Fork it!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request :D

History

LinkMonitoring 1.0

Credits

Agon Xheladini www.agonxheladini.com

License

FOSS
