User Agent Database

Women Empowerment in Zanzibar

Send a little karma down the way and support women empowerment in Zanzibar by helping to fund the local production of reusable female hygiene products. A very dear friend of mine runs the project. They were already able to buy hundreds of educational books. Sometimes, it takes so little to make a huge impact. If you'd like to thank me or support this work, donate. Additionally, any current and future sponsoring of my work via GitHub or other channels will flow one hundred percent to the NGO.

About

This is a constantly updated collection of user agents I have encountered while running web servers on the internet. It is not an exhaustive list. Instead, it focuses on bots, crawlers, certain malware, automated software, scripts and other uncommon user agents. Lists of regular browser user agents are available elsewhere and are too numerous to manage cleanly here.

Usage

There are lots of use cases for user agent information, especially when parsing web server logs. Below are some examples that illustrate how to quickly get filtered information out of this data set using the excellent jq command-line tool.

Get SEO User Agents

cat data/*.json | jq -r 'select(.category==7) | .user_agents[]'

Get Chinese Crawlers

cat data/*.json | jq -r 'select(.country=="CN") | select(.type==2) | .user_agents[]'

Get Suspicious CIDRs

cat data/*.json | jq -r 'select(.type==99) | .known_cidrs[]'
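
The filters above can also be combined with a log file. The following is a minimal sketch, not part of this project's tooling: it builds a throwaway directory with one made-up entry and a made-up access log (the `ExampleBot` string, the temporary directory layout and the log lines are all hypothetical), extracts the user agents of type 99, and counts exact matches in the log. It assumes that `user_agents` holds literal example strings, so exact matching will miss versions that are not listed.

```shell
#!/bin/sh
# Sketch: count log hits for user agents flagged with type 99.
# All file names and sample data below are hypothetical.
set -eu

demo=$(mktemp -d)
mkdir "$demo/data"

# One made-up entry using only fields shown in the examples above.
cat > "$demo/data/examplebot.json" <<'EOF'
{"type": 99, "country": "ZZ", "user_agents": ["ExampleBot/1.0"], "known_cidrs": []}
EOF

# Two made-up log lines in nginx's combined format.
cat > "$demo/access.log" <<'EOF'
192.0.2.1 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "ExampleBot/1.0"
192.0.2.2 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0"
EOF

# Collect the flagged user agents, one per line.
jq -r 'select(.type==99) | .user_agents[]' "$demo"/data/*.json > "$demo/flagged.txt"

# Extract the user agent field and count exact, whole-line matches.
# grep exits non-zero when nothing matches, hence the "|| true".
hits=$(awk -F\" '{print $6}' "$demo/access.log" | grep -Fxc -f "$demo/flagged.txt" || true)
echo "$hits"

rm -rf "$demo"
```

Swapping `grep -Fxc` for `grep -Fx` would print the matching user agents themselves instead of a count.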

Contributing

To get a list of all user agents your own server has encountered, you can run a command like the following (it assumes nginx's default combined log format):

cat /var/log/nginx/* | awk -F\" '{print $6}' | sort -u > uas.txt
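
To see what that command extracts, here is a small sketch run against a single made-up log line instead of real logs. In nginx's combined log format the user agent is the last quoted field, which becomes field 6 when the line is split on double quotes. The sample line, IP address and `ExampleCrawler` name are invented for illustration.

```shell
# A made-up log line in nginx's combined format.
line='203.0.113.7 - - [01/Jan/2024:12:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 26 "-" "ExampleCrawler/2.1 (+https://example.com/bot)"'

# Splitting on double quotes: $2 is the request, $4 the referrer, $6 the user agent.
ua=$(printf '%s\n' "$line" | awk -F\" '{print $6}')
echo "$ua"
```

A glob such as `/var/log/nginx/*` may also pick up error logs and compressed rotated logs, which this field split does not handle, so narrowing it to the uncompressed access logs is usually safer.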
  • Create a single JSON file per entity. Use template.json to start. The new.sh helper script is great for this.
  • Index codes are listed in the indexes folder.
  • Fill out as much information as possible, using existing entries for reference. Be especially thorough regarding country, website and description.
  • Format with Prettier. The default style is sufficient. You can do so by installing it (npm install -g prettier) and running prettier --write entry.json.
  • If there are multiple mostly identical user agent strings for an entry, restrict to one example per major semantic version.
  • All array entries are sorted, alphabetically and numerically.
  • If country does not apply or the entity is international, use "ZZ". For other fields, use null when they are not applicable.
  • null is to be interpreted as "not applicable" or "unknown", depending on context.
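
The sorting rule can be checked mechanically before submitting. The sketch below is not part of the repository's tooling: it assumes the `user_agents` array name from the usage examples and writes a hypothetical `entry.json` to a temporary directory. jq's `sort` orders strings by Unicode codepoint, which may differ from the intended alphabetical order for mixed-case strings, so treat a failure as a hint rather than a verdict.

```shell
set -eu
tmp=$(mktemp -d)

# A hypothetical entry whose user agents are deliberately out of order.
cat > "$tmp/entry.json" <<'EOF'
{"user_agents": ["ExampleBot/2.0", "ExampleBot/1.0"]}
EOF

# jq -e exits non-zero when the final result is false or null.
if jq -e '.user_agents == (.user_agents | sort)' "$tmp/entry.json" > /dev/null; then
  result="sorted"
else
  result="unsorted"
fi
echo "$result"

rm -rf "$tmp"
```

The same comparison can be applied to any other array field by changing the key name.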

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The data is completely free for personal, non-commercial usage, including FOSS projects. If you plan to include it in a product you earn money on, or to use it in infrastructure you earn money with, I welcome your decision. However, you will need to license it by becoming a permanent top-tier GitHub sponsor. If this is too steep for you, let me know and we'll talk.