
Real TOSBack datasets for 888+ privacy policies & terms of service (warning, this is 1GB+ to clone)

pde/tosback2-data

This is TOSBack version 2, a clean redesign & reimplementation of EFF's
TOSBack project.

It uses Git as its backend storage database, which gives it efficient
versioning of every stored document for free.
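Because the data lives in an ordinary Git repository, a policy's change history can be read with plain `git log`. The sketch below is an illustration under assumptions: `file_history` and `parse_log` are hypothetical helpers (not part of TOSBack2), it shells out to the `git` binary instead of using GitPython, and the path you pass in is whatever file in the dataset you are interested in.

```python
import subprocess
from typing import List, NamedTuple


class Revision(NamedTuple):
    commit: str   # full commit hash
    date: str     # ISO-8601 author date as printed by git (%aI)
    subject: str  # first line of the commit message


# Field separator that is very unlikely to appear in a commit subject.
SEP = "\x1f"


def parse_log(output: str) -> List[Revision]:
    """Parse `git log --format=%H<SEP>%aI<SEP>%s` output into Revision tuples."""
    revisions = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        commit, date, subject = line.split(SEP, 2)
        revisions.append(Revision(commit, date, subject))
    return revisions


def file_history(repo_dir: str, path: str) -> List[Revision]:
    """List every commit that changed `path`, newest first (needs git on PATH)."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "log", "--follow",
         f"--format=%H{SEP}%aI{SEP}%s", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_log(out)
```

Two commit hashes returned by `file_history` can then be passed to `git diff <old> <new> -- <path>` to see exactly how a policy changed between crawls.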

After cloning the Git repository, run this command:

git submodule update --init --recursive

That will fetch a recent version of the GitPython code, which TOSBack2 depends on.
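To confirm the submodules were actually fetched, you can inspect `git submodule status`, which prefixes each line with `-` (not initialized), `+` (checked-out commit differs from the recorded one), `U` (merge conflicts) or a space (up to date). The helper names below are hypothetical, and the submodule paths in your checkout may differ from any example path.

```python
import subprocess
from typing import List, Tuple

# Meaning of the one-character prefix that `git submodule status` prints.
STATES = {
    "-": "not initialized",
    "+": "checked-out commit differs from the recorded one",
    "U": "merge conflicts",
    " ": "up to date",
}


def parse_submodule_status(output: str) -> List[Tuple[str, str]]:
    """Turn `git submodule status` output into (path, state) pairs."""
    result = []
    for line in output.splitlines():
        if not line.strip():
            continue
        state = STATES.get(line[0], "unknown")
        # After the prefix: "<sha1> <path>", optionally followed by " (<describe>)".
        _sha, rest = line[1:].split(" ", 1)
        if rest.endswith(")") and " (" in rest:
            rest = rest.rsplit(" (", 1)[0]
        result.append((rest, state))
    return result


def submodule_report(repo_dir: str) -> List[Tuple[str, str]]:
    """Run `git submodule status --recursive` in repo_dir (needs git on PATH)."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "submodule", "status", "--recursive"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_submodule_status(out)
```

Any entry whose state is not "up to date" is a sign that the `git submodule update --init --recursive` step still needs to be (re)run.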

*BUGS IN WGET*

If you want to run the crawler yourself (not really necessary unless you're
testing something), be aware that TOSBack2 exposes a number of bugs in common
versions of wget.  As of December 2011, there are two bugs you might need to
patch yourself!

(FOR YOUR CONVENIENCE, a patched version of the wget source can be found in
the lib/wget-1.13.4/ directory.  There is also a binary .deb in lib/ that
Debian and Ubuntu users can try.  More hints on building from source are below.)

1. Versions of wget built against GnuTLS may suffer from fatal memory leaks
   (https://lists.gnu.org/archive/html/bug-wget/2011-10/msg00050.html), so
   either apply that patch or build against OpenSSL using
   ./configure --with-ssl=openssl.

2. You should also apply the following patch:
   https://savannah.gnu.org/support/download.php?file_id=24473
   It fixes this bug: https://savannah.gnu.org/bugs/?21714
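To see whether the first bug could affect you, check which TLS library your wget is linked against. This sketch assumes that `wget --version` lists the TLS backend as a feature token such as `+ssl/gnutls` or `+ssl/openssl`, as current wget releases do; older builds may print only `+ssl`, in which case the backend is reported as `None` (unknown). The function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_wget_version(output: str) -> Tuple[Optional[Tuple[int, ...]], Optional[str]]:
    """Extract (version tuple, TLS backend) from `wget --version` output."""
    version = None
    m = re.search(r"GNU Wget (\d+(?:\.\d+)+)", output)
    if m:
        version = tuple(int(part) for part in m.group(1).split("."))
    # Assumed feature-token format: "+ssl/gnutls" or "+ssl/openssl".
    # A bare "+ssl" does not match, so the backend stays unknown (None).
    backend = None
    m = re.search(r"\+ssl/(\w+)", output)
    if m:
        backend = m.group(1)
    return version, backend


def installed_wget() -> Tuple[Optional[Tuple[int, ...]], Optional[str]]:
    """Run the wget found on PATH and report its version and TLS backend."""
    out = subprocess.run(["wget", "--version"], check=True,
                         capture_output=True, text=True).stdout
    return parse_wget_version(out)
```

A backend of `"gnutls"` means the memory-leak patch (or an OpenSSL rebuild) may be needed; the second bug is independent of the TLS library.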

*HINTS FOR BUILDING WGET FROM SOURCE ON DEBIAN OR UBUNTU*

sudo apt-get build-dep wget
cd lib/wget-1.13.4/
fakeroot debian/rules binary

After these commands finish, an installable .deb file *should* be written to
the lib/ directory.
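Since the build drops its package next to the prebuilt one in lib/, it can help to pick the most recently modified file. This sketch assumes the package follows the usual Debian `<package>_<version>_<arch>.deb` naming with a package name of `wget`; `built_debs` is a hypothetical helper, and the actual install is left to `dpkg -i`.

```python
from pathlib import Path
from typing import List


def built_debs(lib_dir: str = "lib") -> List[Path]:
    """Return wget .deb packages in lib_dir, most recently modified first."""
    # Debian package files are conventionally named <package>_<version>_<arch>.deb.
    return sorted(
        Path(lib_dir).glob("wget_*.deb"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )


if __name__ == "__main__":
    debs = built_debs()
    if debs:
        print(f"Install with: sudo dpkg -i {debs[0]}")
    else:
        print("No wget .deb found in lib/; check the build output.")
```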

