Stork is a data transfer scheduler that provides a common interface to different file transfer protocols.
Stork uses a client-server architecture: clients submit jobs to a Stork server, and the server performs the transfers when resources permit. Transfers happen asynchronously, so clients can continue other work and check on a job's status at their leisure. The Stork server automatically responds to any failures that occur during a transfer, handling them appropriately and informing the user if a job cannot be completed.
Stork plug-ins adding support for new file transfer protocols can be created easily, in any programming language, using a simple external executable interface. If additional performance or tighter integration with the Stork server is desired, plug-ins can instead be written in Java by extending the built-in TransferModule class, eliminating the communication overhead of piping serialized messages between Stork and the transfer module.
Stork is intended to run in any Java Virtual Machine with the Java SE 7 runtime on any modern operating system. In practice, it has only been tested with the OpenJDK 7 JRE on Linux, though there is no reason to believe it would not work on other JVMs.
Stork requires both a Java SE 7 compatible runtime (JRE) and development kit (JDK), along with the utilities typically packaged with them (e.g., the jar command).
How this is done depends on the system you are building Stork on and your preferred Java SE 7 implementation. For example, to install OpenJDK 7 on a Debian-based Linux system, such as Ubuntu, you would run:
apt-get install openjdk-7-jdk openjdk-7-jre
The following libraries are required as well:
- Apache Commons Logging 1.1
- Log4J 1.2.13
- JGlobus 1.8.0
- Netty 4.0.0
- JSch 0.1.50
Running Stork using the startup script in bin currently requires Bash. In a future release, this script will be made more portable between shells. If you do not have Bash installed, running the following command after building effectively does what the script does:
java -cp 'lib/*' stork.Stork <command> [args]
This section will be formalized a bit later. For now, the necessary libraries are included in the repository.
On most systems, simply run make (specifically gmake) or another GNU Make compatible utility. For systems without Make installed, the entire source tree can be found under the stork directory. Running javac on all of it, packaging the resulting class files into a JAR file, and putting that JAR in lib effectively accomplishes what make does.
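The manual steps above can be sketched as a short shell session. This is a sketch under assumptions: a POSIX shell, the JDK tools on your PATH, and the repository root as the working directory; the exact javac flags are not mandated by this README.

```shell
# Compiled class files go in build/, as the Makefile would produce.
mkdir -p build

# Collect every Java source file under the stork directory.
find stork -name '*.java' > sources.txt

# Compile them against the bundled libraries, placing class files in build/.
javac -cp 'lib/*' -d build @sources.txt

# Package the class files into a JAR and place it in lib/, as make would.
jar cf lib/stork.jar -C build .
```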
Documentation can be generated by running make doc, and will be put in the doc directory.
There is currently no automatic installation procedure, though one is planned. To install Stork system-wide, after building you can copy this entire directory wherever you want (perhaps /usr/local/stork/ on an FHS-style system) and either make a symlink to bin/stork somewhere in your system executable path or add the bin directory of your Stork installation to your executable path.
- stork server — Starts a Stork server. Right now it outputs everything to the command line; proper daemonization and automatic process management with a PID file are not yet supported. The server may be run in the background by passing the -d option, e.g.: stork server -d
- stork q — Lists all the jobs in the Stork queue along with information about them, such as their status and progress. Can be used to find information about specific jobs by passing a job ID, or to filter jobs by their status.
- stork submit — Submits a job to a Stork server. Can be passed a source and destination URL, a text file containing one or more jobs, or no arguments to read jobs from standard input.
- stork rm — Cancels or terminates a submitted job or set of jobs.
- stork info — Displays configuration information about the server. Can also be used to find information about transfer modules.
- stork ls — Lists a remote URL.
- stork user — Logs in to a Stork server or registers as a new user.
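A typical session using these commands might look like the following sketch. The job descriptor fields come from the examples later in this README; the job ID used with stork q and stork rm is assumed to be 1 here, but you should use whatever ID stork submit actually prints.

```shell
# Write a job descriptor to a file (JSON form; see the job descriptor
# section for the accepted formats).
cat > job.json <<'EOF'
{
  "src"  : "ftp://example.com/file1.txt",
  "dest" : "ftp://example.com/file2.txt"
}
EOF

# Submit the job; the assigned job ID is printed on submission.
stork submit job.json

# Check the job's status and progress (assuming it was assigned ID 1).
stork q 1

# Cancel the job if it is no longer needed.
stork rm 1
```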
More information can be found by running stork --help.
The Stork configuration file (stork.conf) can be used to change settings for the server and client tools. The search order for the configuration file is as follows:
- $STORK_CONFIG
- ~/.stork.conf
- /etc/stork.conf
- $STORK/stork.conf
- /usr/local/stork/stork.conf
- stork.conf in the current directory
Even if the file can't be found automatically, every valid config variable has a default value. The Stork server will issue a warning on startup if a config file cannot be found.
Start a Stork server, unless you plan on using an existing server. Submit a
job to the server using stork submit
. Upon submission, the job will be
assigned a job ID which stork submit
will output. Run stork q all
to view
all jobs and look for the job you submitted. You can use stork rm
to cancel
the job. You can run stork info
to see additional information about a server,
such as what protocols it supports.
Every Stork command honors the --help
option, which will cause it to display
usage information. Run, e.g., stork submit --help
to see detailed information
on how to use the submit command.
Stork accepts job descriptors in a number of simple key-value pair formats, including JSON and (simple) Condor ClassAd.
An example job descriptor in the JSON format:
{
"src" : "ftp://example.com/file1.txt",
"dest" : "ftp://example.com/file2.txt",
"max_attempts" : 5,
"email" = "[email protected]"
}
The same job descriptor in the ClassAd format:
[
src = "ftp://example.com/file1.txt";
dest = "ftp://example.com/file2.txt";
max_attempts = 5;
email = "[email protected]"
]
When reading job descriptors from a file using stork submit, the opening and closing brackets may be omitted.
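For example, the ClassAd descriptor above could be placed in a file for stork submit with the brackets left off:

```
src = "ftp://example.com/file1.txt";
dest = "ftp://example.com/file2.txt";
max_attempts = 5;
email = "[email protected]"
```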
Stork is very liberal with what formats it accepts, and can receive
jobs in various formats with similar grammars to JSON/ClassAd, though
with weird combinations of symbols. How weird exactly? Weird enough
that the same parser understands both JSON and ClassAd. The parser can
be found in stork/ad/AdParser.java
if you're curious exactly what you
can throw at this thing.
More information about JSON and HTCondor ClassAd can be found here:
http://research.cs.wisc.edu/condor/classad/
Stork has some history as a component of the HTCondor distributed computing system, so some effort has been made to maintain compatibility with HTCondor components that interfaced with previous versions of Stork.
The following line can be added to the stork.conf
file to enable
compatibility mode for legacy components expecting a specific output
format from Stork commands:
condor_mode = true
Additional legacy support features are planned, including DAGMan logging output.
For those who are unfamiliar and want to learn more:
http://research.cs.wisc.edu/htcondor/
- bin/ — Contains scripts to execute JARs. This directory gets included in the tarfile for a binary release.
- build/ — Gets created when the project is built. Contains all class files generated by the Java compiler, which then get put into stork.jar.
- doc/ — Contains documentation after running make doc.
- lib/ — Contains external libraries that get included in stork.jar on build.
- libexec/ — Stork searches here for transfer module binaries when it is run. Gets included in the binary release tarfile.
- stork/ — Contains all the Java source files for Stork.
- Makefile — Contains all the build rules for make. You can manually configure some options for Stork here.
- README.md — Contains instructions on how to use Stork. Gets included in the binary release tarfile.