A Hoya Cluster Specification is a JSON file which describes a cluster to Hoya: what application is to be deployed, which archive file contains the application, specific cluster-wide options, and options for the individual roles in a cluster.
These are options read by the Hoya Application Master; the program deployed in the YARN cluster to start all the other roles.
When the AM is started, all entries in the cluster-wide are loaded as Hadoop configuration values.
Only those related to Hadoop, the filesystem in use, YARN and hoya will be used; others are likely to be ignored.
Cluster wide options are used to configure the application itself.
These are specified at the command line with the -O key=value
syntax
All options beginning with the prefix site.
are converted into
site XML options for the specific application (assuming the application uses
a site XML configuration file)
Standard keys are defined in the class org.apache.hoya.api.OptionKeys
.
A boolean value to indicate this is a test run, not a production run. In this mode Hoya opts to fail fast, rather than retry container deployments when they fail. It is primarily used for internal tests.
list the single argument to invoke in the AM when starting a cluster.
For HBase and Accumulo, the command is version
-which is sufficient to
validate that the installed application tar file (or specified home directory)
is valid. It may be changed to another verb which the application supports
on the command line -though other parameters cannot be appended.
An integer stating the time in seconds before which a failed container is considered 'short lived'.
A failure of a short-lived container is treated as a sign of a problem with the role configuration and/or another aspect of the Hoya cluster -or a problem with the specific node on which the attempt to run the container was made.
An integer stating the number of failures tolerated in a single role before the cluster is considered to have failed.
A boolean flag to indicate whether application-specific health monitoring
should take place. Until this monitoring is completed the option
is ignored -it is added as false
by default on clusters created.
A Hoya application consists of the Hoya Application Master, "the AM", which manages the cluster, and a number of instances of the "roles" of the actual application.
For HBase the roles are master
and worker
; Accumulo has more.
For every role, the cluster specification can define
- How many instances of that role are desired.
- Some options with well known names for configuring the runtimes of the roles.
- Environment variables needed to help configure and run the process.
- Options for YARN
This argument is meant to provide a configuration-based static option that is provided to every instance of the given role. For example, this is useful in providing a binding address to Accumulo's Monitor process.
Users can override the option on the hoya executable using the roleopt argument:
--roleopt monitor role.additional.args "--address 127.0.0.1"
The amount of memory in MB for the YARN container hosting that role. Default "256".
The special value max
indicates that Hoya should request the
maximum value that YARN allows containers to have -a value
determined dynamically after the cluster starts.
Examples:
--roleopt master yarn.memory 2048
--roleopt worker yarn.memory max
If a YARN cluster is configured to set process memory limits via the OS, and the application tries to use more memory than allocated, it will fail with the exit code "143".
Number of "Cores" for the container hosting a role. Default value: "1"
The special value max
indicates that Hoya should request the
maximum value that YARN allows containers to have -a value
determined dynamically after the cluster starts.
As well as being able to specify the numeric values of memory and cores
in role, via the --roleopt
argument, you can now ask for the maximum
allowed value by using the parameter max
Examples:
--roleopt master yarn.vcores 2
--roleopt master yarn.vcores max
The TCP socket port number to use for the master node web UI. This is translated into an application-specific site.xml property for both Accumulo and HBase.
If set to a number other than the default, "0", then if the given port is in use, the role instance will not start. This will occur if YARN is already running a master node on that server, or if another application is using the same TCP port.
Heapsize as a JVM option string, such as "256M"
or "2G"
--roleopt worker jvm.heapsize 8G
This is not correlated with the YARN memory -changes in the YARN memory allocation are not reflected in the JVM heapsize -and vice versa.
All role options beginning with env.
are automatically converted to
environment variables which will be set for all instances of that role.
--roleopt worker env.MALLOC_ARENA 4
Here are options specific to Accumulo clusters.
Location of Zookeeper on the target machine. This is needed by the Accumulo startup scripts.
Location of Hadoop on the target machine. This is needed by the Accumulo startup scripts.
This is the password used to control access to the accumulo data. A random password (from a UUID, hence very low-entropy) is chosen when the cluster is created. A more rigorous password can be set on the command line at the time of cluster creation.
The Hoya Application Master has its own role, hoya
, which can also
be configured with role options. Currently only JVM and YARN options
are supported:
--roleopt hoya jvm.heapsize 256M
--roleopt hoya jvm.opts "-Djvm.property=true"
--roleopt hoya yarn.memory 512
Normal memory requirements of the AM are low, except in the special case of
starting an accumulo cluster for the first time. In this case, bin\accumulo init
needs to be run: the extra memory requirements of the accumulo process
need to be included in the hoya role's yarn.memory
values.