<html>
<head>
<title>Vcycle admin guide</title>
</head>
<body>
<h1 align=center>Vcycle Admin Guide<!-- version --></h1>
<!--
<p align=center><b>Andrew McNab &lt;Andrew.McNab AT cern.ch&gt;</b>
<h2 style="border-bottom: 1px solid">Quick start</h2>
<p>
By following this quick start recipe you can verify that your installation
will work with Vcycle and see it creating and destroying virtual machines. You
will almost certainly want to start again from scratch by following the
step-by-step part of the Admin Guide so don't invest a lot of time here.
If you're already familiar with VMs, you could just skip straight there
but it's safest to go through the quick start to make sure the requirements
are all there.
<p>
To follow the quick start, you need an x86_64 Intel or AMD machine
with hardware virtualization (Intel VT-x or AMD-V) enabled in its BIOS; and
the machine needs to be installed with a version of Scientific Linux 6,
with libvirt installed and enabled. In particular, install the packages
libvirt, libvirt-client, libvirt-python, and qemu-kvm, and then
run "service libvirtd restart" to make sure the libvirtd daemon is
running.
<p>
Install the Vcycle RPM and at the command line execute:
<pre>
virsh list
virsh create /usr/share/doc/Vcycle-*/testkvm.xml
virsh list
virsh destroy testkvm
virsh list
</pre>
You should see no VMs listed as running to start with. After the create
command, the testkvm VM should be listed as running. After destroying it,
an empty list of VMs should be returned. If all this doesn't happen,
then something is wrong with your installation or hardware virtualization
isn't enabled. Please check the libvirt documentation to try to identify
where the problem is.
<p>
To start using Vcycle to manage VMs, it's necessary to have a standard NFS
server installed and running. Vcycle uses NFS to share some directories from
the factory machine to its virtual machines.
It's not necessary to configure the NFS server, as Vcycle uses exportfs
commands to create and destroy exports dynamically. If you have any iptables
rules blocking NFS you should disable them before starting Vcycled.
The factory machine must have a fully qualified domain name (FQDN) as
its hostname: for example, factory1.example.com, not just factory1.
The 169.254.0.0 network should not be configured on the factory machine
before you start Vcycle. In particular, Zeroconf support should be disabled
by adding NOZEROCONF=yes to /etc/sysconfig/network and restarting
networking.
<p>
Next create the /etc/Vcycle.conf configuration file. Copy
/var/lib/Vcycle/doc/example.Vcycle.conf to /etc/Vcycle.conf and read through its
comments. There are 5 lines you need to check and probably change.
<dl>
<dt><b>Vcycle_space =</b> in [settings]
<dd>Set this to Vcycle01 in your site's domain. So if your site is .example.com
then set it to Vcycle01.example.com . A Vcycle space is a group of factory
machines that communicate with each other, and is equivalent to a subcluster
or subsite. A space's name is a fully qualified domain name (FQDN), and can be
used as a virtual CE name where necessary in other systems.
<dt><b>factories =</b> in [settings]
<dd>Since we're creating a space that contains a single factory machine,
set this to be the FQDN of the factory machine you're working on.
<dt><b>total_machines</b>
<dd>Set this to the number of VMs to create and manage on this factory.
Vcycle will create hostnames for the VMs from the factory name. For
example, factory1.example.com will lead to factory1-00.example.com,
factory1-01.example.com, ...
<dt><b>root_public_key =</b> in [machinetype example]
<dd>This setting is not strictly necessary but is very useful. By copying
an RSA key pair to /root/.ssh on the factory machine, or creating
one with ssh-keygen you will be able to ssh into the VM as root and
see how it is laid out and how it is running. If you don't
place a public key at the location given in this option, you need
to comment the line out.
<dt><b>user_data_option_cvmfs_proxy =</b> in [machinetype example]
<dd>The value of this option is included in the user_data file given to the
VM. It must be set to the URL of an HTTP cache you have access to. If
you are already using cvmfs for grid worker nodes, you can use the same
value.
</dl>
<p>
The files needed for the example machinetype are fetched over HTTPS, as
indicated by the root_disk and user_data options which should not be
changed.
<p>
Just do <b>service Vcycled restart</b>
to make sure Vcycled is running and look in the log files.
<p>
When Vcycled starts it forks a factory process that watches the VMs and
creates or destroys them as necessary; and a responder process that
replies to queries from factories about what is running on this host.
These two processes have separate log files as /var/log/Vcycled-factory
and /var/log/Vcycled-responder .
<p>
In its log file, you should be able to see the factory
daemon trying to decide what to do and then creating the example
VM which runs for 5 minutes then shuts itself down. When deciding
what to do, the factory queries its own responder via UDP and this
should be visible in the responder's log file.
<p>
You should also be able to see the state of the VM using the
command <b>Vcycle scan</b>, where Vcycle is a command line tool that the
RPM installs in /usr/sbin.
<h2 style="border-bottom: 1px solid">Configuration step-by-step</h2>
<p>
This part of the guide covers the same ground as the quick start
guide but in a lot more detail. It's intended to help you choose
how best to configure your site.
<p>
The configuration file /etc/vcycle.conf uses the Python ConfigParser syntax,
which is similar to MS
Windows INI files. The file is divided into sections, with each section
name in square brackets. For example: [settings]. Each section contains
a series of option=value pairs. Sections with the same name are merged
and if options are duplicated, later values overwrite values given
earlier.
Any configuration file ending in .conf in the
directory /etc/vcycle.d will also be read. These files are read in
alphanumeric order, and then /etc/vcycle.conf is read if present.
<p>
For example, given the files site.conf, space.conf, and subspace.conf
in /etc/vcycle.d/, options from space.conf
override any given
in site.conf, but are themselves overridden by options from
subspace.conf.
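<p>
As a minimal sketch of this ordering, suppose two files in /etc/vcycle.d
set the same option. The section and option names are taken from the
machinetype examples later in this guide; the file contents and values
are purely illustrative.
<pre>
# /etc/vcycle.d/site.conf (read first)
[machinetype example1]
target_share = 5.0

# /etc/vcycle.d/space.conf (read second)
[machinetype example1]
target_share = 6.0
</pre>
<p>
With only these two files present, the two sections are merged and
target_share ends up as 6.0, because space.conf is read after site.conf.
A value for the same option in /etc/vcycle.conf would override both,
since that file is read last.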
<h3>CernVM images</h3>
<p>
Vcycle currently requires the use of CernVM images with HEPiX
contextualization based on EC2/ISO ("CD-ROM") images,
and we recommend the use of CernVM 3 micro boot images.
<p>
If you need to download an image, you can find one on
the <a href="http://cernvm.cern.ch/portal/downloads">CernVM
downloads page</a>. <b>You must get the
generic .iso image file and not the .hdd file listed for KVM.</b>
<p>
However, most experiments will supply you with their own
URL from which Vcycle can automatically fetch their current
designated image version, which Vcycle caches in /var/lib/vcycle/imagecache
and then uploads to the IaaS service for you.
<h3>Installation of Vcycle: tar vs RPM</h3>
<p>
RPM is the recommended installation procedure, and RPMs are available
from the <a href="https://repo.gridpp.ac.uk/vacproject/vcycle/">Downloads
directory</a> on the Vcycle website.
<p>
It is also possible to install Vcycle from a tar file, using the install Makefile
target.
<h3>Configuration of Vcycle spaces</h3>
<p>
Each [space ...] section declaration must include the Vcycle space name,
which is also used as the virtual CE name.
<h3>GOCDB and GGUS</h3>
<p>
Vcycle is designed to work within the WLCG/EGI grid model of sites composed
of one or more CEs. Each Vcycle space name corresponds to one CE within a site,
and can co-exist with conventional CREAM or ARC CEs.
<p>
Problems encountered during the operation of Vcycle in production may
appear as tickets in <a href="https://ggus.eu/">GGUS</a>. The
<a href="https://wiki.egi.eu/wiki/GGUS:Vcycle_FAQ">Vac/Vcycle Support Unit</a>
appears under "Second Level - Software" on the GGUS
"Assign ticket to support unit" menu.
<p>
Vcycle writes APEL accounting records as described below. The GOCDB site
name given by gocdb_sitename in the [space ...] section is included in these
records. To avoid the risk of polluting the central APEL database with
incorrect site names, please use your real GOCDB sitename for this
option.
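<p>
A [space ...] section carrying the GOCDB site name might look like the
following sketch. The space name vcycle01.example.com and the site name
EXAMPLE-SITE are hypothetical placeholders, and the other options that a
real [space ...] section needs are omitted here.
<pre>
[space vcycle01.example.com]
gocdb_sitename = EXAMPLE-SITE
</pre>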
<h3>Setting up machinetypes</h3>
<p>
One [machinetype ... ...] section must exist for each machinetype in the system, with
the name of the machinetype given in the section name, such as [machinetype example].
A machinetype name must only consist of lowercase letters, numbers,
and hyphens. The Vcycle.conf(5) man page lists the options
that can be given for each machinetype.
<p>
The target_share option for the machinetype gives
the desired share of the total VMs available in this space for that
machinetype. The shares do not need to add up to 1.0, and if a share is not given
for a machinetype, then it is set to 0. The creation of new VMs can be completely
disabled by setting all shares to 0. Vcycle factories consult these shares
when deciding which machinetype to start as VMs become available.
<p>
For ease of management, the target_share options can be grouped
together in a separate file in /etc/vcycle.d apart from the main [machinetype ...]
sections, which is convenient if shares
are generated automatically or frequently edited by hand and pushed
out to the factory machines. For example:
<pre>
[machinetype example1]
target_share = 5.0
[machinetype example2]
target_share = 6.0
[machinetype example3]
target_share = 7.0
</pre>
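<p>
As a worked example, assume that Vcycle normalizes the shares by dividing
each one by their sum. This is an interpretation of the statement that
shares do not need to add up to 1.0, not something this guide states
explicitly. Under that assumption, the three values above would
correspond to these target fractions of the VMs in the space:
<pre>
total of shares = 5.0 + 6.0 + 7.0 = 18.0
example1: 5.0 / 18.0 = 0.278 (about 28%)
example2: 6.0 / 18.0 = 0.333 (about 33%)
example3: 7.0 / 18.0 = 0.389 (about 39%)
</pre>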
<p>
The experiment or VO responsible for each machinetype should supply
step-by-step instructions on how to set up the rest of the [machinetype ...]
section and how to create the files to be placed in its subdirectory
of /var/lib/Vcycle/machinetypes (likely to be a hostcert.pem and hostkey.pem
pair to give to the VM.)
<h2 style="border-bottom: 1px solid">Starting and stopping Vcycled</h2>
<p>
The Vcycle daemon, vcycled, is started and stopped by /etc/rc.d/init.d/vcycled
in conjunction with the usual service and chkconfig commands. As the
configuration files are reread at the start of each cycle (by default,
one per minute) <b>it is not necessary to restart Vcycled after changing the
configuration</b>.
<p>
Furthermore, as Vcycled rereads the current state of the VMs from status
files and the hypervisor at the start of each cycle, Vcycled can be
restarted without disrupting running VMs or losing information about
their state.
In most cases it will even be possible to upgrade Vcycled from one patch
level to another within the same minor release without having to
drain the factory of running VMs. If problems arise during upgrades,
the most likely outcome is that Vcycle will fail to create new VMs until
the configuration is fixed, but the existing VMs will continue to run.
("We want Vcycle failures to look like planned draining.")
Furthermore, since Vcycle factory machines are autonomous, it is
straightforward to upgrade one factory in a production Vcycle space
to check the consequences.
<h2 style="border-bottom: 1px solid">Using the Vcycle command</h2>
<p>
The Vcycle(1) man page explains how the Vcycle command can be used to
scan the current Vcycle space and display the VMs running, along with
statistics about their CPU load and wall clock time.
<h2 style="border-bottom: 1px solid">Setting up Nagios</h2>
<p>
The check-Vcycled script installed in /usr/sbin can be used with
Nagios to monitor the state of Vcycled on a factory node.
<p>
It can be run from the local Nagios nrpe daemon with a line like this
in its configuration file:
<pre>
command[check-Vcycled]=/usr/sbin/check-Vcycled 600
</pre>
which raises an alarm if the Vcycled heartbeat wasn't updated in the
last 600 seconds.
<h2 style="border-bottom: 1px solid">APEL accounting</h2>
<p>
When Vcycle detects that a VM has run for at least fizzle_seconds and
has now finished, it writes a copy of the APEL
accounting message to subdirectories of /var/lib/Vcycle/apel-archive .
If you have set gocdb_sitename in the [space ...] section, then the file is also
written to /var/lib/Vcycle/apel-outgoing .
<p>
Vcycle uses the UUID of the VM as the local job
ID, the factory hostname as the local user ID, and the machinetype name as the
batch queue name. A unique user DN is constructed from the components
of the Vcycle space name. For example, Vcycle01.example.com becomes
/DC=com/DC=example/DC=Vcycle01 . If the accounting_fqan option is present in
the [machinetype ...] section, then for VMs of that type the value of that option
is included as the user FQAN, which indicates the VO associated with the VM.
The GOCDB sitename field is either the value you
gave explicitly or the Vcycle site name as a placeholder.
<p>
These accounting messages are designed to be published to the central
APEL service using the
standard APEL ssmsend command, which can be run on each factory machine
from cron. Please see the <a href="https://wiki.egi.eu/wiki/APEL">APEL
SSM client documentation for details</a>. Once you have agreed use of APEL
with the APEL team, had your certificate authorized, and done any requested
tests, it should be sufficient that: you install the apel-ssm RPM on each
machine, install a host certificate (Vcycle-apel-cert.pem) and key
(Vcycle-apel-key.pem) authorized to talk to APEL in /etc/grid-security, make
sure gocdb_sitename is set, and arrange to run the ssmsend command from cron.
<p>
The ssmsend command can safely be run multiple times per day as it does
not connect to APEL if there are no new messages, and deletes messages
once they are sent. It can be run hourly and made to
use Vcycle-ssmsend-prod.cfg installed by the Vcycle RPM, by
placing the file Vcycle-ssmsend-cron in /etc/cron.d:
<pre>
22 * * * * root /usr/bin/ssmsend -c /etc/apel/Vcycle-ssmsend-prod.cfg >>/var/log/Vcycle-ssmsend-cron.log 2>&1
</pre>
<p>
If you use the Vcycle-ssmsend-prod.cfg file for production, please change the
value of the bdii option to a local or regional top-level BDII to avoid placing
load on the default service included in the file.
<p>
If you forget to
give gocdb_sitename at some point, you can make copies of the records in
/var/lib/Vcycle/apel-archive with the "Site:" fields corrected
to your GOCDB sitename and put them in /var/lib/Vcycle/apel-outgoing for
publishing by ssmsend.
<h2 style="border-bottom: 1px solid">Puppet</h2>
<p>
A simple Puppet module for Vcycle exists as the file init.pp which is installed
in the /var/lib/Vcycle/doc directory. There are extensive comments at the start
of the file which outline how to use it.
-->
<!-- Backoff tuning using Minimum VVV fizzle_seconds=NNN ? log lines -->
</body>
</html>