forked from pivotal-cf/docs-pcf-install
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtrouble-advanced.html.md.erb
425 lines (303 loc) · 15.2 KB
/
trouble-advanced.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
---
title: Advanced Troubleshooting with the BOSH CLI
owner: Ops Manager
---
<strong><%= modified_date %></strong>
To perform advanced troubleshooting, you must log into the BOSH Director. From there, you can run specific commands using the BOSH Command Line Interface (CLI). BOSH Director diagnostic commands have access to information about your entire [Pivotal Cloud Foundry®](https://network.pivotal.io/products/pivotal-cf) (PCF) installation.
The BOSH Director runs on the virtual machine (VM) that Ops Manager deploys on the first install of the Ops Manager Director tile.
BOSH Director diagnostic commands have access to information about your entire
[Pivotal Cloud Foundry®](https://network.pivotal.io/products/pivotal-cf)
(PCF) installation.
<p class="note"><strong>Note</strong>: For more troubleshooting information, refer to the <a href="./troubleshooting.html">Troubleshooting Guide</a>.</p>
<p class="note"><strong>Note</strong>: Verify that no BOSH Director tasks are running on the Ops Manager VM before running any commands. You should
not proceed with troubleshooting until all BOSH Director tasks have completed or you
have ended them. See the <a href="https://bosh.io/docs/sysadmin-commands.html">Bosh CLI Commands</a> for more information.</p>
## <a id='prepare'></a>Prepare to Use the BOSH CLI ##
This section guides you through preparing to use the BOSH CLI.
### <a id='gather'></a>Gather Information ###
Before you begin troubleshooting with the BOSH CLI, collect the information you
need from the Ops Manager interface.
1. Open the Ops Manager interface by navigating to the Ops Manager fully qualified domain name (FQDN). Ensure that there are no installations or
updates in progress.
1. Click the **Ops Manager Director** tile and select the **Status** tab.
1. Record the IP address for the Director job. This is the IP
address of the VM where the BOSH Director runs.
<%= image_tag("ops-mgr-job-ip.png") %>
1. Select the **Credentials** tab.
1. Click **Link to Credential** to view and record the **Director Credentials**.
<%= image_tag("bosh-creds.png") %>
1. Return to the **Installation Dashboard**.
1. **(Optional)** To prepare to troubleshoot the job VM for any other product,
click the product tile and repeat the procedure above to record the IP address
and VM credentials for that job VM.
1. Log out of Ops Manager.
<p class="note"><strong>Note</strong>: You must log out of the Ops Manager interface to use the BOSH CLI.</p>
### <a id='ssh'></a>SSH into Ops Manager ###
Use SSH to connect to the Ops Manager web application VM.
To SSH into the Ops Manager VM:
**vSphere:**
You need the credentials used to import the PCF .ova or .ovf file into
your virtualization system.
1. From a command line, run `ssh ubuntu@OPS-MANAGER-FQDN`.
1. When prompted, enter the password that you set during the .ova deployment
into vCenter:
<pre class='terminal'>
$ ssh ubuntu@OPS-MANAGER-FQDN
Password: ***********
</pre>
**AWS and OpenStack:**
1. Locate the Ops Manager FQDN on the AWS EC2 instances page or the
OpenStack Access & Security page.
1. Change the permissions on the `.pem` file to be more restrictive:
<pre class="terminal">
$ chmod 600 ops_mgr.pem
</pre>
1. Run the `ssh` command:
<pre class="terminal">
ssh -i ops_mgr.pem ubuntu@OPS-MANAGER-FQDN
</pre>
### <a id='log-in'></a>Log into BOSH ###
Log into the BOSH Director using one of the following options below:
- [Internal UAAC Login](#log-in-uaa) - target the BOSH UAA using the [UAA Command Line Interface (UAAC)](../adminguide/uaa-user-management.html) to log into BOSH.
- [External User Store Login via SAML](#log-in-saml) - use an external user store to log into BOSH.
#### <a id='log-in-uaa'></a>Internal UAAC Login ####
1. Target the BOSH UAA on Ops Manager with the UAAC command `uaac target`.
<pre class='terminal'>
$ uaac target --ca-cert /var/tempest/workspaces/default/root_ca_certificate https://DIRECTOR_IP:8443
</pre>
1. Run `bosh target DIRECTOR-IP-ADDRESS` to target your Ops Manager
VM using the BOSH CLI.
1. Retrieve the UAA admin user password from the **Ops Manager Director>Credentials** tab. Alternatively, launch a browser and visit the following URL to obtain the password:
```
https://{OPSMANAGER}/api/v0/deployed/director/credentials/director_credentials
```
1. Log in using the BOSH Director credentials:
<pre class='terminal'>
$ bosh --ca-cert /var/tempest/workspaces/default/root_ca_certificate target 192.0.2.6
Target set to 'DIRECTOR_UUID'
Your username: director
Enter password: (DIRECTOR_CREDENTIAL)
Logged in as 'director'
</pre>
#### <a id='log-in-saml'></a>External User Store Login via SAML ####
To log into BOSH Director you need browser access to the BOSH Director in order to get a UAA Passcode. If you have browser access, skip to step 1 below.
If you do not have browser access to the BOSH Director, consider running `sshuttle` on your local workstation (Linux only). This permits you to browse the BOSH Director IP (192.0.2.16 in our example) as if it were a local address.
<pre class='terminal'>
$ git clone https://github.com/apenwarr/sshuttle.git
$ cd sshuttle
$ ./sshuttle -r username@opsmanagerIP 0.0.0.0/0 -vv
</pre>
1. Log in to your identity provider and use the information below to configure SAML Service Provider Properties:
* Service Provider Entity ID: bosh-uaa
* ACS URL : https:<span>//</span>BOSH-DIRECTOR-IP:8443/saml/SSO/alias/bosh-uaa
* Binding : HTTP Post
* SLO URL: https:<span>//</span>BOSH-DIRECTOR-IP:8443/saml/SSO/alias/bosh-uaa
* Binding : HTTP Redirect
* Name ID : Email Address
1. Log into BOSH using your SAML credentials.
<pre class='terminal'>
$ bosh login
Email: admin
Password:
One Time Code (Get one at https://192.0.2.16.11:8888/passcode):
</pre>
1. Click **Login with organization credentials (SAML)**.
<%= image_tag("login-saml-credentials.png") %>
1. Copy the **Temporary Authentication Code** that appears in your browser.
<%= image_tag("saml-login-temp-auth-code.png") %>
1. You see a login confirmation. For example:
<pre class='terminal'>Logged in as [email protected]
</pre>
### <a id='product'></a>Select a Product Deployment to Troubleshoot ###
When you import and install a product using Ops Manager, you deploy an instance
of the product described by a YAML file. Examples of available products include
Elastic Runtime, MySQL, or any other service that you imported and installed.
Perform the following steps to select a product deployment to troubleshoot:
1. Identify the YAML file that describes the deployment you want to
troubleshoot.
You identify the YAML file that describes a deployment by its filename.
For example, to identify Elastic Runtime deployments, run the following
command:
`find /var/tempest/workspaces/default/deployments -name cf-*.yml`
The table below shows the naming conventions for deployment files.
<table border="1" class="nice" >
<tr>
<th>Product</th><th>Deployment Filename Convention</th>
</tr>
<tr>
<td>Elastic Runtime</td>
<td>cf-<20-character_random_string>.yml</td>
</tr>
<tr>
<td>MySQL Dev</td>
<td>cf_services-<20-character_random_string>.yml</td>
</tr>
<tr>
<td>Other</td>
<td><20-character_random_string>.yml</td>
</tr>
</table>
<p class="note"><strong>Note</strong>: Where there is more than one installation of the same product, record the release number shown on the product tile in Operations Manager. Then, from the YAML files for that product, find the deployment that specifies the same release version as the product tile.</p>
1. Run `bosh status` and record the UUID value.
1. Open the `DEPLOYMENT-FILENAME.yml` file in a text editor and compare the
`director_uuids` value in this file with the UUID value that you recorded. If
the values do not match, perform the following steps:
1. Replace the `director_uuids` value with the UUID value.
1. Run `bosh deployment DEPLOYMENT-FILENAME.yml` to reset the file for your
deployment.
1. Run `bosh deployment DEPLOYMENT-FILENAME.yml` to instruct the BOSH Director
to apply BOSH CLI commands against the deployment described by the YAML file that you identified:
<pre class='terminal'>
$ bosh deployment /var/tempest/workspaces/default/deployments/cf-cca1234abcd.yml
</pre>
## <a id='cli'></a>Use the BOSH CLI for Troubleshooting##
This section describes three BOSH CLI commands commonly used during
troubleshooting.
* **VMS**: Lists all VMs in a deployment
* **Cloudcheck**: Runs a cloud consistency check and interactive repair
* **SSH**: Starts an interactive session or executes commands with a VM
### <a id='vms'></a>BOSH VMS ###
`bosh vms` provides an overview of the virtual machines that BOSH manages as part of the current deployment.
<pre class='terminal'>
$ bosh vms
Deployment `cf-66987630724a2c421061'
Director task 99
Task 99 done
+----------------------+---------+--------------------+---------------+
| Job/index | State | Resource Pool | IPs |
+----------------------+---------+--------------------+---------------+
| unknown/unknown | running | push-apps-manager | 198.51.100.13 |
| unknown/unknown | running | smoke-tests | 198.51.100.14 |
| cloud_controller/0 | running | cloud_controller | 198.51.100.23 |
| collector/0 | running | collector | 198.51.100.25 |
| consoledb/0 | running | consoledb | 198.51.100.29 |
| dea/0 | running | dea | 198.51.100.47 |
| health_manager/0 | running | health_manager | 198.51.100.20 |
| loggregator/0 | running | loggregator | 198.51.100.31 |
| loggregator_router/0 | running | loggregator_router | 198.51.100.32 |
| nats/0 | running | nats | 198.51.100.19 |
| nfs_server/0 | running | nfs_server | 198.51.100.21 |
| router/0 | running | router | 198.51.100.16 |
| saml_login/0 | running | saml_login | 198.51.100.28 |
| syslog/0 | running | syslog | 198.51.100.24 |
| uaa/0 | running | uaa | 198.51.100.27 |
| uaadb/0 | running | uaadb | 198.51.100.26 |
+----------------------+---------+--------------------+---------------+
VMs total: 15
Deployment `cf_services-2c3c918a135ab5f91ee1'
Director task 100
Task 100 done
+-----------------+---------+---------------+---------------+
| Job/index | State | Resource Pool | IPs |
+-----------------+---------+---------------+---------------+
| mysql_gateway/0 | running | mysql_gateway | 198.51.100.52 |
| mysql_node/0 | running | mysql_node | 198.51.100.53 |
+-----------------+---------+---------------+---------------+
VMs total: 2
</pre>
When troubleshooting an issue with your deployment, `bosh vms` may show a VM in
an **unknown** state.
Run [bosh cloudcheck](#cck) on a VM in an **unknown** state to instruct BOSH to
diagnose problems with the VM.
You can also run `bosh vms` to identify VMs in your deployment, then use the
[bosh ssh](#ssh) command to SSH into an identified VM for further
troubleshooting.
`bosh vms` supports the following arguments:
* `--details`: Report also includes Cloud ID, Agent ID, and whether or not the BOSH Resurrector has been enabled for each VM
* `--vitals`: Report also includes load, CPU, memory usage, swap usage, system disk usage, ephemeral disk usage, and persistent disk usage for each VM
* `--dns`: Report also includes the DNS A record for each VM
<p class="note"><strong>Note</strong>: The <strong>Status</strong> tab of the <strong>Elastic Runtime</strong> product tile displays information similar to the <code>bosh vms</code> output.</p>
### <a id='cck'></a>BOSH Cloudcheck ###
Run the `bosh cloudcheck` command to instruct BOSH to detect differences
between the VM state database maintained by the BOSH Director and the actual
state of the VMs. For each difference detected, `bosh cloudcheck` can offer the
following repair options:
* `Reboot VM`: Instructs BOSH to reboot a VM. Rebooting can resolve many
transient errors.
* `Ignore problem`: Instructs BOSH to do nothing.
You may want to ignore a problem in order to run `bosh ssh` and attempt
troubleshooting directly on the machine.
* `Reassociate VM with corresponding instance`: Updates the BOSH Director state
database.
Use this option if you believe that the BOSH Director state database is in
error and that a VM is correctly associated with a job.
* `Recreate VM using last known apply spec`: Instructs BOSH to destroy the
server and recreate it from the deployment manifest that the installer
provides. Use this option if a VM is corrupted.
* `Delete VM reference`: Instructs BOSH to delete a VM reference in the
Director state database. If a VM reference exists in the state database, BOSH
expects to find an agent running on the VM.
Select this option only if you know that this reference is in error.
Once you delete the VM reference, BOSH can no longer control the VM.
####Example Scenarios ####
**Unresponsive Agent**
<pre class='terminal'>
$ bosh cloudcheck
ccdb/0 (vm-3e37133c-bc33-450e-98b1-f86d5b63502a) is not responding:
- Ignore problem
- Reboot VM
- Recreate VM using last known apply spec
- Delete VM reference (DANGEROUS!)
</pre>
**Missing VM**
<pre class='terminal'>
$ bosh cloudcheck
VM with cloud ID `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' missing:
- Ignore problem
- Recreate VM using last known apply spec
- Delete VM reference (DANGEROUS!)
</pre>
**Unbound Instance VM**
<pre class='terminal'>
$ bosh cloudcheck
VM `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' reports itself as `ccdb/0' but does not have a bound instance:
- Ignore problem
- Delete VM (unless it has persistent disk)
- Reassociate VM with corresponding instance
</pre>
**Out of Sync VM**
<pre class='terminal'>
$ bosh cloudcheck
VM `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' is out of sync:
expected `cf-d7293430724a2c421061: ccdb/0', got `cf-d7293430724a2c421061: nats/0':
- Ignore problem
- Delete VM (unless it has persistent disk)
</pre>
### <a id='bosh-ssh'></a>BOSH SSH ###
Use `bosh ssh` to SSH into the VMs in your deployment.
Follows the steps below to use `bosh ssh`:
1. Run `ssh-keygen -t rsa` to provide BOSH with the correct public key.
1. Accept the defaults.
1. Run `bosh ssh`.
1. Select a VM to access.
1. Create a password for the temporary user that the `bosh ssh` command creates.
Use this password if you need sudo access in this session.
Example:
<pre class='terminal'>
$ bosh ssh
1. ha_proxy/0
2. nats/0
3. etcd_and_metrics/0
4. etcd_and_metrics/1
5. etcd_and_metrics/2
6. health_manager/0
7. nfs_server/0
8. ccdb/0
9. cloud_controller/0
10. clock_global/0
11. cloud_controller_worker/0
12. router/0
13. uaadb/0
14. uaa/0
15. login/0
16. consoledb/0
17. dea/0
18. loggregator/0
19. loggregator_traffic_controller/0
20. push-apps-manager/0
21. smoke-tests/0
Choose an instance: 17
Enter password (use it to sudo on remote host): *******
Target deployment `cf_services-2c3c918a135ab5f91ee1'
Setting up ssh artifacts
</pre>