Add support documentation and must-gather
tnozicka committed Nov 2, 2023
1 parent 86f34c1 commit 1015dd0
Showing 7 changed files with 171 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/source/index.rst
@@ -19,7 +19,7 @@ Scylla Operator Documentation
performance
upgrade
releases
known-issues
support/index
scylla-cluster-crd
contributing

@@ -60,6 +60,6 @@ Currently it supports:
* :doc:`Performance tuning [Experimental] <performance>`
* :doc:`Upgrade procedures <upgrade>`
* :doc:`Releases <releases>`
* :doc:`Known issues <known-issues>`
* :doc:`Support <support/index>`
* :doc:`Scylla Cluster Custom Resource Definition (CRD) <scylla-cluster-crd>`
* :doc:`Contributing to the Scylla Operator Project <contributing>`
12 changes: 12 additions & 0 deletions docs/source/support/index.rst
@@ -0,0 +1,12 @@
==========================================================
Support
==========================================================

.. toctree::
   :titlesonly:
   :maxdepth: 1

   overview
   known-issues
   troubleshooting/index
   must-gather
File renamed without changes.
101 changes: 101 additions & 0 deletions docs/source/support/must-gather.md
@@ -0,0 +1,101 @@
# Gathering data with must-gather

`must-gather` is a tool embedded in Scylla Operator that helps you collect all the information needed to debug a problem when something goes wrong.

The tool talks to the Kubernetes API, retrieves a predefined set of resources and saves them into a folder in your current directory.
By default, all collected Secrets are censored to avoid sending sensitive data.
Regardless, you can always review the archive before attaching it to an issue or a support request.

Because the tool talks to the Kubernetes API, you need to at least supply the `--kubeconfig` flag with the path to the kubeconfig file for your Kubernetes cluster, or set the `KUBECONFIG` environment variable.

## Running must-gather

There is more than one way to run `must-gather`.
Here are some examples of how you can run the tool.

### Prerequisites

All examples assume you have exported the `KUBECONFIG` environment variable pointing to a kubeconfig file on your machine.
If you haven't, you can run the following command to export the common default location.
Make sure that the file exists.

```bash
export KUBECONFIG=~/.kube/config
ls -l "${KUBECONFIG}"
```

```note::
The arguments for your container tool may differ slightly depending on the container runtime, whether you use SELinux, and similar factors.
For example, whether you need the `Z` option on volume mounts depends on whether you use SELinux and which context is applied to your file or directory.
If you get an error mentioning `Error: lsetxattr <path>: operation not supported`, try it without the `Z` option.
```

Next, check whether your kubeconfig uses an [external authentication plugin](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins).
You can determine that by running
```bash
kubectl config view --minify
```
and looking for a user entry that contains the `exec` key, following this pattern:
```yaml
users:
- name: <user_name>
user:
exec:
```
If not, you can skip the rest of this section.
If your kubeconfig depends on external binaries, you have to take a few extra steps, because those binaries won't be available inside the must-gather container to authenticate the requests.
Similarly to how Pods run in Kubernetes, we'll create a dedicated ServiceAccount for must-gather and use it to run the tool.
(When you are done using it, feel free to remove the Kubernetes resources created for that purpose.)
```bash
kubectl create namespace must-gather
kubectl -n must-gather create serviceaccount must-gather
kubectl create clusterrolebinding must-gather --clusterrole=cluster-admin --serviceaccount=must-gather:must-gather
export MUST_GATHER_TOKEN
MUST_GATHER_TOKEN=$( kubectl -n must-gather create token must-gather --duration=1h )
kubeconfig=$( mktemp )
# Create a copy of the existing kubeconfig and
# replace user authentication using yq, or by adjusting the fields manually.
kubectl config view --minify --raw -o yaml | yq -e '.users[0].user = {"token": env(MUST_GATHER_TOKEN)}' > "${kubeconfig}"
export KUBECONFIG="${kubeconfig}"
```

```note::
If you don't have `yq` installed, you can install it by following https://github.com/mikefarah/yq/#install, or you can replace the user authentication settings manually.
```

### Podman
```bash
podman run -it --pull=always --rm -v="${KUBECONFIG}:/kubeconfig:ro,Z" -v="$( pwd ):/workspace:Z" --workdir=/workspace docker.io/scylladb/scylla-operator:latest must-gather --kubeconfig=/kubeconfig
```

### Docker
```bash
docker run -it --pull=always --rm -v="${KUBECONFIG}:/kubeconfig:ro" -v="$( pwd ):/workspace" --workdir=/workspace docker.io/scylladb/scylla-operator:latest must-gather --kubeconfig=/kubeconfig
```
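If you run the containerized tool more than once, or need to append the extra flags described in the following sections, a small shell function saves retyping. The sketch below wraps the Podman command shown above; `run_must_gather` is a hypothetical name, and for Docker you would substitute the Docker arguments (without the `Z` option on the volume mounts).

```bash
# Hypothetical wrapper around the Podman invocation above.
# Any arguments passed to the function are appended to the must-gather command line.
run_must_gather() {
  podman run -it --pull=always --rm \
    -v="${KUBECONFIG}:/kubeconfig:ro,Z" \
    -v="$( pwd ):/workspace:Z" \
    --workdir=/workspace \
    docker.io/scylladb/scylla-operator:latest \
    must-gather --kubeconfig=/kubeconfig "$@"
}
```

For example, `run_must_gather --namespace="<namespace_with_broken_scyllacluster>"` or `run_must_gather --all-resources`.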

## Limiting must-gather to a particular namespace

If you are running a large Kubernetes cluster with many ScyllaClusters, it may be useful to limit the collection of ScyllaClusters to a particular namespace.
Unless you hit scale issues, we advise against using this mode, because ScyllaClusters sometimes affect other collected resources, such as Scylla Manager, or form a multi-datacenter cluster together.

```bash
scylla-operator must-gather --namespace="<namespace_with_broken_scyllacluster>"
```

```note::
The `--namespace` flag affects only `ScyllaClusters`.
Other resources related to the operator installation or cluster state will still be collected from other namespaces.
```

## Collecting every resource in the cluster

By default, `must-gather` collects only a predefined subset of resources.
If the default set isn't enough to debug an issue, you can request collecting every resource in the Kubernetes API.

```bash
scylla-operator must-gather --all-resources
```
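If you created the temporary `must-gather` ServiceAccount described in the Prerequisites section, you can remove it once you are done. The sketch below is a hypothetical helper, `must_gather_cleanup`; it assumes the resource names used in that section and takes the path of your original kubeconfig plus, optionally, the temporary kubeconfig file. It switches back to your own credentials first, because the temporary kubeconfig authenticates as the very ServiceAccount whose permissions are being removed.

```bash
# Hypothetical helper that removes the temporary must-gather resources.
# Usage: must_gather_cleanup ORIGINAL_KUBECONFIG [TEMPORARY_KUBECONFIG]
must_gather_cleanup() {
  local original_kubeconfig="$1"
  local temporary_kubeconfig="${2:-}"
  # Use your own credentials, not the ServiceAccount token.
  export KUBECONFIG="${original_kubeconfig}"
  kubectl delete clusterrolebinding must-gather
  # Deleting the namespace also deletes the ServiceAccount inside it.
  kubectl delete namespace must-gather
  # Remove the temporary kubeconfig (it contains the token), never the original.
  if [ -n "${temporary_kubeconfig}" ] && [ "${temporary_kubeconfig}" != "${original_kubeconfig}" ]; then
    rm -f -- "${temporary_kubeconfig}"
  fi
  unset MUST_GATHER_TOKEN
}
```

For example, `must_gather_cleanup ~/.kube/config "${kubeconfig}"`, where `~/.kube/config` stands in for the path of your original kubeconfig.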
14 changes: 14 additions & 0 deletions docs/source/support/overview.md
@@ -0,0 +1,14 @@
# Support overview

## Get support

ScyllaDB offers administrators [paid support](https://www.scylladb.com/product/support/#enterprise-support), which covers Scylla Operator.

## Troubleshooting issues

To learn more about what to do when issues arise, visit our dedicated [troubleshooting section](troubleshooting/index).

## Gather data about your cluster

Scylla Operator contains an embedded tool called [must-gather](must-gather.md) that collects the information required for requesting support or reporting issues.
Support requests and bug reports must include the must-gather archive to help us understand the issue.
8 changes: 8 additions & 0 deletions docs/source/support/troubleshooting/index.rst
@@ -0,0 +1,8 @@
==========================================================
Troubleshooting
==========================================================

.. toctree::
   :maxdepth: 2

   installation
34 changes: 34 additions & 0 deletions docs/source/support/troubleshooting/installation.md
@@ -0,0 +1,34 @@
# Troubleshooting installation issues

## Webhooks
Scylla Operator provides several custom API resources that use webhooks to function properly.

Unfortunately, users' clusters often have a modified software-defined network (SDN) that doesn't extend to the control plane, so the Kubernetes apiserver can't reach the Pods that serve the webhook traffic.
Another common cause is firewall rules that block the webhook traffic.

```note::
To be called a Kubernetes cluster, a cluster must pass the Kubernetes conformance test suite.
This suite includes tests that require the Kubernetes apiserver to be able to reach webhook services.
```

```note::
Before filing an issue, please make sure that webhook traffic in your cluster can reach your webhook services, independently of Scylla Operator resources.
```
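As a first check, you can confirm that the Service a webhook configuration points to has ready endpoints. This minimal sketch defines a hypothetical helper and assumes you already know the Service's namespace and name (they appear under `clientConfig.service` in the output of `kubectl get validatingwebhookconfigurations -o yaml`). Ready endpoints are necessary but not sufficient: they don't prove that the apiserver can reach those Pods over your network.

```bash
# Hypothetical helper: succeeds if the given Service has at least one ready
# endpoint address. Usage: webhook_service_has_endpoints NAMESPACE SERVICE
webhook_service_has_endpoints() {
  local namespace="$1" service="$2" addresses
  addresses="$( kubectl -n "${namespace}" get endpoints "${service}" \
    -o jsonpath='{.subsets[*].addresses[*].ip}' )"
  [ -n "${addresses}" ]
}
```

For example, `webhook_service_has_endpoints scylla-operator scylla-operator-webhook && echo ready`, where both names are assumptions; take the actual ones from your webhook configuration.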

### EKS

#### Custom CNI
EKS currently breaks Kubernetes webhooks [when used with custom CNI networking](https://github.com/aws/containers-roadmap/issues/1215).

```note::
We advise you to avoid using such setups and use a conformant Kubernetes cluster that supports webhooks.
```

Some workarounds reconfigure the webhook to use Ingress or hostNetwork instead, but they go beyond the standard configuration we support and are not specific to Scylla Operator.

### GKE

#### Private clusters

If you use GKE private clusters, you need to manually configure the firewall to allow webhook traffic.
You can find more information on how to do that in [GKE private clusters docs](https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules).
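For reference, the linked guide creates such a rule with `gcloud compute firewall-rules create`. The sketch below wraps the relevant `gcloud` calls in a hypothetical helper: it looks up the control plane's private CIDR block and allows ingress from it to your nodes on the webhook port. The cluster, location, network, node tag, port and rule name are placeholders you must supply; check the port against the webhook Pods in your installation. The `--location` flag exists in recent gcloud releases; older ones use `--zone` or `--region` instead.

```bash
# Hypothetical helper: allow a GKE private control plane to reach webhook Pods.
# Usage: allow_gke_webhook_traffic CLUSTER LOCATION NETWORK NODE_TAG PORT
allow_gke_webhook_traffic() {
  local cluster="$1" location="$2" network="$3" node_tag="$4" port="$5"
  local control_plane_cidr
  # The control plane's private range is stored in privateClusterConfig.
  control_plane_cidr="$( gcloud container clusters describe "${cluster}" \
    --location="${location}" \
    --format='value(privateClusterConfig.masterIpv4CidrBlock)' )" || return 1
  # An empty value means the cluster is not a private cluster.
  [ -n "${control_plane_cidr}" ] || return 1
  gcloud compute firewall-rules create "${cluster}-allow-webhooks" \
    --network="${network}" \
    --direction=INGRESS \
    --action=ALLOW \
    --source-ranges="${control_plane_cidr}" \
    --rules="tcp:${port}" \
    --target-tags="${node_tag}"
}
```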
