From 5820743fbe3276bb41a5b9e62ee54e0993f67818 Mon Sep 17 00:00:00 2001
From: Tomas Nozicka
Date: Thu, 2 Nov 2023 09:25:48 +0100
Subject: [PATCH] Add support documentation and must-gather

---
 docs/source/index.rst                         |   4 +-
 docs/source/support/index.rst                 |  12 +++
 docs/source/{ => support}/known-issues.md     |   0
 docs/source/support/must-gather.md            | 101 ++++++++++++++++++
 docs/source/support/overview.md               |  14 +++
 docs/source/support/troubleshooting/index.rst |   8 ++
 .../support/troubleshooting/installation.md   |  34 ++++++
 7 files changed, 171 insertions(+), 2 deletions(-)
 create mode 100644 docs/source/support/index.rst
 rename docs/source/{ => support}/known-issues.md (100%)
 create mode 100644 docs/source/support/must-gather.md
 create mode 100644 docs/source/support/overview.md
 create mode 100644 docs/source/support/troubleshooting/index.rst
 create mode 100644 docs/source/support/troubleshooting/installation.md

diff --git a/docs/source/index.rst b/docs/source/index.rst
index 582c201f927..7891296c2c1 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -19,7 +19,7 @@ Scylla Operator Documentation
    performance
    upgrade
    releases
-   known-issues
+   support/index
    scylla-cluster-crd
    contributing
 
@@ -60,6 +60,6 @@ Currently it supports:
 * :doc:`Performance tuning [Experimental] `
 * :doc:`Upgrade procedures `
 * :doc:`Releases `
-* :doc:`Known issues `
+* :doc:`Support `
 * :doc:`Scylla Cluster Custom Resource Definition (CRD) `
 * :doc:`Contributing to the Scylla Operator Project `
diff --git a/docs/source/support/index.rst b/docs/source/support/index.rst
new file mode 100644
index 00000000000..9c623218acb
--- /dev/null
+++ b/docs/source/support/index.rst
@@ -0,0 +1,12 @@
+==========================================================
+Support
+==========================================================
+
+.. toctree::
+   :titlesonly:
+   :maxdepth: 1
+
+   overview
+   known-issues
+   troubleshooting/index
+   must-gather
diff --git a/docs/source/known-issues.md b/docs/source/support/known-issues.md
similarity index 100%
rename from docs/source/known-issues.md
rename to docs/source/support/known-issues.md
diff --git a/docs/source/support/must-gather.md b/docs/source/support/must-gather.md
new file mode 100644
index 00000000000..822451ab88e
--- /dev/null
+++ b/docs/source/support/must-gather.md
@@ -0,0 +1,101 @@
+# Gathering data with must-gather
+
+`must-gather` is an embedded tool in Scylla Operator that helps collect all the necessary info when something goes wrong.
+
+The tool talks to the Kubernetes API, retrieves a predefined set of resources and saves them into a folder in your current directory.
+By default, all collected Secrets are censored to avoid sending sensitive data.
+That said, you can always review the archive before you attach it to an issue or your support request.
+
+Given it needs to talk to the Kubernetes API, at the very least, you need to supply the `--kubeconfig` flag with a path to the kubeconfig file for your Kubernetes cluster, or set the `KUBECONFIG` environment variable.
+
+## Running must-gather
+
+There is more than one way to run `must-gather`.
+Here are some examples of how you can run the tool.
+
+### Prerequisites
+
+All examples assume you have exported the `KUBECONFIG` environment variable and that it points to a kubeconfig file on your machine.
+If not, you can run this command to export the common default location.
+Please make sure such a file exists.
+
+```bash
+export KUBECONFIG=~/.kube/config
+ls -l "${KUBECONFIG}"
+```
+
+```note::
+   There can be slight deviations in the arguments for your container tool, depending on the container runtime, whether you use SELinux or similar factors.
+
+   As an example, the need for the `Z` option on volume mounts depends on whether you use SELinux and what context is applied on your file or directory.
+   If you get an error mentioning `Error: lsetxattr : operation not supported`, try it without the `Z` option.
+```
+
+Let's also check whether your kubeconfig uses an [external authentication plugin](https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins).
+You can determine that by running
+```bash
+kubectl config view --minify
+```
+and checking whether it uses an external exec plugin by looking for this pattern (containing the `exec` key)
+```yaml
+users:
+- name:
+  user:
+    exec:
+```
+If not, you can skip the rest of this section.
+
+In case your kubeconfig depends on external binaries, you have to take a few extra steps because the external binary won't be available within our container to authenticate the requests.
+
+Similarly to how Pods are run within Kubernetes, we'll create a dedicated ServiceAccount for must-gather and use it to run the tool.
+(When you are done using it, feel free to remove the Kubernetes resources created for that purpose.)
+
+```bash
+kubectl create namespace must-gather
+kubectl -n must-gather create serviceaccount must-gather
+kubectl create clusterrolebinding must-gather --clusterrole=cluster-admin --serviceaccount=must-gather:must-gather
+export MUST_GATHER_TOKEN
+MUST_GATHER_TOKEN=$( kubectl -n must-gather create token must-gather --duration=1h )
+kubeconfig=$( mktemp )
+# Create a copy of the existing kubeconfig and
+# replace user authentication using yq, or by adjusting the fields manually.
+kubectl config view --minify --raw -o yaml | yq -e '.users[0].user = {"token": env(MUST_GATHER_TOKEN)}' > "${kubeconfig}"
+KUBECONFIG="${kubeconfig}"
+```
+
+```note::
+   If you don't have `yq` installed, you can get it at https://github.com/mikefarah/yq/#install or you can replace the user authentication settings manually.
+```
+
+### Podman
+```bash
+podman run -it --pull=always --rm -v="${KUBECONFIG}:/kubeconfig:ro,Z" -v="$( pwd ):/workspace:Z" --workdir=/workspace docker.io/scylladb/scylla-operator:latest must-gather --kubeconfig=/kubeconfig
+```
+
+### Docker
+```bash
+docker run -it --pull=always --rm -v="${KUBECONFIG}:/kubeconfig:ro" -v="$( pwd ):/workspace" --workdir=/workspace docker.io/scylladb/scylla-operator:latest must-gather --kubeconfig=/kubeconfig
+```
+
+## Limiting must-gather to a particular namespace
+
+If you are running a large Kubernetes cluster with many ScyllaClusters, it may be useful to limit the collection of ScyllaClusters to a particular namespace.
+Unless you hit scale issues, we advise against using this mode, because ScyllaClusters sometimes affect other collected resources, like the manager, or form a multi-datacenter cluster together.
+
+```bash
+scylla-operator must-gather --namespace=""
+```
+
+```note::
+   The `--namespace` flag affects only `ScyllaClusters`.
+   Other resources related to the operator installation or cluster state will still be collected from other namespaces.
+```
+
+### Collecting every resource in the cluster
+
+By default, `must-gather` collects only a predefined subset of resources.
+You can also request collecting every resource in the Kubernetes API if the default set isn't enough to debug an issue.
+
+```bash
+scylla-operator must-gather --all-resources
+```
diff --git a/docs/source/support/overview.md b/docs/source/support/overview.md
new file mode 100644
index 00000000000..7097438589c
--- /dev/null
+++ b/docs/source/support/overview.md
@@ -0,0 +1,14 @@
+# Support overview
+
+## Get support
+
+ScyllaDB provides administrators with [paid support](https://www.scylladb.com/product/support/#enterprise-support), including for Scylla Operator.
+
+## Troubleshooting issues
+
+To learn more about what to do when issues arise, visit our dedicated [troubleshooting section](troubleshooting/index).
+
+## Gather data about your cluster
+
+Scylla Operator contains an embedded tool called [must-gather](must-gather.md) that can collect the required information for requesting support or reporting issues.
+Support requests and bug reports must include the must-gather archive to help us understand the issue.
diff --git a/docs/source/support/troubleshooting/index.rst b/docs/source/support/troubleshooting/index.rst
new file mode 100644
index 00000000000..b83118e6b18
--- /dev/null
+++ b/docs/source/support/troubleshooting/index.rst
@@ -0,0 +1,8 @@
+==========================================================
+Troubleshooting
+==========================================================
+
+.. toctree::
+   :maxdepth: 2
+
+   installation
diff --git a/docs/source/support/troubleshooting/installation.md b/docs/source/support/troubleshooting/installation.md
new file mode 100644
index 00000000000..226aa8c1d41
--- /dev/null
+++ b/docs/source/support/troubleshooting/installation.md
@@ -0,0 +1,34 @@
+# Troubleshooting installation issues
+
+## Webhooks
+Scylla Operator provides several custom API resources that use webhooks to function properly.
+
+Unfortunately, users' clusters often have a modified SDN that doesn't extend to the control plane, so the Kubernetes apiserver is not able to reach the pods that serve the webhook traffic.
+Another common cause is firewall rules that block the webhook traffic.
+
+```note::
+   To be called a Kubernetes cluster, clusters are required to pass the Kubernetes conformance test suite.
+   This suite includes tests that require the Kubernetes apiserver to be able to reach webhook services.
+```
+
+```note::
+   Before filing an issue, please make sure your cluster webhook traffic can reach your webhook services, independently of Scylla Operator resources.
+```
+
+### EKS
+
+#### Custom CNI
+EKS is currently breaking Kubernetes webhooks [when used with custom CNI networking](https://github.com/aws/containers-roadmap/issues/1215).
+
+```note::
+   We advise you to avoid using such setups and use a conformant Kubernetes cluster that supports webhooks.
+```
+
+There are some workarounds where you can reconfigure the webhook to use Ingress or hostNetwork instead, but that goes beyond the standard configuration that we support and is not specific to Scylla Operator.
+
+### GKE
+
+#### Private clusters
+
+If you use GKE private clusters, you need to manually configure the firewall to allow webhook traffic.
+You can find more information on how to do that in the [GKE private clusters docs](https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules).