Add longevity test plan and results (#1113)
Problem:
* We don't know if NGF can successfully process both control plane and data plane transactions over a period of time much greater than in our tests.
* We didn't yet try to catch bugs that could only appear over a period of time (like resource leaks).

Solution:
- Create a longevity test plan
- Run the test
- Document the results

CLOSES #956

Co-authored-by: bjee19 <[email protected]>
Co-authored-by: Saylor Berman <[email protected]>
1 parent 704f8a8, commit 71d605e
Showing 14 changed files with 634 additions and 0 deletions.
@@ -0,0 +1,149 @@
# Longevity Test

This document describes how we test NGF for longevity.

<!-- TOC -->

- [Longevity Test](#longevity-test)
  - [Goals](#goals)
  - [Test Environment](#test-environment)
  - [Steps](#steps)
    - [Start](#start)
    - [Check the Test is Running Correctly](#check-the-test-is-running-correctly)
    - [End](#end)
  - [Analyze](#analyze)
  - [Results](#results)

<!-- TOC -->

## Goals

- Ensure that NGF successfully processes both control plane and data plane transactions over a period of time much
  greater than in our other tests.
- Catch bugs that could only appear over a period of time (like resource leaks).

## Test Environment

- A Kubernetes cluster with 3 nodes on GKE
  - Node: e2-medium (2 vCPU, 4GB memory)
  - Enabled GKE logging.
  - Enabled GKE Cloud monitoring with managed Prometheus service, with enabled:
    - system.
    - kube state - pods, deployments.
- Tester VMs on Google Cloud:
  - Configuration:
    - Debian
    - Install packages: tmux, wrk
  - Location - same zone as the Kubernetes cluster.
  - First VM - for sending HTTP traffic
  - Second VM - for sending HTTPS traffic
- NGF
  - Deployment with 1 replica
  - Exposed via a Service with type LoadBalancer, private IP
  - Gateway, two listeners - HTTP and HTTPS
  - Two apps:
    - Coffee - 3 replicas
    - Tea - 3 replicas
  - Two HTTPRoutes
    - Coffee (HTTP)
    - Tea (HTTPS)

## Steps

### Start

Test duration - 4 days.

1. Create a Kubernetes cluster on GKE.
2. Deploy NGF.
3. Expose NGF via a LoadBalancer Service with the `"networking.gke.io/load-balancer-type":"Internal"` annotation to
   allocate an internal load balancer.
4. Apply the manifests, which will:
   1. Deploy the coffee and tea backends.
   2. Configure HTTP and HTTPS listeners on the Gateway.
   3. Expose coffee via the HTTP listener and tea via the HTTPS listener.
   4. Create two CronJobs to re-rollout the backends:
      1. Coffee - every minute for an hour every 6 hours
      2. Tea - every minute for an hour every 6 hours, 3 hours apart from coffee.
   5. Configure Prometheus on GKE to pick up NGF metrics.

   ```shell
   kubectl apply -f files
   ```

5. On the tester VMs, update `/etc/hosts` to have an entry with the External IP of the NGF Service (`10.128.0.10` in
   this case):

   ```text
   10.128.0.10 cafe.example.com
   ```

6. On the tester VMs, start a tmux session (this is needed so that any launched command keeps running even if you
   disconnect from the VM):

   ```shell
   tmux
   ```

7. On the first VM, start wrk for 4 days for coffee via HTTP:

   ```shell
   wrk -t2 -c100 -d96h http://cafe.example.com/coffee
   ```

8. On the second VM, start wrk for 4 days for tea via HTTPS:

   ```shell
   wrk -t2 -c100 -d96h https://cafe.example.com/tea
   ```

Notes:

- The updated coffee and tea backends in cafe.yaml include extra configuration for zero-downtime upgrades, so that
  wrk on the tester VMs does not receive 502 responses from NGF. The configuration is based on
  https://learnk8s.io/graceful-shutdown

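The rollout windows described in step 4 can be sanity-checked without a cluster. The following is a minimal sketch that expands the hour field of the two CronJob schedules used in this test (`*/6` for coffee and `3,9,15,21` for tea). The helper name `hours_for` is hypothetical, and it handles only these two forms of the hour field, not full cron syntax.

```shell
#!/bin/sh
# Hypothetical helper: list the hours (0-23) matched by a cron hour field.
# Supports only the two forms used in this test plan:
#   "*/N"    -> every N hours, starting at hour 0
#   "a,b,c"  -> an explicit comma-separated list of hours
hours_for() {
  case "$1" in
    \*/*)
      step=${1#\*/}   # strip the leading "*/" to get the step
      h=0
      out=""
      while [ "$h" -lt 24 ]; do
        out="${out:+$out }$h"
        h=$((h + step))
      done
      echo "$out"
      ;;
    *)
      echo "$1" | tr ',' ' '
      ;;
  esac
}

echo "coffee: $(hours_for '*/6')"
echo "tea:    $(hours_for '3,9,15,21')"
```

Because the minute field of both schedules is `*`, each listed hour is a window in which a rollout is triggered every minute, and the two lists should not overlap.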
### Check the Test is Running Correctly

Check that you don't see any errors:

1. Check that GKE exports NGF pod logs to Google Cloud Operations Logging and Prometheus metrics to Google Cloud
   Monitoring.
2. Check that traffic is flowing - look at the access logs of NGINX in Google Cloud Operations Logging.
3. Check that the CronJobs can run.

   ```shell
   kubectl create job --from=cronjob/coffee-rollout-mgr coffee-test
   kubectl create job --from=cronjob/tea-rollout-mgr tea-test
   ```

If you see errors, double-check that you prepared the environment and launched the test correctly.
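One way to confirm that the manually created Jobs actually succeeded is to wait for them to complete and then read their logs. This is a sketch only: the function name `check_rollout_jobs` and the 120 second timeout are assumptions, while the Job names match the `kubectl create job` commands above.

```shell
#!/bin/sh
# Hypothetical helper: wait for the manually created test Jobs to complete,
# then print their logs (the verbose curl output of the PATCH request).
check_rollout_jobs() {
  for job in coffee-test tea-test; do
    # Assumed timeout; adjust it to your cluster.
    kubectl wait --for=condition=complete --timeout=120s "job/$job" || return 1
    kubectl logs "job/$job"
  done
}
```

Run `check_rollout_jobs` after creating the Jobs. A non-zero exit status means at least one Job did not complete within the timeout.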

### End

- Remove CronJobs.
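As a sketch, removing the two CronJobs could look like the following. The function name `remove_rollout_cronjobs` is hypothetical; the CronJob names and the `default` namespace are taken from the manifests in this commit.

```shell
#!/bin/sh
# Hypothetical helper: delete the CronJobs that re-roll the coffee and tea
# backends, so that no further rollouts are triggered after the test ends.
remove_rollout_cronjobs() {
  kubectl delete cronjob coffee-rollout-mgr tea-rollout-mgr -n default
}
```

Call `remove_rollout_cronjobs` once both wrk runs have finished.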

## Analyze

- Traffic
  - Tester VMs (clients)
    - When the wrk processes stop, they print their results. To attach to the tmux session running wrk,
      run `tmux attach -t 0`
    - Check for errors, latency, RPS
- Logs
  - Check the logs for errors in Google Cloud Operations Logging.
    - NGF
    - NGINX
- Check metrics in Google Cloud Monitoring.
  - NGF
    - CPU usage
      - NGINX
      - NGF
    - Memory usage
      - NGINX
      - NGF
    - NGINX metrics
      - Reloads
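
For the client-side check of errors, latency and RPS, the relevant lines can be pulled out of a saved wrk report. This is a minimal sketch that assumes the wrk output was captured to a file (for example by piping it through `tee`), which the steps above do not do. The function name `summarize_wrk` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the headline numbers from a saved wrk report.
# $1 is the path to a file containing the text wrk prints when it finishes.
summarize_wrk() {
  report=$1
  # Requests per second, from the "Requests/sec:" line.
  awk '/^Requests\/sec:/ {print "rps=" $2}' "$report"
  # Average latency, from the "Latency" row of the thread stats table.
  awk '$1 == "Latency" {print "avg_latency=" $2; exit}' "$report"
  # wrk prints these lines only when errors occurred.
  grep -E 'Socket errors:|Non-2xx or 3xx responses:' "$report" || echo "errors=none"
}
```

For example, `summarize_wrk coffee-wrk.txt` (a hypothetical file name) prints the RPS, the average latency, and either the error lines from the report or `errors=none`.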

## Results

- [1.0.0](results/1.0.0/1.0.0.md)
@@ -0,0 +1,37 @@
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: coffee
spec:
  parentRefs:
  - name: gateway
    sectionName: http
  hostnames:
  - "cafe.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /coffee
    backendRefs:
    - name: coffee
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: tea
spec:
  parentRefs:
  - name: gateway
    sectionName: https
  hostnames:
  - "cafe.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /tea
    backendRefs:
    - name: tea
      port: 80
@@ -0,0 +1,8 @@ | ||
apiVersion: v1 | ||
kind: Secret | ||
metadata: | ||
name: cafe-secret | ||
type: kubernetes.io/tls | ||
data: | ||
tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNzakNDQVpvQ0NRQzdCdVdXdWRtRkNEQU5CZ2txaGtpRzl3MEJBUXNGQURBYk1Sa3dGd1lEVlFRRERCQmoKWVdabExtVjRZVzF3YkdVdVkyOXRNQjRYRFRJeU1EY3hOREl4TlRJek9Wb1hEVEl6TURjeE5ESXhOVEl6T1ZvdwpHekVaTUJjR0ExVUVBd3dRWTJGbVpTNWxlR0Z0Y0d4bExtTnZiVENDQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFECmdnRVBBRENDQVFvQ2dnRUJBTHFZMnRHNFc5aStFYzJhdnV4Q2prb2tnUUx1ek10U1Rnc1RNaEhuK3ZRUmxIam8KVzFLRnMvQVdlS25UUStyTWVKVWNseis4M3QwRGtyRThwUisxR2NKSE50WlNMb0NEYUlRN0Nhck5nY1daS0o4Qgo1WDNnVS9YeVJHZjI2c1REd2xzU3NkSEQ1U2U3K2Vab3NPcTdHTVF3K25HR2NVZ0VtL1Q1UEMvY05PWE0zZWxGClRPL051MStoMzROVG9BbDNQdTF2QlpMcDNQVERtQ0thaEROV0NWbUJQUWpNNFI4VERsbFhhMHQ5Z1o1MTRSRzUKWHlZWTNtdzZpUzIrR1dYVXllMjFuWVV4UEhZbDV4RHY0c0FXaGRXbElweHlZQlNCRURjczN6QlI2bFF1OWkxZAp0R1k4dGJ3blVmcUVUR3NZdWxzc05qcU95V1VEcFdJelhibHhJZVVDQXdFQUFUQU5CZ2txaGtpRzl3MEJBUXNGCkFBT0NBUUVBcjkrZWJ0U1dzSnhLTGtLZlRkek1ISFhOd2Y5ZXFVbHNtTXZmMGdBdWVKTUpUR215dG1iWjlpbXQKL2RnWlpYVE9hTElHUG9oZ3BpS0l5eVVRZVdGQ2F0NHRxWkNPVWRhbUloOGk0Q1h6QVJYVHNvcUNOenNNLzZMRQphM25XbFZyS2lmZHYrWkxyRi8vblc0VVNvOEoxaCtQeDljY0tpRDZZU0RVUERDRGh1RUtFWXcvbHpoUDJVOXNmCnl6cEJKVGQ4enFyM3paTjNGWWlITmgzYlRhQS82di9jU2lyamNTK1EwQXg4RWpzQzYxRjRVMTc4QzdWNWRCKzQKcmtPTy9QNlA0UFlWNTRZZHMvRjE2WkZJTHFBNENCYnExRExuYWRxamxyN3NPbzl2ZzNnWFNMYXBVVkdtZ2todAp6VlZPWG1mU0Z4OS90MDBHUi95bUdPbERJbWlXMGc9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg== | ||
tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQzZtTnJSdUZ2WXZoSE4KbXI3c1FvNUtKSUVDN3N6TFVrNExFeklSNS9yMEVaUjQ2RnRTaGJQd0ZuaXAwMFBxekhpVkhKYy92TjdkQTVLeApQS1VmdFJuQ1J6YldVaTZBZzJpRU93bXF6WUhGbVNpZkFlVjk0RlAxOGtSbjl1ckV3OEpiRXJIUncrVW51L25tCmFMRHF1eGpFTVBweGhuRklCSnYwK1R3djNEVGx6TjNwUlV6dnpidGZvZCtEVTZBSmR6N3Rid1dTNmR6MHc1Z2kKbW9RelZnbFpnVDBJek9FZkV3NVpWMnRMZllHZWRlRVJ1VjhtR041c09va3R2aGxsMU1udHRaMkZNVHgySmVjUQo3K0xBRm9YVnBTS2NjbUFVZ1JBM0xOOHdVZXBVTHZZdFhiUm1QTFc4SjFINmhFeHJHTHBiTERZNmpzbGxBNlZpCk0xMjVjU0hsQWdNQkFBRUNnZ0VBQnpaRE50bmVTdWxGdk9HZlFYaHRFWGFKdWZoSzJBenRVVVpEcUNlRUxvekQKWlV6dHdxbkNRNlJLczUyandWNTN4cU9kUU94bTNMbjNvSHdNa2NZcEliWW82MjJ2dUczYnkwaVEzaFlsVHVMVgpqQmZCcS9UUXFlL2NMdngvSkczQWhFNmJxdFRjZFlXeGFmTmY2eUtpR1dzZk11WVVXTWs4MGVJVUxuRmZaZ1pOCklYNTlSOHlqdE9CVm9Sa3hjYTVoMW1ZTDFsSlJNM3ZqVHNHTHFybmpOTjNBdWZ3ZGRpK1VDbGZVL2l0K1EvZkUKV216aFFoTlRpNVFkRWJLVStOTnYvNnYvb2JvandNb25HVVBCdEFTUE05cmxFemIralQ1WHdWQjgvLzRGY3VoSwoyVzNpcjhtNHVlQ1JHSVlrbGxlLzhuQmZ0eVhiVkNocVRyZFBlaGlPM1FLQmdRRGlrR3JTOTc3cjg3Y1JPOCtQClpoeXltNXo4NVIzTHVVbFNTazJiOTI1QlhvakpZL2RRZDVTdFVsSWE4OUZKZnNWc1JRcEhHaTFCYzBMaTY1YjIKazR0cE5xcVFoUmZ1UVh0UG9GYXRuQzlPRnJVTXJXbDVJN0ZFejZnNkNQMVBXMEg5d2hPemFKZUdpZVpNYjlYTQoybDdSSFZOcC9jTDlYbmhNMnN0Q1lua2Iwd0tCZ1FEUzF4K0crakEyUVNtRVFWNXA1RnRONGcyamsyZEFjMEhNClRIQ2tTazFDRjhkR0Z2UWtsWm5ZbUt0dXFYeXNtekJGcnZKdmt2eUhqbUNYYTducXlpajBEdDZtODViN3BGcVAKQWxtajdtbXI3Z1pUeG1ZMXBhRWFLMXY4SDNINGtRNVl3MWdrTWRybVJHcVAvaTBGaDVpaGtSZS9DOUtGTFVkSQpDcnJjTzhkUVp3S0JnSHA1MzRXVWNCMVZibzFlYStIMUxXWlFRUmxsTWlwRFM2TzBqeWZWSmtFb1BZSEJESnp2ClIrdzZLREJ4eFoyWmJsZ05LblV0YlhHSVFZd3lGelhNcFB5SGxNVHpiZkJhYmJLcDFyR2JVT2RCMXpXM09PRkgKcmppb21TUm1YNmxhaDk0SjRHU0lFZ0drNGw1SHhxZ3JGRDZ2UDd4NGRjUktJWFpLZ0w2dVJSSUpBb0dCQU1CVApaL2p5WStRNTBLdEtEZHUrYU9ORW4zaGxUN3hrNXRKN3NBek5rbWdGMU10RXlQUk9Xd1pQVGFJbWpRbk9qbHdpCldCZ2JGcXg0M2ZlQ1Z4ZXJ6V3ZEM0txaWJVbWpCTkNMTGtYeGh3ZEVteFQwVit2NzZGYzgwaTNNYVdSNnZZR08KditwVVovL0F6UXdJcWZ6dlVmV2ZxdStrMHlhVXhQOGNlcFB
IRyt0bEFvR0FmQUtVVWhqeFU0Ym5vVzVwVUhKegpwWWZXZXZ5TW54NWZyT2VsSmRmNzlvNGMvMHhVSjh1eFBFWDFkRmNrZW96dHNpaVFTNkN6MENRY09XVWxtSkRwCnVrdERvVzM3VmNSQU1BVjY3NlgxQVZlM0UwNm5aL2g2Tkd4Z28rT042Q3pwL0lkMkJPUm9IMFAxa2RjY1NLT3kKMUtFZlNnb1B0c1N1eEpBZXdUZmxDMXc9Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K |
@@ -0,0 +1,81 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coffee
spec:
  replicas: 3
  selector:
    matchLabels:
      app: coffee
  template:
    metadata:
      labels:
        app: coffee
    spec:
      containers:
      - name: coffee
        image: nginxdemos/nginx-hello:plain-text
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /
            port: 8080
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sleep", "15"]
---
apiVersion: v1
kind: Service
metadata:
  name: coffee
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: coffee
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tea
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tea
  template:
    metadata:
      labels:
        app: tea
    spec:
      containers:
      - name: tea
        image: nginxdemos/nginx-hello:plain-text
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /
            port: 8080
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sleep", "15"]
---
apiVersion: v1
kind: Service
metadata:
  name: tea
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: tea
@@ -0,0 +1,92 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rollout-mgr
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rollout-mgr
  namespace: default
rules:
- apiGroups:
  - "apps"
  resources:
  - deployments
  verbs:
  - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rollout-mgr
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rollout-mgr
subjects:
- kind: ServiceAccount
  name: rollout-mgr
  namespace: default
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: coffee-rollout-mgr
  namespace: default
spec:
  schedule: "* */6 * * *" # every minute during hours 0, 6, 12 and 18
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: rollout-mgr
          containers:
          - name: coffee-rollout-mgr
            image: curlimages/curl:8.3.0
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            args:
            - |
              TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
              RESTARTED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
              curl -X PATCH -s -k -v \
                -H "Authorization: Bearer $TOKEN" \
                -H "Content-type: application/merge-patch+json" \
                --data-raw "{\"spec\": {\"template\": {\"metadata\": {\"annotations\": {\"kubectl.kubernetes.io/restartedAt\": \"$RESTARTED_AT\"}}}}}" \
                "https://kubernetes/apis/apps/v1/namespaces/default/deployments/coffee?fieldManager=kubectl-rollout" 2>&1
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: tea-rollout-mgr
  namespace: default
spec:
  schedule: "* 3,9,15,21 * * *" # every minute during hours 3, 9, 15 and 21 (3 hours apart from coffee)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: rollout-mgr
          containers:
          - name: tea-rollout-mgr
            image: curlimages/curl:8.3.0
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            args:
            - |
              TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
              RESTARTED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
              curl -X PATCH -s -k -v \
                -H "Authorization: Bearer $TOKEN" \
                -H "Content-type: application/merge-patch+json" \
                --data-raw "{\"spec\": {\"template\": {\"metadata\": {\"annotations\": {\"kubectl.kubernetes.io/restartedAt\": \"$RESTARTED_AT\"}}}}}" \
                "https://kubernetes/apis/apps/v1/namespaces/default/deployments/tea?fieldManager=kubectl-rollout" 2>&1
          restartPolicy: OnFailure