Skip to content
This repository has been archived by the owner on Jan 25, 2019. It is now read-only.

Charts with randomly generated fields that are part of pod annotations fail to be installed (e.g. stable/redis) #76

Open
anurag-prakash-singh opened this issue Nov 30, 2018 · 2 comments

Comments

@anurag-prakash-singh
Copy link

anurag-prakash-singh commented Nov 30, 2018

While attempting to install the default version (i.e. without specifying any overrides or changing the values*.yaml files) of the stable/redis chart, I'm noticing that the Redis master pod goes into a Started -> Terminating -> Started loop. This happens because the master statefulset keeps getting revised. The reason why this is happening for the Redis chart is that it contains the following pod annotation in a StatefulSet spec (this is part of the redis-master-statefulset.yaml file in the chart):

checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}

The secret itself is randomly generated thanks to this:

data:
  {{- if .Values.password }}
  redis-password: {{ .Values.password | b64enc | quote }}
  {{- else }}
  redis-password: {{ randAlphaNum 10 | b64enc | quote }}
  {{- end }}
{{- end -}}

In other words, if the values.yaml file doesn't specify a value for the password, the pod annotation will be randomly generated.

This results in a continuous release update loop when the helm-app-operator tries to install the release. As far as I can tell from the code, this is the sequence of events:

  1. the chart is installed for the first time. reconcile.go::Reconcile is invoked and after the installation finishes, the reconciler creates a work request to do a resync because of return reconcile.Result{RequeueAfter: r.ResyncPeriod}, err
  2. the resync is triggered. This time, we fall through to if manager.IsUpdateRequired() {, which returns true. The reason is that the way the operator checks if an update to a chart has happened is by doing a dry run installation of the chart. Naturally, each dry run will produce a manifest with a different value for the password field.
  3. the update process begins, leading to a new revision of the statefulset (leading to the old pod being terminated and replaced) being created.
  4. Repeat 2 and 3. (edited)

I understand that this problem is avoided by providing an override for generated values. However, for my project, it would be great if charts could work as-is, without having to provide overridden values.

@joelanford
Copy link
Member

Interesting, and yes you're spot on with the sequence of events. I am able to recreate this issue. There's some background that's helpful to understand why things currently work the way they do.

the reconciler creates a work request to do a resync because of return reconcile.Result{RequeueAfter: r.ResyncPeriod}, err

We reconcile frequently because we currently don't have a way to watch the objects created by the operator, so we have to resort to periodic reconciliation to make sure child objects don't drift. We have some ideas on how to solve this problem though. Even so, the reconciler will still be triggered periodically by the controller, just on a much less frequent basis, so this won't solve this problem on its own.

The reason is that the way the operator checks if an update to a chart has happened is by doing a dry run installation of the chart.

Doing the dry-run is the most straightforward way of detecting whether an update needs to occur, but you've definitely found a flaw with the approach.

One of the benefits of the dry-run approach is that it handles changes to the chart itself, not just the CR. One of the options may be to do a resourceVersion or spec comparison to detect CR changes and add a custom controller watch that detects changes to the chart and automatically reconciles all CRs. We'll have to think about alternatives, but I don't think this has a simple or quick solution.

In the meantime, as you pointed out the only workaround is to provide overrides in your CRs to avoid the randomly generated values.

it would be great if charts could work as-is, without having to provide overridden values.

I agree, this would be great, but there can be disparities between a chart's default release and the cluster's support for that release, so I don't think it's possible for the helm operator to be able to support every scenario. There will likely always be cases where CR overrides are necessary.

@anurag-prakash-singh
Copy link
Author

Thanks, @joelanford. I understand why this is a difficult problem to solve. For now, we'll just use the workaround. I can leave this issue open if it helps to track future work (e.g. the proposal you mentioned or alternative solutions).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants