[bitnami/etcd] Stop relying on files for state #75906

pckhoi · 2024-12-25T09:11:54Z

Description of the change

The current etcd container and chart have a few major problems:

- It relies on files outside of the data directory, which can contain information that conflicts with the data dir and the cluster.
- Removing the member during the pre-stop hook is problematic. I guess this was added to support scaling down the cluster. If so, the logic is leaky. There are 2 cases where it breaks down:
  1. If the pod was killed for reasons other than a replicas update, then the next time the pod starts it cannot start from the existing data dir, which means it must throw away the data dir and start from scratch.
  2. If the cluster is scaled down and the PVC is retained, then the next time the cluster is scaled up the new member will encounter a non-empty data dir, which it must discard.
- If the member was removed, the container chokes on a non-empty data dir in most cases, except when recovering from a snapshot.
- It might attempt to add a new member even when an old member with the same name already exists. This is caused by relying on files for state.
- It runs etcdctl member update for unclear reasons when the data dir is not empty and there is a member ID.
- It relies on ETCD_INITIAL_CLUSTER_STATE to know whether the cluster is new, which could be inaccurate.

This PR adds the following changes:

- Add preupgrade.sh, which should be run in a Helm pre-upgrade hook. When the cluster is scaled down, it detects and removes obsolete members with etcdctl member remove.
- Remove prestop.sh.
- Stop storing/checking the member ID from the member_id file. Instead, the remote member ID is read from the cluster with etcdctl member list, and the local member ID is checked for conflict during startup.
- Stop storing/checking the member removal state from member_removal.log. Check with etcdctl member list instead.
- If the data dir is not empty, check whether the member still belongs to the cluster (remote ID and local ID are the same). If there is a conflict, remove the data dir, remove the old member, add a new member, and start the member from scratch.
- Remove environment variable ETCD_DISABLE_STORE_MEMBER_ID.
- Remove environment variable ETCD_DISABLE_PRESTOP.
- Environment variable ETCD_INITIAL_CLUSTER_STATE becomes read-only.
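
The conflict check in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in libetcd.sh: the helper names remote_member_id and startup_action are hypothetical, how the local ID is obtained is left out, and the member list is passed in as sample text in the comma-separated layout that etcdctl member list prints by default (ID, status, name, peer URLs, client URLs, learner flag), so the sketch runs without a cluster.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the ID of the member called $1, given
# `etcdctl member list` output (default simple format) on stdin.
remote_member_id() {
    local name="$1"
    awk -F', ' -v name="$name" '$3 == name { print $1 }'
}

# Hypothetical helper: decide what to do with a non-empty data dir.
#   $1 = member ID found locally
#   $2 = member ID the cluster reports for this member name (may be empty)
startup_action() {
    local local_id="$1" remote_id="$2"
    if [[ -n "$remote_id" && "$local_id" == "$remote_id" ]]; then
        echo "rejoin"   # member still belongs to the cluster: keep the data dir
    else
        echo "reset"    # conflict: wipe data dir, remove old member, add a new one
    fi
}

# Sample text standing in for a real `etcdctl member list` call.
members='8211f1d0f64f3269, started, etcd-0, http://etcd-0:2380, http://etcd-0:2379, false
91bc3c398fb3c146, started, etcd-1, http://etcd-1:2380, http://etcd-1:2379, false'

remote_id="$(remote_member_id etcd-1 <<<"$members")"
startup_action "91bc3c398fb3c146" "$remote_id"   # IDs match
startup_action "deadbeefdeadbeef" "$remote_id"   # IDs differ
```

The point of the design is that both IDs come from etcd itself (the cluster and the data dir) rather than from a side file that can drift out of sync.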

Benefits

- Not relying on files outside of the data directory means there is only a single source of truth (or rather, only as many as there are live members in the cluster, plus the data dir), which makes most operations more reliable.
- Removing obsolete members in a Helm pre-upgrade hook means the etcdctl member remove command tends to be executed against a healthy cluster.
- If the pod was killed for reasons other than replica changes, it can rejoin the cluster on its own while keeping all its data intact.
- The container no longer chokes on a non-empty data dir, even when the old member is removed.

Possible drawbacks

- If there is a network outage during initialization and the current member can't connect to other members, it will think that it must start a new cluster. That said, I don't think there is any good solution in this case except manual recovery.
- I have not tested this set of changes outside of Helm/K8s.

Applicable issues

fixes #16069

Additional information

Related changes in the Helm chart: bitnami/charts#31161 and bitnami/charts#31164

- Remove prestop logic (no longer removing member when container stops)
- Remove members not included in ETCD_INITIAL_CLUSTER during startup
- Stop storing member id on a separate file, member id is checked from etcd data dir instead
- Stop reading member removal state off of disk, probe the cluster instead
- Remove old member (with the same name) if it exists before adding new member
- If data dir is not empty, check if the member still belongs to the cluster. If not, remove data dir, remove member with the same name, and add new member
- Remove env var ETCD_DISABLE_STORE_MEMBER_ID
- Remove env var ETCD_DISABLE_PRESTOP

Signed-off-by: Khoi Pham <[email protected]>


pckhoi · 2024-12-26T02:18:43Z

I'm planning to open a complementary PR in the charts repo. I will try to add more tests there.

juan131

Hi @pckhoi

Thanks so much for this amazing contribution! It would definitely help make the Bitnami etcd chart more stable.

I think the main concern/challenge with your changes would be providing a solution for users who may scale down the cluster via kubectl scale sts/etcd --replicas X (or via some HorizontalPodAutoscaler that may also scale down the cluster without Helm's control via hooks). Correct me if I'm wrong but this use case won't be covered, right?

bitnami/etcd/3.5/debian-12/rootfs/opt/bitnami/scripts/etcd/preupgrade.sh

bitnami/etcd/3.5/debian-12/rootfs/opt/bitnami/scripts/libetcd.sh

bitnami/etcd/README.md

pckhoi · 2025-01-03T02:10:31Z

@juan131 you're correct that the autoscaling use case isn't covered. People use etcd for its consistency rather than for handling large, fluctuating traffic, so I think autoscaling to handle large traffic is a niche use case.

As for manual scaling, running helm upgrade makes more sense if the cluster is installed via Helm. If people are scaling with kubectl scale, then they're probably not using Helm, which makes the cluster difficult to operate. So no, deploying/upgrading without Helm is also not supported.
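
For illustration, scaling through Helm might look like the sketch below. The release name, chart reference, and replica count are assumptions chosen for the example; replicaCount is the chart value assumed to control the number of etcd replicas. The command is only printed here, not executed.

```shell
#!/usr/bin/env bash

# Hypothetical release name, chart reference and target size; adjust to your setup.
release="my-etcd"
chart="oci://registry-1.docker.io/bitnamicharts/etcd"
replicas=3

# Scaling through Helm lets the pre-upgrade hook remove obsolete members first.
# A plain `kubectl scale` would bypass that hook entirely.
scale_cmd=(helm upgrade "$release" "$chart" --reuse-values --set "replicaCount=${replicas}")

# Print the command instead of running it; replace the echo with
# "${scale_cmd[@]}" to execute it against a real cluster.
echo "${scale_cmd[*]}"
```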

juan131 · 2025-01-03T07:19:01Z

Thanks for confirming, @pckhoi! In that case, I'd add a warning to the "Upgrading" section explaining what these changes imply (that is, warning users to scale the cluster exclusively through Helm):

https://github.com/bitnami/charts/tree/main/bitnami/etcd#upgrading

We could even add it in the chart NOTES:

https://github.com/bitnami/charts/blob/main/bitnami/etcd/templates/NOTES.txt

pckhoi · 2025-01-03T08:09:47Z

Sure, I will do that.


pckhoi · 2025-01-04T12:25:41Z

@juan131 I have updated https://github.com/bitnami/charts/tree/main/bitnami/etcd#upgrading. As for https://github.com/bitnami/charts/blob/main/bitnami/etcd/templates/NOTES.txt, I don't see anything that needs to be updated.


pckhoi · 2025-01-11T07:55:08Z

Thanks! I have addressed all the suggestions.

juan131 · 2025-01-13T09:39:20Z

@pckhoi I think this PR looks great now! Could you please check my comments in the associated chart PR? Thanks in advance.


juan131 · 2025-01-16T08:51:40Z

bitnami/etcd/3.5/debian-12/rootfs/opt/bitnami/scripts/libetcd.sh

+    am_i_root && start_command=("run_as_user" "$ETCD_DAEMON_USER" "${start_command[@]}")
+    [[ -f "$ETCD_CONF_FILE" ]] && start_command+=("--config-file" "$ETCD_CONF_FILE")
+    $start_command > >(tee -a "$tmp_file") 2>&1 &
+    pid=$!
+    debug "Started etcd in background with PID $pid"
+
+    while read -r line; do
+        echo "$line" # Stream the output
+        if [[ "$line" =~ (established TCP streaming connection with remote peer|the member has been permanently removed from the cluster|ignored streaming request; ID mismatch|\"error\":\"cluster ID mismatch\") ]]; then
+            kill "$pid"
+            wait "$pid" 2>/dev/null
+            debug "Stopped etcd"
+            break
+        fi
+    done < <(tail -f "$tmp_file")


I've been doing more tests today and we'll have to change this block. It always sends the etcd startup logs to stdout, regardless of whether BITNAMI_DEBUG is set. Alternative:

Suggested change

-    am_i_root && start_command=("run_as_user" "$ETCD_DAEMON_USER" "${start_command[@]}")
-    [[ -f "$ETCD_CONF_FILE" ]] && start_command+=("--config-file" "$ETCD_CONF_FILE")
-    $start_command > >(tee -a "$tmp_file") 2>&1 &
-    pid=$!
-    debug "Started etcd in background with PID $pid"
-
-    while read -r line; do
-        echo "$line" # Stream the output
-        if [[ "$line" =~ (established TCP streaming connection with remote peer|the member has been permanently removed from the cluster|ignored streaming request; ID mismatch|\"error\":\"cluster ID mismatch\") ]]; then
-            kill "$pid"
-            wait "$pid" 2>/dev/null
-            debug "Stopped etcd"
-            break
-        fi
-    done < <(tail -f "$tmp_file")
+    am_i_root && start_command=("run_as_user" "$ETCD_DAEMON_USER" "${start_command[@]}")
+    [[ -f "$ETCD_CONF_FILE" ]] && start_command+=("--config-file" "$ETCD_CONF_FILE")
+    "${start_command[@]}" > "$tmp_file" 2>&1 &
+
+    while read -r line; do
+        debug_execute echo "$line"
+        if [[ "$line" =~ (established TCP streaming connection with remote peer|the member has been permanently removed from the cluster|ignored streaming request; ID mismatch|\"error\":\"cluster ID mismatch\") ]]; then
+            etcd_stop
+            debug "Stopped etcd"
+            break
+        fi
+    done < <(tail -f "$tmp_file")

By the way, it's not necessary to save the PID; you can use etcd_stop instead.
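
Two details of the suggestion can be tried in isolation. The sketch below is an illustration under assumptions, not the library code: debug_execute is a simplified stand-in for the Bitnami helper of the same name (assumed to hide a command's output unless BITNAMI_DEBUG is true), and echo stands in for the real etcd start command. It also shows that in bash an unquoted $start_command expands to the first array element only, whereas "${start_command[@]}" passes every element.

```shell
#!/usr/bin/env bash

# Simplified stand-in for Bitnami's debug_execute helper (an assumption):
# run a command, hiding its output unless BITNAMI_DEBUG is "true".
debug_execute() {
    if [[ "${BITNAMI_DEBUG:-false}" == "true" ]]; then
        "$@"
    else
        "$@" >/dev/null 2>&1
    fi
}

# Stand-in for the etcd start command; echo simply prints its arguments.
start_command=("echo" "etcd" "--config-file" "/tmp/etcd.yaml")

# Unquoted scalar expansion: only element 0 ("echo") runs, with no arguments.
first_only="$($start_command)"
# Quoted array expansion: every element is passed as its own word.
all_args="$("${start_command[@]}")"

# The same log line is hidden or shown depending on BITNAMI_DEBUG.
quiet="$(BITNAMI_DEBUG=false debug_execute echo "etcd log line")"
verbose="$(BITNAMI_DEBUG=true debug_execute echo "etcd log line")"

echo "scalar expansion: '${first_only}'"
echo "array expansion:  '${all_args}'"
echo "debug off: '${quiet}'"
echo "debug on:  '${verbose}'"
```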

juan131 · 2025-01-16T09:07:41Z

bitnami/etcd/3.5/debian-12/rootfs/opt/bitnami/scripts/etcd/preupgrade.sh

+
+expected="$(echo $ETCD_INITIAL_CLUSTER | tr -s ',' '\n' | awk -F= '{print $1}')"
+info "Expected cluster members are: $(echo "$expected" | tr -s '\n' ',' | sed 's/,$//g')"
+read -r -a obsolete_members <<<"$(comm -23 <(echo "$current" | awk -F: '{print $1}' | sort) <(echo "$expected" | sort))"


Small fix:

Suggested change

-read -r -a obsolete_members <<<"$(comm -23 <(echo "$current" | awk -F: '{print $1}' | sort) <(echo "$expected" | sort))"
+read -r -a obsolete_members <<<"$(comm -23 <(echo "$current" | awk -F: '{print $1}' | sort) <(echo "$expected" | sort) | tr -s '\n' ' ')"

pckhoi · 2025-01-16

This change seems to break the script for me.

pckhoi · 2025-01-16

Actually, something else is breaking. Let me investigate more.

juan131 · 2025-01-16

What problems are you experiencing? It works for me.

Without this change only one member is removed, because obsolete_members did not hold every obsolete member, only the first one. When I tested scaling down from 5 to 3 replicas, it removed just one member.

pckhoi · 2025-01-18

Sorry, your suggestion works. I just had to fix the loop as well. Everything should work now.
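
The behaviour discussed in this thread can be reproduced without a cluster. In the sketch below, the contents of current and expected are made-up sample data (the name:id layout is an assumption that mirrors the awk -F: call in the script). Because read consumes a single line, the multi-line output of comm -23 fills the array with the first name only, unless the newlines are turned into spaces first.

```shell
#!/usr/bin/env bash

# Made-up sample data: five members in the cluster, three expected after scale-down.
current=$'etcd-0:aaa\netcd-1:bbb\netcd-2:ccc\netcd-3:ddd\netcd-4:eee'
expected=$'etcd-0\netcd-1\netcd-2'

# Names present in the cluster but absent from the expected list, one per line.
obsolete_lines="$(comm -23 <(echo "$current" | awk -F: '{print $1}' | sort) <(echo "$expected" | sort))"

# Without tr: read stops at the first newline, so only one name is stored.
read -r -a without_tr <<<"$obsolete_lines"

# With tr: newlines become spaces, so read splits every name into the array.
read -r -a with_tr <<<"$(echo "$obsolete_lines" | tr -s '\n' ' ')"

echo "without tr: ${#without_tr[@]} member(s): ${without_tr[*]}"
echo "with tr:    ${#with_tr[@]} member(s): ${with_tr[*]}"
```

With this sample data, the first array holds only etcd-3 while the second holds etcd-3 and etcd-4. An alternative that avoids the tr step would be the bash builtin mapfile -t, which reads one array element per input line.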


pckhoi added 6 commits December 20, 2024 09:19

[bitnami/etcd] Fix remove_obsolete_members function

95811a3

Signed-off-by: Khoi Pham <[email protected]>

[bitnami/etcd] is_new_etcd_cluster queries current cluster

96e784c

Signed-off-by: Khoi Pham <[email protected]>

[bitnami/etcd] Added preupgrade.sh

c3a825a

Signed-off-by: Khoi Pham <[email protected]>

[bitnami/etcd] Document changes to ETCD_INITIAL_CLUSTER_STATE

a18d711

Signed-off-by: Khoi Pham <[email protected]>

[bitnami/etcd] Use etcdctl endpoint status to check whether cluster is new

6e4d2a6

Signed-off-by: Khoi Pham <[email protected]>

github-actions bot added the etcd and triage labels Dec 25, 2024

github-actions bot assigned javsalgar Dec 25, 2024

github-actions bot requested a review from javsalgar December 25, 2024 09:12

pckhoi mentioned this pull request Dec 26, 2024

[bitnami/etcd] Add pre-upgrade hook bitnami/charts#31161

Open

4 tasks

Merge branch 'bitnami:main' into main

3d7115e

carrodher added the verify and in-progress labels Dec 26, 2024

github-actions bot removed the triage label Dec 26, 2024

github-actions bot unassigned javsalgar Dec 26, 2024

github-actions bot removed the request for review from javsalgar December 26, 2024 08:06

github-actions bot assigned alvneiayu Dec 26, 2024

github-actions bot requested a review from alvneiayu December 26, 2024 08:06

carrodher mentioned this pull request Dec 27, 2024

[bitnami/common] Add "common.capabilities.job.apiVersion" template bitnami/charts#31164

Merged

4 tasks

carrodher removed the request for review from alvneiayu December 27, 2024 07:36

carrodher unassigned alvneiayu Dec 27, 2024

carrodher requested review from juan131 and dgomezleon December 27, 2024 07:36

carrodher assigned juan131 and dgomezleon Dec 27, 2024

juan131 requested changes Dec 30, 2024


pckhoi added 5 commits January 3, 2025 16:49

[bitnami/etcd] Refactor remove_members function

71f24ce

Signed-off-by: Khoi Pham <[email protected]>

Merge branch 'main' of github.com:pckhoi/containers

519e4f6

[bitnami/etcd] Remove get_initial_cluster

ff52e84

Signed-off-by: Khoi Pham <[email protected]>

Merge branch 'bitnami:main' into main

227e58b

[bitnami/etcd] Remove prestop.sh

55b57e6

Signed-off-by: Khoi Pham <[email protected]>

juan131 requested changes Jan 8, 2025


[bitnami/etcd] Refactor preupgrade.sh

83d2a52

Signed-off-by: Khoi Pham <[email protected]>

[bitnami/etcd] Remove mention of ETCD_INITIAL_CLUSTER_STATE from README

9cd12be

Signed-off-by: Khoi Pham <[email protected]>

juan131 requested changes Jan 14, 2025


[bitnami/etcd] Fail preupgrade hook if members cannot be listed

9d199a8

Signed-off-by: Khoi Pham <[email protected]>

juan131 reviewed Jan 16, 2025


[bitnami/etcd] Fix preupgrade obsolete members loop

735bf0a

Signed-off-by: Khoi Pham <[email protected]>
