## v1.2.1
### Risk Events
A critical bug introduced in v1.0.0 was fixed in v1.0.8: if the user wants to scale in some TiKV nodes with the command `tiup cluster scale-in`, TiUP may delete TiKV nodes by mistake, causing data loss in the TiDB cluster.
The root cause:

- TiUP wrongly treats these TiKV nodes' state as `tombstone` and reports an error that confuses the user.
- The user then executes the command `tiup cluster display` to confirm the real state of the cluster, but the `display` command also shows these TiKV nodes in the `tombstone` state.
- What's worse, the `display` command destroys tombstone nodes automatically, with no user confirmation required, so these TiKV nodes were destroyed by mistake.
To prevent this, this release introduces a safer, manual way to clean up tombstone nodes.
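The new flow looks roughly like the following sketch (the cluster name and node address are placeholders):

```shell
# Mark a TiKV node for removal; its data is kept until it is pruned.
tiup cluster scale-in <cluster-name> --node 172.16.0.1:20160

# Inspect the cluster; tombstone nodes are reported but no longer destroyed.
tiup cluster display <cluster-name>

# Explicitly clean up tombstone nodes once you have confirmed the state.
tiup cluster prune <cluster-name>
```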
### Improvements
- Introduce a safer way to clean up tombstone nodes (#858, @lucklove)
  - When a user scales in a TiKV server, its data is not deleted until the user executes a `display` command; this is risky because the user gets no chance to confirm
  - We have added a `prune` command for the cleanup stage; the `display` command no longer cleans up tombstone instances (see the example in the Risk Events section above)
- Skip auto-starting the cluster before the scale-out action, because there may be damaged instances that cannot be started (#848, @lucklove)
  - In this version, the user should make sure the cluster is working correctly by themselves before executing `scale-out` (see the sketch after this list)
- Introduce a more graceful way to check TiKV labels (#843, @lucklove)
  - Before this change, we checked TiKV labels from the config files of the TiKV and PD servers; however, servers imported from a tidb-ansible deployment don't store the latest labels in their local config, which causes inaccurate label information
  - After this change, the `display` command fetches PD and TiKV labels through the PD API (see the query example after this list)
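Since auto-start is now skipped, a reasonable pre-check before scaling out is to inspect the cluster first; a minimal sketch, assuming a cluster named `<cluster-name>` and a topology file `scale-out.yaml` (both placeholders):

```shell
# Verify every instance is up before adding new ones.
tiup cluster display <cluster-name>

# Then apply the scale-out topology.
tiup cluster scale-out <cluster-name> scale-out.yaml
```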
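For reference, the store labels that `display` now relies on can also be queried from PD directly; a minimal sketch, assuming PD listens on 127.0.0.1:2379 (a placeholder address):

```shell
# List all stores known to PD, including each TiKV store's labels.
curl http://127.0.0.1:2379/pd/api/v1/stores
```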
### Fixes
- Fix the issue that there is a data race when the same file is saved concurrently (#836, @9547)
  - We found that when the cluster was deployed with TLS support, the ca.crt file was saved multiple times in parallel, which could leave the ca.crt file empty
  - The impact of this issue is that the tiup client may not be able to communicate with the cluster
- Fix the issue that files copied by TiUP may have a different mode than the origin files (#844, @lucklove)
- Fix the issue that the tiup script is not updated after running `scale-in` on PD (#824, @9547)