title: Deploy and Maintain an Online TiDB Cluster Using TiUP
summary: Learn how to deploy and maintain an online TiDB cluster using TiUP.
aliases: /docs/dev/tiup/tiup-cluster/, /docs/dev/reference/tools/tiup/cluster/

Deploy and Maintain an Online TiDB Cluster Using TiUP

This document focuses on how to use the TiUP cluster component. For the complete steps of online deployment, refer to Deploy a TiDB Cluster Using TiUP.

Similar to the TiUP playground component used for a local test deployment, the TiUP cluster component quickly deploys TiDB for a production environment. Compared with playground, the cluster component provides more powerful production cluster management features, including upgrading, scaling, and even operation and auditing.

For the help information of the cluster component, run the following command:

tiup cluster
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.12.3/cluster
Deploy a TiDB cluster for production

Usage:
  tiup cluster [command]

Available Commands:
  check       Precheck a cluster
  deploy      Deploy a cluster for production
  start       Start a TiDB cluster
  stop        Stop a TiDB cluster
  restart     Restart a TiDB cluster
  scale-in    Scale in a TiDB cluster
  scale-out   Scale out a TiDB cluster
  destroy     Destroy a specified cluster
  clean       (Experimental) Clean up a specified cluster
  upgrade     Upgrade a specified TiDB cluster
  display     Display information of a TiDB cluster
  list        List all clusters
  audit       Show audit log of cluster operation
  import      Import an existing TiDB cluster from TiDB-Ansible
  edit-config Edit TiDB cluster config
  reload      Reload a TiDB cluster's config and restart if needed
  patch       Replace the remote package with a specified package and restart the service
  help        Help about any command

Flags:
  -c, --concurrency int     Maximum number of concurrent tasks allowed (defaults to `5`)
      --format string       (EXPERIMENTAL) The format of output, available values are [default, json] (default "default")
  -h, --help                help for tiup
      --ssh string          (Experimental) The executor type. Optional values are 'builtin', 'system', and 'none'.
      --ssh-timeout uint    Timeout in seconds to connect a host via SSH. Operations that don't need an SSH connection are ignored. (default 5)
  -v, --version             TiUP version
      --wait-timeout uint   Timeout in seconds to wait for an operation to complete. Inapplicable operations are ignored. (defaults to `120`)
  -y, --yes                 Skip all confirmations and assumes 'yes'

Deploy the cluster

To deploy the cluster, run the tiup cluster deploy command. The usage of the command is as follows:

tiup cluster deploy <cluster-name> <version> <topology.yaml> [flags]

This command requires you to provide the cluster name, the TiDB cluster version (such as v7.6.0), and a topology file of the cluster.

To write a topology file, refer to the example. The following file is an example of the simplest topology:

Note:

The topology file used by the TiUP cluster component for deployment and scaling is written in YAML syntax, so make sure that the indentation is correct.

---

pd_servers:
  - host: 172.16.5.134
    name: pd-134
  - host: 172.16.5.139
    name: pd-139
  - host: 172.16.5.140
    name: pd-140

tidb_servers:
  - host: 172.16.5.134
  - host: 172.16.5.139
  - host: 172.16.5.140

tikv_servers:
  - host: 172.16.5.134
  - host: 172.16.5.139
  - host: 172.16.5.140

tiflash_servers:
  - host: 172.16.5.141
  - host: 172.16.5.142
  - host: 172.16.5.143

tiproxy_servers:
  - host: 172.16.5.144

grafana_servers:
  - host: 172.16.5.134

monitoring_servers:
  - host: 172.16.5.134
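
Because YAML does not allow tab characters for indentation, a quick pre-check of the topology file can catch one common mistake before deployment. The following is a minimal sketch, not part of TiUP: the function name `check_yaml_indent` is hypothetical, and it only detects tabs in leading whitespace, not incorrect nesting.

```shell
# check_yaml_indent FILE  (hypothetical helper, not a TiUP command)
# Prints the offending lines with their line numbers and returns 1 if any
# line is indented with a tab character; returns 0 otherwise.
check_yaml_indent() {
  tab=$(printf '\t')
  # Match lines whose leading whitespace contains a tab.
  if grep -n "^ *${tab}" "$1"; then
    return 1
  fi
  return 0
}

# Example: check_yaml_indent /tmp/topology.yaml && echo "no tab indentation found"
```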

By default, TiUP deploys binary files built for the amd64 architecture. If a target machine uses the arm64 architecture, you can configure it in the topology file:

global:
  arch: "arm64"           # Configures all machines to use the binary files of the arm64 architecture by default

tidb_servers:
  - host: 172.16.5.134
    arch: "amd64"         # Configures this machine to use the binary files of the amd64 architecture
  - host: 172.16.5.139
    arch: "arm64"         # Configures this machine to use the binary files of the arm64 architecture
  - host: 172.16.5.140    # Machines that are not configured with the arch field use the default value in the global field, which is arm64 in this case.

...
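
To decide which `arch` value a host needs, you can look at the output of `uname -m` on that host. The following sketch maps the usual Linux values (`x86_64` and `aarch64`) to the two `arch` values used above. The function name `topology_arch` is hypothetical and only illustrates the mapping.

```shell
# topology_arch MACHINE  (hypothetical helper)
# Maps a `uname -m` value to the value used in the topology `arch` field.
topology_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported machine type: $1" >&2; return 1 ;;
  esac
}

# Example (run on the target machine): topology_arch "$(uname -m)"
```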

Save the file as /tmp/topology.yaml. If you want to use TiDB v7.6.0 and your cluster name is prod-cluster, run the following command:

tiup cluster deploy -p prod-cluster v7.6.0 /tmp/topology.yaml

During the execution, TiUP asks you to confirm your topology again and prompts for the root password of the target machines (the -p flag tells TiUP to log in with a password):

Please confirm your topology:
TiDB Cluster: prod-cluster
TiDB Version: v7.6.0
Type        Host          Ports                            OS/Arch       Directories
----        ----          -----                            -------       -----------
pd          172.16.5.134  2379/2380                        linux/x86_64  deploy/pd-2379,data/pd-2379
pd          172.16.5.139  2379/2380                        linux/x86_64  deploy/pd-2379,data/pd-2379
pd          172.16.5.140  2379/2380                        linux/x86_64  deploy/pd-2379,data/pd-2379
tiproxy     172.16.5.144  6000/3080                        linux/x86_64  deploy/tiproxy-6000
tikv        172.16.5.134  20160/20180                      linux/x86_64  deploy/tikv-20160,data/tikv-20160
tikv        172.16.5.139  20160/20180                      linux/x86_64  deploy/tikv-20160,data/tikv-20160
tikv        172.16.5.140  20160/20180                      linux/x86_64  deploy/tikv-20160,data/tikv-20160
tidb        172.16.5.134  4000/10080                       linux/x86_64  deploy/tidb-4000
tidb        172.16.5.139  4000/10080                       linux/x86_64  deploy/tidb-4000
tidb        172.16.5.140  4000/10080                       linux/x86_64  deploy/tidb-4000
tiflash     172.16.5.141  9000/8123/3930/20170/20292/8234  linux/x86_64  deploy/tiflash-9000,data/tiflash-9000
tiflash     172.16.5.142  9000/8123/3930/20170/20292/8234  linux/x86_64  deploy/tiflash-9000,data/tiflash-9000
tiflash     172.16.5.143  9000/8123/3930/20170/20292/8234  linux/x86_64  deploy/tiflash-9000,data/tiflash-9000
prometheus  172.16.5.134  9090                             linux/x86_64  deploy/prometheus-9090,data/prometheus-9090
grafana     172.16.5.134  3000                             linux/x86_64  deploy/grafana-3000
Attention:
    1. If the topology is not what you expected, check your yaml file.
    2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]:

After you enter the password, TiUP cluster downloads the required components and deploys them on the corresponding machines. When you see the following message, the deployment is successful:

Deployed cluster `prod-cluster` successfully

View the cluster list

After the cluster is successfully deployed, view the cluster list by running the following command:

tiup cluster list
Starting /root/.tiup/components/cluster/v1.12.3/cluster list
Name          User  Version    Path                                               PrivateKey
----          ----  -------    ----                                               ----------
prod-cluster  tidb  v7.6.0     /root/.tiup/storage/cluster/clusters/prod-cluster  /root/.tiup/storage/cluster/clusters/prod-cluster/ssh/id_rsa

Start the cluster

After the cluster is successfully deployed, start the cluster by running the following command:

tiup cluster start prod-cluster

If you forget the name of your cluster, view the cluster list by running tiup cluster list.

TiUP uses systemd to start a daemon process. If the process terminates unexpectedly, systemd restarts it after 15 seconds.

Check the cluster status

TiUP provides the tiup cluster display command to view the status of each component in the cluster. With this command, you don't have to log in to each machine to see the component status. The usage of the command is as follows:

tiup cluster display prod-cluster
Starting /root/.tiup/components/cluster/v1.12.3/cluster display prod-cluster
TiDB Cluster: prod-cluster
TiDB Version: v7.6.0
ID                  Role        Host          Ports                            OS/Arch       Status  Data Dir              Deploy Dir
--                  ----        ----          -----                            -------       ------  --------              ----------
172.16.5.134:3000   grafana     172.16.5.134  3000                             linux/x86_64  Up      -                     deploy/grafana-3000
172.16.5.134:2379   pd          172.16.5.134  2379/2380                        linux/x86_64  Up|L    data/pd-2379          deploy/pd-2379
172.16.5.139:2379   pd          172.16.5.139  2379/2380                        linux/x86_64  Up|UI   data/pd-2379          deploy/pd-2379
172.16.5.140:2379   pd          172.16.5.140  2379/2380                        linux/x86_64  Up      data/pd-2379          deploy/pd-2379
172.16.5.134:9090   prometheus  172.16.5.134  9090                             linux/x86_64  Up      data/prometheus-9090  deploy/prometheus-9090
172.16.5.134:4000   tidb        172.16.5.134  4000/10080                       linux/x86_64  Up      -                     deploy/tidb-4000
172.16.5.139:4000   tidb        172.16.5.139  4000/10080                       linux/x86_64  Up      -                     deploy/tidb-4000
172.16.5.140:4000   tidb        172.16.5.140  4000/10080                       linux/x86_64  Up      -                     deploy/tidb-4000
172.16.5.141:9000   tiflash     172.16.5.141  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      data/tiflash-9000     deploy/tiflash-9000
172.16.5.142:9000   tiflash     172.16.5.142  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      data/tiflash-9000     deploy/tiflash-9000
172.16.5.143:9000   tiflash     172.16.5.143  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      data/tiflash-9000     deploy/tiflash-9000
172.16.5.134:20160  tikv        172.16.5.134  20160/20180                      linux/x86_64  Up      data/tikv-20160       deploy/tikv-20160
172.16.5.139:20160  tikv        172.16.5.139  20160/20180                      linux/x86_64  Up      data/tikv-20160       deploy/tikv-20160
172.16.5.140:20160  tikv        172.16.5.140  20160/20180                      linux/x86_64  Up      data/tikv-20160       deploy/tikv-20160
172.16.5.144:6000   tiproxy     172.16.5.144  6000/3080                        linux/x86_64  Up      -                     deploy/tiproxy-6000

The Status column uses Up or Down to indicate whether the service is running normally.

For the PD component, |L or |UI might be appended to Up or Down. |L indicates that the PD node is a Leader, and |UI indicates that TiDB Dashboard is running on the PD node.
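
If you want to spot problem nodes quickly, you can filter the text output of `tiup cluster display`. The following sketch assumes the column layout shown above (ID first, Status sixth) and prints the ID and the first word of the status for every node whose status does not start with `Up`. The function name `list_not_up` is hypothetical.

```shell
# list_not_up  (hypothetical helper)
# Reads `tiup cluster display` text output on stdin and prints
# "<node-id> <status>" for each node whose Status does not start with "Up".
list_not_up() {
  # Only rows whose first field looks like host:port are node rows.
  awk '$1 ~ /^[0-9.]+:[0-9]+$/ && $6 !~ /^Up/ { print $1, $6 }'
}

# Example: tiup cluster display prod-cluster | list_not_up
```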

Scale in a cluster

Note:

This section describes only the syntax of the scale-in command. For detailed steps of online scaling, refer to Scale a TiDB Cluster Using TiUP.

Scaling in a cluster means taking one or more nodes offline. This operation removes the specified nodes from the cluster and deletes their remaining files.

Because taking the TiKV, TiFlash, and TiDB Binlog components offline is asynchronous (the node is removed through an API call) and takes a long time (you need to keep checking whether the node has been taken offline successfully), TiUP handles these components specially.

  • For TiKV, TiFlash, and TiDB Binlog:

    • TiUP cluster takes the node offline through API and directly exits without waiting for the process to be completed.

    • Afterwards, when a command related to the cluster operation is executed, TiUP cluster checks whether any TiKV, TiFlash, or TiDB Binlog node has finished going offline. If not, TiUP cluster continues with the specified operation; if so, TiUP cluster takes the following steps:

      1. Stop the service of the node that has been taken offline.
      2. Clean up the data files related to the node.
      3. Remove the node from the cluster topology.
  • For other components:

    • When taking the PD component down, TiUP cluster quickly deletes the specified node from the cluster through API, stops the service of the specified PD node, and deletes the related data files.
    • When taking other components down, TiUP cluster directly stops the node service and deletes the related data files.

The basic usage of the scale-in command:

tiup cluster scale-in <cluster-name> -N <node-id>

To use this command, you need to specify at least the cluster name and the node ID (through the -N flag). You can obtain the node ID by running the tiup cluster display command described in the previous section.

For example, to take the TiKV node on 172.16.5.140 offline, run the following command:

tiup cluster scale-in prod-cluster -N 172.16.5.140:20160

By running tiup cluster display, you can see that the TiKV node is marked Offline:

tiup cluster display prod-cluster
Starting /root/.tiup/components/cluster/v1.12.3/cluster display prod-cluster
TiDB Cluster: prod-cluster
TiDB Version: v7.6.0
ID                  Role        Host          Ports                            OS/Arch       Status   Data Dir              Deploy Dir
--                  ----        ----          -----                            -------       ------   --------              ----------
172.16.5.134:3000   grafana     172.16.5.134  3000                             linux/x86_64  Up       -                     deploy/grafana-3000
172.16.5.134:2379   pd          172.16.5.134  2379/2380                        linux/x86_64  Up|L     data/pd-2379          deploy/pd-2379
172.16.5.139:2379   pd          172.16.5.139  2379/2380                        linux/x86_64  Up|UI    data/pd-2379          deploy/pd-2379
172.16.5.140:2379   pd          172.16.5.140  2379/2380                        linux/x86_64  Up       data/pd-2379          deploy/pd-2379
172.16.5.134:9090   prometheus  172.16.5.134  9090                             linux/x86_64  Up       data/prometheus-9090  deploy/prometheus-9090
172.16.5.134:4000   tidb        172.16.5.134  4000/10080                       linux/x86_64  Up       -                     deploy/tidb-4000
172.16.5.139:4000   tidb        172.16.5.139  4000/10080                       linux/x86_64  Up       -                     deploy/tidb-4000
172.16.5.140:4000   tidb        172.16.5.140  4000/10080                       linux/x86_64  Up       -                     deploy/tidb-4000
172.16.5.141:9000   tiflash     172.16.5.141  9000/8123/3930/20170/20292/8234  linux/x86_64  Up       data/tiflash-9000     deploy/tiflash-9000
172.16.5.142:9000   tiflash     172.16.5.142  9000/8123/3930/20170/20292/8234  linux/x86_64  Up       data/tiflash-9000     deploy/tiflash-9000
172.16.5.143:9000   tiflash     172.16.5.143  9000/8123/3930/20170/20292/8234  linux/x86_64  Up       data/tiflash-9000     deploy/tiflash-9000
172.16.5.134:20160  tikv        172.16.5.134  20160/20180                      linux/x86_64  Up       data/tikv-20160       deploy/tikv-20160
172.16.5.139:20160  tikv        172.16.5.139  20160/20180                      linux/x86_64  Up       data/tikv-20160       deploy/tikv-20160
172.16.5.140:20160  tikv        172.16.5.140  20160/20180                      linux/x86_64  Offline  data/tikv-20160       deploy/tikv-20160
172.16.5.144:6000   tiproxy     172.16.5.144  6000/3080                        linux/x86_64  Up       -                     deploy/tiproxy-6000

After PD schedules the data on the node to other TiKV nodes, this node will be deleted automatically.
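
Because the removal happens asynchronously, you might want to poll until the node is gone. The following is a rough sketch under the assumption that a removed node no longer appears in the `tiup cluster display` output. The function name `wait_for_removal` and its parameters are hypothetical, and the display command is passed in as arguments so that the polling logic stays separate from TiUP itself.

```shell
# wait_for_removal NODE_ID INTERVAL_SECONDS MAX_TRIES DISPLAY_CMD...  (hypothetical helper)
# Runs DISPLAY_CMD up to MAX_TRIES times and returns 0 as soon as NODE_ID is
# no longer listed, 1 if it is still listed after the last try, and 2 if
# DISPLAY_CMD itself fails.
wait_for_removal() {
  node_id=$1; interval=$2; tries=$3
  shift 3
  while [ "$tries" -gt 0 ]; do
    out=$("$@") || return 2
    # awk exits with 0 when the node ID is found in the first column.
    if ! printf '%s\n' "$out" |
        awk -v id="$node_id" '$1 == id { found = 1 } END { exit !found }'; then
      return 0
    fi
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then sleep "$interval"; fi
  done
  return 1
}

# Example: check once a minute, for at most 30 minutes.
# wait_for_removal 172.16.5.140:20160 60 30 tiup cluster display prod-cluster
```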

Scale out a cluster

Note:

This section describes only the syntax of the scale-out command. For detailed steps of online scaling, refer to Scale a TiDB Cluster Using TiUP.

The scale-out operation follows logic similar to that of deployment: the TiUP cluster component first verifies the SSH connection to the node, creates the required directories on the target node, executes the deployment operation, and then starts the node service.

When you scale out PD, the node is added to the cluster by join, and the configurations of services associated with PD are updated. When you scale out other services, the service is started directly and added to the cluster.

All services conduct correctness validation when they are scaled out. The validation results show whether the scaling-out is successful.

To add a TiKV node and a PD node in the tidb-test cluster, take the following steps:

  1. Create a scale.yaml file, and add IPs of the new TiKV and PD nodes:

    Note:

    You need to create a topology file, which includes only the description of the new nodes, not the existing nodes.

    ---
    
    pd_servers:
      - host: 172.16.5.140
    
    tikv_servers:
      - host: 172.16.5.140
  2. Perform the scale-out operation. TiUP cluster adds the corresponding nodes to the cluster according to the port, directory, and other information described in scale.yaml.

    tiup cluster scale-out tidb-test scale.yaml

    After the command is executed, you can check the status of the scaled-out cluster by running tiup cluster display tidb-test.
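
If you generate scale-out files from a script, the minimal topology shown in step 1 can be produced as follows. This is only a sketch: the function name `scale_topology` is hypothetical, it writes one host per section, and it does not cover any other fields such as ports or directories.

```shell
# scale_topology SECTION HOST [SECTION HOST ...]  (hypothetical helper)
# Prints a minimal scale-out topology with one host per section.
# Pass each section name (for example pd_servers) only once.
scale_topology() {
  echo "---"
  while [ "$#" -ge 2 ]; do
    printf '\n%s:\n  - host: %s\n' "$1" "$2"
    shift 2
  done
}

# Example: scale_topology pd_servers 172.16.5.140 tikv_servers 172.16.5.140 > scale.yaml
```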

Rolling upgrade

Note:

This section describes only the syntax of the upgrade command. For detailed steps of online upgrade, refer to Upgrade TiDB Using TiUP.

The rolling upgrade feature leverages the distributed capabilities of TiDB. The upgrade process is made as transparent as possible to the application, and does not affect the business.

Before the upgrade, TiUP cluster checks whether the configuration file of each component is valid. If so, the components are upgraded node by node; if not, TiUP reports an error and exits. The operations vary with different nodes.

Operations for different nodes

  • Upgrade the PD node

    • First, upgrade non-Leader nodes.
    • After all the non-Leader nodes are upgraded, upgrade the Leader node.
      • The upgrade tool sends a command to PD that migrates Leader to an already upgraded node.
      • After the Leader role is switched to another node, upgrade the previous Leader node.
    • During the upgrade, if any unhealthy node is detected, the tool stops the upgrade operation and exits. You need to manually analyze the cause, fix the issue, and run the upgrade again.
  • Upgrade the TiKV node

    • First, add a scheduling operation in PD that migrates the Region Leader of this TiKV node. This ensures that the upgrade process does not affect the business.
    • After the Leader is migrated, upgrade this TiKV node.
    • After the upgraded TiKV is started normally, remove the scheduling of the Leader.
  • Upgrade other services

    • Stop the service normally and update the node.
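
The PD ordering described above (non-Leader nodes first, the Leader last) can be illustrated with the `tiup cluster display` output. The following sketch is not how TiUP implements the upgrade: it only derives the order from the `|L` marker in the Status column, and the function name `pd_upgrade_order` is hypothetical.

```shell
# pd_upgrade_order  (hypothetical helper)
# Reads `tiup cluster display` text output on stdin and prints PD node IDs
# in the order described above: non-Leader nodes first, the Leader last.
pd_upgrade_order() {
  awk '
    $2 == "pd" && $6 !~ /\|L(\||$)/ { print $1 }       # non-Leader: print now
    $2 == "pd" && $6 ~  /\|L(\||$)/ { leader = $1 }    # Leader: remember it
    END { if (leader != "") print leader }             # Leader goes last
  '
}

# Example: tiup cluster display prod-cluster | pd_upgrade_order
```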

Upgrade command

The flags for the upgrade command are as follows:

Usage:
  cluster upgrade <cluster-name> <version> [flags]

Flags:
      --force                  Force upgrade won't transfer leader
  -h, --help                   help for upgrade
      --transfer-timeout int   Timeout in seconds when transferring PD and TiKV store leaders (default 600)

Global Flags:
      --ssh string        (Experimental) The executor type. Optional values are 'builtin', 'system', and 'none'.
      --wait-timeout int  Timeout of waiting the operation
      --ssh-timeout int   Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5)
  -y, --yes               Skip all confirmations and assumes 'yes'

For example, the following command upgrades the cluster to v7.6.0:

tiup cluster upgrade tidb-test v7.6.0

Update configuration

The TiUP cluster component saves the current configuration of each cluster so that you can update component configurations dynamically. To edit this configuration, execute the tiup cluster edit-config <cluster-name> command. For example:

tiup cluster edit-config prod-cluster

TiUP cluster opens the configuration file in the vi editor. If you want to use other editors, use the EDITOR environment variable to customize the editor, such as export EDITOR=nano.

After editing the file, save the changes. To apply the new configuration to the cluster, execute the following command:

tiup cluster reload prod-cluster

The command sends the configuration to the target machine and restarts the cluster to make the configuration take effect.

Note:

For monitoring components, customize the configuration by executing the tiup cluster edit-config command to add a custom configuration path on the corresponding instance. For example:

---

grafana_servers:
  - host: 172.16.5.134
    dashboard_dir: /path/to/local/dashboards/dir

monitoring_servers:
  - host: 172.16.5.134
    rule_dir: /path/to/local/rules/dir

alertmanager_servers:
  - host: 172.16.5.134
    config_file: /path/to/local/alertmanager.yml

The content and format requirements for files under the specified path are as follows:

  • The folder specified in the dashboard_dir field of grafana_servers must contain full *.json files.
  • The folder specified in the rule_dir field of monitoring_servers must contain full *.rules.yml files.
  • For the format of files specified in the config_file field of alertmanager_servers, refer to the Alertmanager configuration template.

When you execute tiup cluster reload, TiUP first deletes all old configuration files in the target machine and then uploads the corresponding configuration from the control machine to the corresponding configuration directory of the target machine. Therefore, if you want to modify a particular configuration file, make sure that all configuration files (including the unmodified ones) are in the same directory. For example, to modify Grafana's tidb.json file, you need to first copy all the *.json files from Grafana's dashboards directory to your local directory. Otherwise, other JSON files will be missing from the target machine.
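
Before reloading, you can verify that the local directory is complete. The following sketch compares a reference copy of Grafana's dashboards directory with your local directory and lists the *.json files that are missing locally. Both directory arguments and the function name `missing_json` are hypothetical; how you obtain the reference copy from the target machine is up to you.

```shell
# missing_json REFERENCE_DIR LOCAL_DIR  (hypothetical helper)
# Prints the name of every *.json file that exists in REFERENCE_DIR
# but not in LOCAL_DIR. Prints nothing when LOCAL_DIR is complete.
missing_json() {
  for f in "$1"/*.json; do
    [ -e "$f" ] || continue          # REFERENCE_DIR contains no *.json files
    name=$(basename "$f")
    [ -e "$2/$name" ] || echo "$name"
  done
}

# Example: missing_json /path/to/reference/dashboards /path/to/local/dashboards/dir
```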

Note:

If you have configured the dashboard_dir field of grafana_servers, after executing the tiup cluster rename command to rename the cluster, you need to complete the following operations:

  1. In the local dashboards directory, change the cluster name to the new cluster name.
  2. In the local dashboards directory, change datasource to the new cluster name, because datasource is named after the cluster name.
  3. Execute the tiup cluster reload <cluster-name> -R grafana command.

Update component

For normal upgrade, you can use the upgrade command. But in some scenarios, such as debugging, you might need to replace the currently running component with a temporary package. To achieve this, use the patch command:

tiup cluster patch --help
Replace the remote package with a specified package and restart the service

Usage:
  cluster patch <cluster-name> <package-path> [flags]

Flags:
  -h, --help                    help for patch
  -N, --node strings            Specify the nodes
      --offline                 Patch a stopped cluster
      --overwrite               Use this package in the future scale-out operations
  -R, --role strings            Specify the roles
      --transfer-timeout uint   Timeout in seconds when transferring PD and TiKV store leaders, also for TiCDC drain one capture (default 600)

Global Flags:
  -c, --concurrency int     max number of parallel tasks allowed (default 5)
      --format string       (EXPERIMENTAL) The format of output, available values are [default, json] (default "default")
      --ssh string          (EXPERIMENTAL) The executor type: 'builtin', 'system', 'none'.
      --ssh-timeout uint    Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5)
      --wait-timeout uint   Timeout in seconds to wait for an operation to complete, ignored for operations that don't fit. (default 120)
  -y, --yes                 Skip all confirmations and assumes 'yes'

If a TiDB hotfix package is in /tmp/tidb-hotfix.tar.gz and you want to replace all the TiDB packages in the cluster, run the following command:

tiup cluster patch test-cluster /tmp/tidb-hotfix.tar.gz -R tidb

You can also replace only one TiDB package in the cluster:

tiup cluster patch test-cluster /tmp/tidb-hotfix.tar.gz -N 172.16.4.5:4000

Import TiDB Ansible cluster

Note:

Currently, TiUP cluster's support for TiSpark is still experimental. Importing a TiDB cluster with TiSpark enabled is not supported.

Before TiUP was released, TiDB Ansible was often used to deploy TiDB clusters. To let TiUP take over a cluster deployed by TiDB Ansible, use the import command.

The usage of the import command is as follows:

tiup cluster import --help
Import an exist TiDB cluster from TiDB-Ansible

Usage:
  cluster import [flags]

Flags:
  -d, --dir string         The path to TiDB-Ansible directory
  -h, --help               help for import
      --inventory string   The name of inventory file (default "inventory.ini")
      --no-backup          Don't backup ansible dir, useful when there're multiple inventory files
  -r, --rename NAME        Rename the imported cluster to NAME

Global Flags:
      --ssh string        (Experimental) The executor type. Optional values are 'builtin', 'system', and 'none'.
      --wait-timeout int  Timeout of waiting the operation
      --ssh-timeout int   Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5)
  -y, --yes               Skip all confirmations and assumes 'yes'

You can use either of the following commands to import a TiDB Ansible cluster:

cd tidb-ansible
tiup cluster import

tiup cluster import --dir=/path/to/tidb-ansible

View the operation log

To view the operation log, use the audit command. The usage of the audit command is as follows:

Usage:
  tiup cluster audit [audit-id] [flags]

Flags:
  -h, --help   help for audit

If the [audit-id] argument is not specified, the command shows a list of commands that have been executed. For example:

tiup cluster audit
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.12.3/cluster audit
ID      Time                       Command
--      ----                       -------
4BLhr0  2024-01-25T23:55:09+08:00  /home/tidb/.tiup/components/cluster/v1.12.3/cluster deploy test v7.6.0 /tmp/topology.yaml
4BKWjF  2024-01-25T23:36:57+08:00  /home/tidb/.tiup/components/cluster/v1.12.3/cluster deploy test v7.6.0 /tmp/topology.yaml
4BKVwH  2024-01-25T23:02:08+08:00  /home/tidb/.tiup/components/cluster/v1.12.3/cluster deploy test v7.6.0 /tmp/topology.yaml
4BKKH1  2024-01-25T16:39:04+08:00  /home/tidb/.tiup/components/cluster/v1.12.3/cluster destroy test
4BKKDx  2024-01-25T16:36:57+08:00  /home/tidb/.tiup/components/cluster/v1.12.3/cluster deploy test v7.6.0 /tmp/topology.yaml

The first column is audit-id. To view the execution log of a certain command, pass the audit-id of the command as the argument, as follows:

tiup cluster audit 4BLhr0
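
To look up the audit-id without reading the list by hand, you can filter the list output. The following sketch assumes the layout shown in the example above: the subcommand is the fourth whitespace-separated field, and the newest record comes first. The function name `latest_audit_id` is hypothetical.

```shell
# latest_audit_id SUBCOMMAND  (hypothetical helper)
# Reads `tiup cluster audit` list output on stdin and prints the audit-id of
# the first listed record whose subcommand (for example deploy) matches.
latest_audit_id() {
  awk -v want="$1" '
    # Record rows have a timestamp in the second column.
    $2 ~ /^[0-9][0-9][0-9][0-9]-/ && $4 == want { print $1; exit }
  '
}

# Example: tiup cluster audit "$(tiup cluster audit | latest_audit_id deploy)"
```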

Run commands on a host in the TiDB cluster

To run a command on a host in the TiDB cluster, use the exec command. The usage of the exec command is as follows:

Usage:
  cluster exec <cluster-name> [flags]

Flags:
      --command string   the command run on cluster host (default "ls")
  -h, --help             help for exec
  -N, --node strings     Only exec on host with specified nodes
  -R, --role strings     Only exec on host with specified roles
      --sudo             use root permissions (default false)

Global Flags:
      --ssh-timeout int   Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5)
  -y, --yes               Skip all confirmations and assumes 'yes'

For example, to execute ls /tmp on every host in the cluster, run the following command (add -R tidb to limit it to hosts that run TiDB nodes):

tiup cluster exec test-cluster --command='ls /tmp'

Cluster controllers

Before TiUP was released, you controlled the cluster using tidb-ctl, tikv-ctl, pd-ctl, and other tools. To make the tools easier to download and use, TiUP integrates them into an all-in-one component, ctl.

Usage:
  tiup ctl:v<CLUSTER_VERSION> {tidb/pd/tikv/binlog/etcd} [flags]

Flags:
  -h, --help   help for tiup

This command has a corresponding relationship with those of the previous tools:

tidb-ctl [args] = tiup ctl tidb [args]
pd-ctl [args] = tiup ctl pd [args]
tikv-ctl [args] = tiup ctl tikv [args]
binlogctl [args] = tiup ctl binlog [args]
etcdctl [args] = tiup ctl etcd [args]

For example, if you previously viewed the store by running pd-ctl -u http://127.0.0.1:2379 store, you can now run the following command in TiUP:

tiup ctl:v<CLUSTER_VERSION> pd -u http://127.0.0.1:2379 store
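
When migrating old scripts, the mapping above can be applied mechanically. The following sketch prints the `tiup ctl` form of an old command line. The function name `tiup_ctl_equiv` is hypothetical, and the function only builds the command string without running it.

```shell
# tiup_ctl_equiv VERSION TOOL [ARGS...]  (hypothetical helper)
# Prints the tiup ctl command that corresponds to an old tool invocation,
# for example: tiup_ctl_equiv v7.6.0 pd-ctl -u http://127.0.0.1:2379 store
tiup_ctl_equiv() {
  version=$1; tool=$2
  shift 2
  case "$tool" in
    tidb-ctl)  sub=tidb ;;
    pd-ctl)    sub=pd ;;
    tikv-ctl)  sub=tikv ;;
    binlogctl) sub=binlog ;;
    etcdctl)   sub=etcd ;;
    *) echo "unknown tool: $tool" >&2; return 1 ;;
  esac
  if [ "$#" -gt 0 ]; then
    echo "tiup ctl:$version $sub $*"
  else
    echo "tiup ctl:$version $sub"
  fi
}
```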

Environment checks for target machines

You can use the check command to perform a series of checks on the environment of the target machine and output the check results. By executing the check command, you can find common misconfigurations or unsupported environments. The flags of the command are as follows:

Usage:
  tiup cluster check <topology.yml | cluster-name> [flags]
Flags:
      --apply                  Try to fix failed checks
      --cluster                Check existing cluster, the input is a cluster name.
      --enable-cpu             Enable CPU thread count check
      --enable-disk            Enable disk IO (fio) check
      --enable-mem             Enable memory size check
  -h, --help                   help for check
  -i, --identity_file string   The path of the SSH identity file. If specified, public key authentication will be used.
  -p, --password               Use password of target hosts. If specified, password authentication will be used.
      --user string            The user name to login via SSH. The user must has root (or sudo) privilege.

By default, this command is used to check the environment before deployment. By specifying the --cluster flag to switch the mode, you can also check the target machines of an existing cluster, for example:

# check the target servers before deployment
tiup cluster check topology.yml --user tidb -p
# check deployed servers of an existing cluster
tiup cluster check <cluster-name> --cluster

The CPU thread count check, memory size check, and disk performance check are disabled by default. For the production environment, it is recommended that you enable the three checks and make sure they pass to obtain the best performance.

  • CPU: If the number of threads is greater than or equal to 16, the check is passed.
  • Memory: If the total size of physical memory is greater than or equal to 32 GB, the check is passed.
  • Disk: Execute fio test on the partitions of data_dir and record the results.
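
For a quick look at a single machine, the CPU and memory thresholds above can be approximated locally. This is a sketch rather than the logic TiUP uses: it assumes that 32 GB means 32 GiB (33554432 KiB), and the function names `cpu_check` and `mem_check` are hypothetical. The usage comments rely on `nproc` and `/proc/meminfo`, which are available on typical Linux systems.

```shell
# cpu_check THREADS: succeeds if the thread count is at least 16.
cpu_check() {
  [ "$1" -ge 16 ]
}

# mem_check KIB: succeeds if the memory size in KiB is at least 32 GiB.
# Assumption: "32 GB" is interpreted as 32 * 1024 * 1024 KiB.
mem_check() {
  [ "$1" -ge $((32 * 1024 * 1024)) ]
}

# Example on a Linux host:
# cpu_check "$(nproc)" && echo "CPU: pass"
# mem_check "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)" && echo "Memory: pass"
```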

When running the checks, if the --apply flag is specified, the program automatically repairs the failed items. Automatic repair is limited to some items that can be adjusted by modifying the configuration or system parameters. Other unrepaired items need to be handled manually according to the actual situation.

Environment checks are not mandatory for deploying a cluster. For the production environment, it is recommended to perform environment checks and pass all check items before deployment. If some check items fail, the cluster might still be deployed and run normally, but might not achieve the best performance.

Use the system's native SSH client to connect to cluster

All the operations above use the SSH client embedded in TiUP to connect to the cluster machines and execute commands. However, in some scenarios, you might need to use the SSH client native to the control machine's system instead. For example:

  • To use an SSH plug-in for authentication
  • To use a customized SSH client

Then you can use the --ssh=system command-line flag to enable the system-native command-line tool:

  • Deploy a cluster: tiup cluster deploy <cluster-name> <version> <topo> --ssh=system. Fill in the name of your cluster for <cluster-name>, the TiDB version to be deployed (such as v7.6.0) for <version>, and the topology file for <topo>.
  • Start a cluster: tiup cluster start <cluster-name> --ssh=system
  • Upgrade a cluster: tiup cluster upgrade ... --ssh=system

You can add --ssh=system in all cluster operation commands above to use the system's native SSH client.

To avoid adding this flag to every command, you can use the TIUP_NATIVE_SSH environment variable to specify whether to use the local SSH client:

export TIUP_NATIVE_SSH=true
# or
export TIUP_NATIVE_SSH=1
# or
export TIUP_NATIVE_SSH=enable

If you specify this environment variable and --ssh at the same time, --ssh has higher priority.
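
The precedence can be modeled as follows. This sketch only restates the rule described in this section and is not TiUP's code: the function name `resolve_ssh_executor` is hypothetical, `builtin` is assumed to be the default executor, and any value of TIUP_NATIVE_SSH other than the three listed above is assumed to leave the default unchanged.

```shell
# resolve_ssh_executor SSH_FLAG_VALUE  (hypothetical model of the precedence)
# SSH_FLAG_VALUE is the value passed to --ssh, or empty if the flag is absent.
resolve_ssh_executor() {
  if [ -n "$1" ]; then
    echo "$1"                      # --ssh has higher priority
    return 0
  fi
  case "${TIUP_NATIVE_SSH:-}" in
    true|1|enable) echo "system" ;;
    *)             echo "builtin" ;;   # assumed default
  esac
}

# Example: TIUP_NATIVE_SSH=true resolve_ssh_executor ""
```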

Note:

During cluster deployment, if you need to use a password for connection (-p) or a passphrase is configured in the key file, you must ensure that sshpass is installed on the control machine; otherwise, a timeout error is reported.

Migrate control machine and back up TiUP data

The TiUP data is stored in the .tiup directory in the user's home directory. To migrate the control machine, you can take the following steps to copy the .tiup directory to the corresponding target machine:

  1. Execute tar czvf tiup.tar.gz .tiup in the home directory of the original machine.

  2. Copy tiup.tar.gz to the home directory of the target machine.

  3. Execute tar xzvf tiup.tar.gz in the home directory of the target machine.

  4. Add the .tiup/bin directory to the PATH environment variable.

    If you use bash and log in as the tidb user, you can add export PATH=/home/tidb/.tiup/bin:$PATH in ~/.bashrc and execute source ~/.bashrc. Adjust the path and the file according to the shell and the user you use.

Note:

It is recommended that you back up the .tiup directory regularly to avoid the loss of TiUP data caused by abnormal conditions, such as disk damage of the control machine.
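
A regular backup can reuse the tar command from step 1 with a timestamped file name. The following is a minimal sketch: the function name `backup_tiup`, the archive naming scheme, and the backup directory are assumptions for illustration.

```shell
# backup_tiup HOME_DIR BACKUP_DIR  (hypothetical helper)
# Archives HOME_DIR/.tiup into BACKUP_DIR/tiup-<timestamp>.tar.gz and
# prints the path of the archive on success.
backup_tiup() {
  archive="$2/tiup-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C switches to HOME_DIR so that the archive contains the relative path .tiup
  tar czf "$archive" -C "$1" .tiup && echo "$archive"
}

# Example: backup_tiup /home/tidb /path/to/backups
```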

Back up and restore meta files for cluster deployment and O&M

If the meta files used for operation and maintenance (O&M) are lost, managing the cluster using TiUP will fail. It is recommended that you back up the meta files regularly by running the following command:

tiup cluster meta backup ${cluster_name}

If the meta files are lost, you can restore them by running the following command:

tiup cluster meta restore ${cluster_name} ${backup_file}

Note:

The restore operation overwrites the current meta files. Therefore, it is recommended to restore the meta files only when they are lost.