All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Add `backup_instance_dirs` step to archive files of stopped instance
- Add `cartridge_restore_backup_path_local` to restore instance from local backup
- Remove old app configurations before uploading a new one
- Allow downgrading RPM and DEB packages
- Ignore disabled instances when counting disabled instances
- Add disabled instances to `single_instances_for_each_machine` variable
- Add `cartridge_log_dir_parent` to configure directory of logs
- Add `cartridge_force_leader_control_instance` variable to choose a control instance among the leaders
- Add `cartridge_app_config_upload_http_timeout` variable to configure timeout to wait config upload in HTTP mode.
- Optimize `Set instance facts` step
- Optimize facts caching
- Fixed the ability to roll back to the previous TGZ package
- Fixed backup folder permissions
- Handle empty values in `helpers.py`
- Fixed templates of systemd units for TGZ packages
- The two-phase timeouts are used in the `upload_app_config` step
- Step `cleanup_instance_files` to clean up data of stopped instance
- Add ability to set environment variables for instance service
- Add `instances_from_same_machine` variable in preparation
- Add `check_new_topology` step to compare inventory and real cluster topology
- Ability to disable instances via `disabled` flag
- Add `backup`, `backup_start`, `backup_stop` and `restore` steps to back up and restore instances
- Hosts uniqueness considers `ansible_port`, not only `ansible_host`
- Long facts caching when playbook has two or more role imports
- Now select control instance task ignores bad instances from membership
- Fix instance joining when leader is not first
- `wait_members_alive` step to wait until all cluster members have `alive` status and specified state;
- `wait_cluster_has_no_issues` step to wait until cluster has no issues
- Ability to upload ZIP configs to TDG;
- `cartridge_not_save_cookie_in_app_config` variable that allows disabling persisting cluster cookie in the application configuration file;
- `patch_instance_in_runtime` step to update instance parameters in runtime;
- Variables `bootstrap_vshard_retries`, `bootstrap_vshard_delay`, `connect_to_membership_retries`, `connect_to_membership_delay` to change hardcoded values.
- Timeout `instance_start_timeout` (to check that all instances become started) deprecated and replaced with `instance_start_retries` and `instance_start_delay`;
- Timeout `instance_discover_buckets_timeout` (to check that instances discover buckets) deprecated and replaced with `instance_discover_buckets_retries` and `instance_discover_buckets_delay`.
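
The replacement of a single timeout with a retries/delay pair, as described in the two entries above, follows a common polling pattern. Below is a minimal Python sketch of that pattern, not the role's actual implementation: the function name `wait_until` and the injectable `sleep` argument are hypothetical and exist only for illustration.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], retries: int, delay: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` up to `retries` times, pausing `delay` seconds between attempts.

    Hypothetical helper: it only illustrates how a retries/delay pair
    bounds the total waiting time instead of one fixed timeout.
    """
    for attempt in range(retries):
        if check():
            return True
        # No pause is needed after the final failed attempt.
        if attempt < retries - 1:
            sleep(delay)
    return False
```

With this shape, the worst-case wait is roughly `(retries - 1) * delay` seconds plus the time spent in the checks themselves, so the two values can be tuned independently.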
- Fail on getting control instance when all unjoined instances haven't `replicaset_alias` set;
- Support of Ansible 4.0;
- Handling of bad membership members - empty or with empty payload.
- Running the role with python 2.7
- Skipping instances restart when package was updated, but configuration wasn't
- Missing default config for machine with stateboard
- Specifying `cartridge_app_name` other than the TGZ package name
- Creating unnamed replicasets with instances without `replicaset_alias` set
- Getting control instance:
  - Now one not expelled instance should also be alive; it's checked by creating connection using instances advertise URIs
  - Control instance should be alive
  - If there are some joined instances, but none of them is alive, getting control instance fails.
- `set_control_instance` is improved to consider non-joined instance status
- `edit_topology` step now considers roles dependencies, permanent and hidden roles and doesn't perform unnecessary calls if enabled roles list isn't changed
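
The idea in the `edit_topology` entry above (skip the call when the effective role list is unchanged) can be sketched in Python as follows. This is an illustration under assumptions, not the role's code: the function names, the shape of the `dependencies` mapping, and the treatment of hidden roles (excluded from both sides of the comparison) are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def expand_roles(enabled: Iterable[str],
                 dependencies: Dict[str, List[str]],
                 permanent: Iterable[str] = ()) -> Set[str]:
    """Return enabled roles plus permanent roles and all transitive dependencies."""
    result: Set[str] = set()
    stack = list(enabled) + list(permanent)
    while stack:
        role = stack.pop()
        if role in result:
            continue
        result.add(role)
        # Dependencies of a role are enabled together with it.
        stack.extend(dependencies.get(role, []))
    return result


def roles_changed(current: Iterable[str], desired: Iterable[str],
                  dependencies: Dict[str, List[str]],
                  permanent: Iterable[str] = (),
                  hidden: Iterable[str] = ()) -> bool:
    """Compare effective role sets; assumption: hidden roles are ignored on both sides."""
    hidden_set = set(hidden)
    current_set = expand_roles(current, dependencies, permanent) - hidden_set
    desired_set = expand_roles(desired, dependencies, permanent) - hidden_set
    return current_set != desired_set
```

Only when `roles_changed` returns `True` would a topology-editing call be worth making for that replicaset.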
- `failover_promote` step to promote replicasets leaders
- Allowed to skip user and group creation for tgz
- Debug control instance and one not expelled instance
- Timeouts for two-phase commits:
  - `twophase_netbox_call_timeout`
  - `twophase_upload_config_timeout`
  - `twophase_apply_config_timeout`
- `eval` and `eval_on_control_instance` steps to eval code on instances
- Step `stop_instance` to stop and disable instance systemd service
- Step `start_instance` to start and enable instance systemd service
- Step `restart_instance_force` to restart systemd service without any conditions
- New `cartridge_failover_params` fields:
  - `failover_timeout`
  - `fencing_enabled`
  - `fencing_timeout`
  - `fencing_pause`
- `edit_topology_allow_missed_instances` variable to allow replicasets containing the instances that are not started yet
- `upload_app_config` step to load the file or directory config (Cartridge and TDG are supported)
- Timeout to wait for cluster health after topology editing renamed from `edit_topology_timeout` to `edit_topology_healthy_timeout`
- `cartridge_cluster_cookie` now is required only for `configure_instance`, `restart_instance` and `upload_app_config` steps
- Tasks that use `hostvars` now transfer only the necessary information, which reduces the duration of these tasks.
- Role variables are saved to the dictionary, so they do not affect the next play
- Fix facts setting in `hostvars` fact
- Avoid using the `non_expelled_instance` fact name. Now the `not_expelled_instance` name is used everywhere.
- Removing stateboard instance distribution directory on `rotate_dists` step
- Fixed fail on getting one non-expelled instance when only stateboard instance is configured
- Fixed compatibility with Ansible 2.9
- Role installation will be completely skipped if you specify a tag other than the tags for this role
- Fixed selecting control instance that doesn't belong to cluster or isn't alive.
  The following rules are currently used:
  - Members are checked in lexicographic order by URIs
  - Members not mentioned in hostvars aren't selected to be control
  - Members with status not `alive` aren't selected to be control
- Fixed setting `needs_restart` when configuration files don't exist
- Fixed error on configuring auth without users specified
- Reset role variables before each run
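
The three control-instance selection rules listed above (lexicographic order by URI, only members present in hostvars, only members with `alive` status) can be sketched in Python as shown below. The function name and the shapes of `members` and `uri_to_instance` are assumptions made for this illustration; the role's real data structures are not shown in this changelog.

```python
from typing import Dict, Optional


def select_control_instance(members: Dict[str, dict],
                            uri_to_instance: Dict[str, str]) -> Optional[str]:
    """Pick a control instance name, or None if no member qualifies.

    members: advertise URI -> membership info with a "status" key (assumed shape).
    uri_to_instance: advertise URI -> instance name taken from hostvars (assumed shape).
    """
    # Rule 1: members are checked in lexicographic order by URI.
    for uri in sorted(members):
        # Rule 2: members not mentioned in hostvars are never selected.
        if uri not in uri_to_instance:
            continue
        # Rule 3: members whose status is not "alive" are never selected.
        if members[uri].get("status") != "alive":
            continue
        return uri_to_instance[uri]
    return None
```

Sorting the URIs first makes the choice deterministic across runs, which is the practical benefit of the lexicographic rule.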
- `cartridge-replicasets` tag to the membership stage
- `cartridge_wait_buckets_discovery` parameter to wait for instance to discover buckets
- `instance_discover_buckets_timeout` parameter to configure time in seconds to wait for instance to discover buckets
- Ability to deploy TGZ packages
- `cartridge_multiversion` flag that allows using a specific version of application for each instance and performing rolling update correctly (using new `update_instance` step)
- `rotate_dists` step that allows rotating application distributions
- `cleanup` step to remove temporary files from specific list
- Added ability to import steps by scenario name. Added some default scenarios. Added ability to create custom scenarios.
- Ability to use `tasks_from` to import any step
- `zone` variable to edit instance zone
- `edit_topology_timeout` variable to wait until cluster becomes healthy after editing topology
- Ability to specify instance `memtx_dir`, `vinyl_dir` and `wal_dir` params by `cartridge_memtx_dir_parent`, `cartridge_vinyl_dir_parent`, `cartridge_wal_dir_parent` variables.
- Control instance is selected considering two-phase commit version of instances. The reason is that all operations that modify cluster-wide config should be performed via the instance that has the lowest Cartridge version (in fact, only two-phase commit version matters).
- Availability to change advertise URIs of any instance
- `cartridge.admin_edit_topology` is called once for all replicasets and instances to expel. It can be called second time to set up failover priority for replicasets where new instances were joined. As a result, `replicaset_healthy_timeout` is removed as unused.
- Now list of instances for installing a package is selected once for all. Before this patch, the complexity of calculating the list of instances was O(N^2), now it is O(N). For 100 instances, it gives a 10x time reduction (60s -> 5s).
- Refactored package installing. Getting package info is performed in a library module, all tasks except installing package itself are common for RPM and DEB.
- Now `check_instance_started` function:
  - checks all instances, including the stateboard;
  - waits for `Unconfigured` or `RolesConfigured` status instead of `alive` state;
  - checks that all buckets are discovered by routers if cluster was bootstrapped.
- Role divided into many steps (#141). It's possible to combine them using a scenario in the config by `cartridge_scenario`. It is also possible to use custom steps in a scenario. Custom steps can be defined by `cartridge_custom_steps_dir` and `cartridge_custom_steps`.
- Now step `connect_to_membership` is executed only on one not expelled instance. Before the patch, the complexity of performing `connect_to_membership` step was `N^2`. For 100 instances, the step took about 900 seconds. Now the complexity has decreased to N, so for 100 instances the execution time is about 5 seconds.
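
The O(N) selection of instances for package installation mentioned above comes down to picking one instance per machine in a single pass. A minimal Python sketch of that idea follows; it is not the role's code. The function name is hypothetical, and using the pair (`ansible_host`, `ansible_port`) as the machine key is an assumption based on the host-uniqueness entry elsewhere in this changelog.

```python
from typing import Dict, List, Optional, Tuple


def single_instance_per_machine(hostvars: Dict[str, dict]) -> List[str]:
    """Return one non-expelled instance name per machine in a single pass.

    hostvars: instance name -> variables dict (assumed shape).
    """
    chosen: Dict[Tuple[str, Optional[int]], str] = {}
    for name, instance_vars in hostvars.items():
        # Expelled instances never take part in package installation.
        if instance_vars.get("expelled", False):
            continue
        machine = (instance_vars.get("ansible_host", name),
                   instance_vars.get("ansible_port"))
        # The first instance seen on a machine wins; the dict lookup is O(1).
        chosen.setdefault(machine, name)
    return list(chosen.values())
```

Because the result is computed once with one dictionary lookup per instance, the work grows linearly with the number of instances instead of being recomputed for every host.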
- needs_restart task error for non-bootstrapped instance
- `replicaset_healthy_timeout` parameter to wait for replicaset to be healthy after editing it
- Managing dynamic `box.cfg` parameters in runtime
- `restarted: false` to disable instance restart
- `etcd2` state provider for stateful failover (cartridge >= 2.2.0)
- `cartridge_failover_params` variable to manage new failover (cartridge >= 2.1.0)
- `stateboard` flag to start Tarantool Stateboard instance (cartridge >= 2.1.0)
- `any_errors_fatal: true` is set for package installation tasks
- `failover_priority` parameter is optional
- `cartridge_failover` variable
- Little bugs in Python modules
- `vshard_group` parameter for `vshard-storage` replicasets
- Cluster cookie checks
- `ansible_host` value is used as a unique host identifier instead of `ansible_machine_id`
- `cartridge_app_name` is checked to be equal to package name on package installation
- Error on control instance selection
- Store error codes in CartridgeException
- Interpret some errors as a valid behaviour in cartridge_needs_restart and cartridge_instance modules
- Do not try to manage memtx_memory in runtime for expelled
- Fixed "Unable to patch config system section" errmsg
- Increasing memtx_memory without instance restart
- `restarted` flag to force instance restart
- `expelled` flag to expel instance from cluster
- `weight` and `all_rw` replicaset parameters
- Editing existing replicaset
- Tests for Debian
- `instance_start_timeout` parameter to wait for instance to be started
- `leader` parameter replaced by `failover_priority`
- Use `cartridge.admin_edit_topology()` call to manage topology
- Test inventory restructured
- Added retry on Vshard bootstrapping
- `cartridge_app_name` parameter is mandatory now and it isn't rewritten by package info
- Installing package tasks are running for one non-expelled instance per machine
- Added missed tags for start_instance tasks
- Fix endless loop for recvall() in case of broken pipe
- Fixed KeyError on joining not started instance to replicaset
- Configure cluster using tarantool console socket instead of HTTP
- Improved Gitlab CI test packages creation
- Use both deb and rpm packages in molecule tests
- Variables structure is changed to interpret instances as Ansible hosts
- Instance connects to membership by probing other instances
- Removed `cartridge_failover` default value
- Removed useless unzip installation
- Console eval fixed to find end of output using full output data
- DEB packages deployment
- Reloading systemd daemon after package updating
- Getting started
- RPM packages deployment
- instances configuration and starting
- topology configuration
- vshard bootstrapping
- managing failover
- authorization configuration
- molecule tests
- application config patching