[Upgrades] Document/address issues found during v0.0.11 upgrade on Alpha TestNet #990

Open
7 tasks
okdas opened this issue Dec 10, 2024 · 0 comments
Labels: code health, documentation

okdas commented Dec 10, 2024

Objective

Address the issues encountered during the v0.0.11 upgrade and future-proof the network upgrade process against the runtime panics and data incompatibilities seen there.

Origin Document

https://www.notion.so/buildwithgrove/Alpha-TestNet-v0-0-11-Postmortem-152a36edfff680d58411c8fa384f877c?pvs=4

Goals

  • Ensure that future upgrades do not cause runtime panics due to missing parameters or breaking protobuf changes.
  • Enhance configuration and upgrade planning to handle missing app.toml parameters gracefully.
  • Strengthen the upgrade process to handle protobuf field changes more robustly.
  • Improve cosmovisor instructions to facilitate rollback if upgrades fail.
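
The second goal (handling missing app.toml parameters gracefully) could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: `TelemetryConfig`, `TelemetryFromAppOpts`, the `cardinality-level` key, and the `"medium"` default are all hypothetical, and a plain map stands in for the parsed app.toml. The idea is to start from defaults and overwrite only what the operator actually set, returning an error rather than panicking on malformed input.

```go
package main

import "fmt"

// TelemetryConfig is a HYPOTHETICAL stand-in for the custom
// poktroll.telemetry section of app.toml.
type TelemetryConfig struct {
	CardinalityLevel string
}

// DefaultTelemetryConfig returns the values used when the section,
// or a field inside it, is absent. The default is an assumption.
func DefaultTelemetryConfig() TelemetryConfig {
	return TelemetryConfig{CardinalityLevel: "medium"}
}

// TelemetryFromAppOpts builds the config from parsed app.toml options
// (modelled here as a plain map). Missing keys fall back to defaults;
// wrongly typed values produce an error instead of a panic.
func TelemetryFromAppOpts(opts map[string]any) (TelemetryConfig, error) {
	cfg := DefaultTelemetryConfig()

	raw, ok := opts["poktroll.telemetry"]
	if !ok || raw == nil {
		// Section absent, e.g. an app.toml written before the field existed.
		return cfg, nil
	}
	section, ok := raw.(map[string]any)
	if !ok {
		return cfg, fmt.Errorf("poktroll.telemetry: expected a table, got %T", raw)
	}
	if v, present := section["cardinality-level"]; present {
		s, isString := v.(string)
		if !isString {
			return cfg, fmt.Errorf("poktroll.telemetry.cardinality-level: expected a string, got %T", v)
		}
		cfg.CardinalityLevel = s
	}
	return cfg, nil
}

func main() {
	// An old app.toml with no telemetry section still yields a usable config.
	cfg, err := TelemetryFromAppOpts(map[string]any{})
	fmt.Println(cfg, err)
}
```

With this shape, adding a new field only requires a new default and one more optional lookup, so nodes running with an older app.toml keep starting.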

Deliverables

  • Configuration Propagation:
    Ensure default custom app.toml values are propagated automatically.

    • Add default values for poktroll.telemetry.
    • Change the pattern so newly added config fields do not cause panics if absent.
  • Binary SHA Inclusion Decision:
    Exclude SHAs from upgrade transactions, but keep that option available.

    • Document pros and cons (security vs. ability to replace binaries mid-upgrade).
    • Default to NO SHAs for now.
  • Upgrade Handler Practices:
    Avoid calling GetParams in the upgrade handler when the protobuf structures might have changed.

    • Explicitly set all parameters in code during the upgrade to avoid decoding unknown fields, unless the existing parameter values must be read.
    • Update code comments/examples/references and documentation to reflect this best practice.
  • Cosmovisor Rollback Instructions:
    Improve documentation and instructions for operators to handle failed upgrades.

    • Explain how to remove the new version and point the symlinks back to the old binary.
    • Explain how to remove upgrade-info.json to prevent automatic re-upgrade attempts.
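
The Upgrade Handler Practices deliverable above can be sketched as follows. This is a simplified model under stated assumptions, not Cosmos SDK or poktroll code: `Params`, `ParamsKeeper`, `legacyStore`, `applyUpgradeParams`, and the parameter names and values are all hypothetical. The fake store simulates persisted parameters that can no longer be decoded with the new schema, and the handler writes a complete parameter set without ever reading the old one.

```go
package main

import (
	"errors"
	"fmt"
)

// Params is a HYPOTHETICAL post-upgrade parameter set.
type Params struct {
	MinStake     uint64
	MaxSuppliers uint64
}

// ParamsKeeper is a HYPOTHETICAL, minimal view of a module keeper.
type ParamsKeeper interface {
	GetParams() (Params, error)
	SetParams(Params) error
}

// legacyStore simulates state written with the OLD protobuf schema:
// until new params are written, decoding the stored bytes fails.
type legacyStore struct {
	current *Params
}

func (s *legacyStore) GetParams() (Params, error) {
	if s.current == nil {
		return Params{}, errors.New("cannot decode params written with the old schema")
	}
	return *s.current, nil
}

func (s *legacyStore) SetParams(p Params) error {
	s.current = &p
	return nil
}

// applyUpgradeParams sets every parameter explicitly and never calls
// GetParams, so it cannot fail on undecodable legacy state.
// The values are illustrative only.
func applyUpgradeParams(k ParamsKeeper) error {
	return k.SetParams(Params{
		MinStake:     1_000_000,
		MaxSuppliers: 15,
	})
}

func main() {
	store := &legacyStore{}
	if err := applyUpgradeParams(store); err != nil {
		fmt.Println("upgrade failed:", err)
		return
	}
	// After the handler runs, params decode with the new schema.
	p, err := store.GetParams()
	fmt.Println(p, err)
}
```

The trade-off is that any value operators changed through governance before the upgrade is overwritten, which is why the deliverable leaves room for reading existing parameters when that is genuinely required.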

Non-goals / Non-deliverables

  • Re-running the v0.0.11 upgrade on the current testnet (already failed, rollback handled separately).

General deliverables

  • Comments: Add TODOs and comments in code to guide future developers.
  • Testing: Add unit/E2E tests for upgrade scenarios, especially for config and protobuf changes.
  • Documentation: Update developer and operator docs to reflect new practices, including code and configuration examples.

Creator: @okdas
Co-owners: @bryanchriswhite @Olshansk

@okdas added the documentation and code health labels Dec 10, 2024
@okdas added this to the Shannon Beta TestNet Support milestone Dec 10, 2024
@okdas self-assigned this Dec 10, 2024
@okdas added this to Shannon Dec 10, 2024
@Olshansk moved this to 🔖 Ready in Shannon Dec 10, 2024