fma: Draft FMA for MT/64-bit Cannon

ethereum-optimism · Oct 11, 2024 · eded6d0
1 parent 165efce
commit eded6d0
Showing 1 changed file with 128 additions and 0 deletions.
diff --git a/security/fma-multi-threaded-64-bit-cannon.md b/security/fma-multi-threaded-64-bit-cannon.md
@@ -0,0 +1,128 @@
+# [Project Name]: Failure Modes and Recovery Path Analysis
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+
+- [Introduction](#introduction)
+- [Failure Modes and Recovery Paths](#failure-modes-and-recovery-paths)
+  - [[Name of Failure Mode 1]](#name-of-failure-mode-1)
+  - [[Name of Failure Mode 2]](#name-of-failure-mode-2)
+- [Audit Requirements](#audit-requirements)
+- [Action Items](#action-items)
+- [Appendix](#appendix)
+  - [Appendix A: This is a Placeholder Title](#appendix-a-this-is-a-placeholder-title)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+*Italics are used to indicate things that need to be replaced.*
+
+| | |
+|--------|--------------|
+| Author | *Author Name* |
+| Created at | *2024-10-09* |
+| Initial Reviewers | *Reviewer Name 1, Reviewer Name 2* |
+| Need Approval From | *Security Reviewer Name* |
+| Status | Draft |
+
+> [!NOTE]
+> 📢 Remember:
+>
+> - The single approver in the “Need Approval From” field must be from the Security team.
+> - Maintain the “Status” property accordingly. An FMA document can have the following statuses:
+>   - **Draft 📝:** Doc is created but not yet ready for review.
+>   - **In Review 🔎:** Security is reviewing, and Engineering is iterating on the design. A checklist of action items will be created during this phase.
+>   - **Implementing Actions 🛫:** Security has signed off on the content of the document, including the resulting action items. Engineering is responsible for implementing the action items and updating the checklist.
+>   - **Final 👍:** Security will transition the status of the document to Final once all action items are completed.
+
+> [!TIP]
+> Guidelines for writing a good analysis, and what the reviewer will look for:
+>
+> - Show your work: Include steps and tools for each conclusion.
+> - Completeness of risks considered.
+> - Include both implementation and operational failure modes.
+> - Provide references to support the reviewer.
+> - The size of the document will likely be proportional to the project's complexity.
+> - The ultimate goal of this document is to identify action items to improve the security of the project. The FMA review process can be accelerated by proactively identifying action items during the writing process.
+
+## Introduction
+
+This document covers the conversion of the [Cannon Fault Proof VM](https://docs.optimism.io/stack/protocol/fault-proofs/cannon) to support multi-threading and a 64-bit architecture. The change increases addressable memory and allows better memory management through garbage collection.
+
+
+## Failure Modes and Recovery Paths
+
+### Unimplemented syscalls or opcodes needed by `op-program`
+
+- **Description:** We only aim to implement the syscalls and opcodes that `op-program` requires, so some remain unimplemented. The risk is that a previously untested code path uses an opcode or syscall that we haven't implemented, and that some future input condition exercises this code path.
+- **Risk Assessment:** *Simple low/medium/high rating of impact (severity) + likelihood.*
+- **Mitigations:** *What mitigations are in place, or what should we add, to reduce the chance of this occurring?*
+- **Detection:** *How do we detect if this occurs?*
+- **Recovery Path(s)**: *How do we resolve this? Is it a simple, quick recovery or a big effort? Would recovery require a governance vote or a hard fork?*
+
+### Failure to run correct VM based on prestate input
+
+- **Description:** The off-chain Cannon [attempts to run the correct VM version based on the prestate input](https://github.com/ethereum-optimism/design-docs/pull/88/files). If this selection does not work correctly, the off-chain execution would not match the on-chain steps.
+- **Risk Assessment:** *Simple low/medium/high rating of impact (severity) + likelihood.*
+- **Mitigations:** *What mitigations are in place, or what should we add, to reduce the chance of this occurring?*
+- **Detection:** *How do we detect if this occurs?*
+- **Recovery Path(s)**: *How do we resolve this? Is it a simple, quick recovery or a big effort? Would recovery require a governance vote or a hard fork?*
+
+### Mismatch between on-chain and off-chain execution
+
+- **Description:** Bugs in either the Solidity (on-chain) or Go (off-chain) implementation could make the two incompatible with each other.
+- **Risk Assessment:** *Simple low/medium/high rating of impact (severity) + likelihood.*
+- **Mitigations:** *What mitigations are in place, or what should we add, to reduce the chance of this occurring?*
+- **Detection:** *How do we detect if this occurs?*
+- **Recovery Path(s)**: *How do we resolve this? Is it a simple, quick recovery or a big effort? Would recovery require a governance vote or a hard fork?*
+
+### Livelocks in the fault proof
+
+- **Description:** *Details of the failure mode go here. What are the causes and effects of this failure?*
+- **Risk Assessment:** *Simple low/medium/high rating of impact (severity) + likelihood.*
+- **Mitigations:** *What mitigations are in place, or what should we add, to reduce the chance of this occurring?*
+- **Detection:** *How do we detect if this occurs?*
+- **Recovery Path(s)**: *How do we resolve this? Is it a simple, quick recovery or a big effort? Would recovery require a governance vote or a hard fork?*
+
+### Execution traces too long for the fault proof
+
+- **Description:** *Details of the failure mode go here. What are the causes and effects of this failure?*
+- **Risk Assessment:** *Simple low/medium/high rating of impact (severity) + likelihood.*
+- **Mitigations:** *What mitigations are in place, or what should we add, to reduce the chance of this occurring?*
+- **Detection:** *How do we detect if this occurs?*
+- **Recovery Path(s)**: *How do we resolve this? Is it a simple, quick recovery or a big effort? Would recovery require a governance vote or a hard fork?*
+
+### [Name of Failure Mode]
+
+- **Description:** *Details of the failure mode go here. What are the causes and effects of this failure?*
+- **Risk Assessment:** *Simple low/medium/high rating of impact (severity) + likelihood.*
+- **Mitigations:** *What mitigations are in place, or what should we add, to reduce the chance of this occurring?*
+- **Detection:** *How do we detect if this occurs?*
+- **Recovery Path(s)**: *How do we resolve this? Is it a simple, quick recovery or a big effort? Would recovery require a governance vote or a hard fork?*
+
+
+### Generic items we need to take into account:
+See [./fma-generic-hardfork.md](./fma-generic-hardfork.md). 
+
+- [ ] Check this box to confirm that these items have been considered and updated if necessary.
+
+
+## Action Items
+
+Below is what needs to be done before launch to reduce the chances of the above failure modes occurring, and to ensure they can be detected and recovered from:
+
+- [ ] Resolve all comments on this document and incorporate them into the document itself (Assignee: document author)
+- [ ] *Action item 2 (Assignee: tag assignee)*
+- [ ] *Action item 3 (Assignee: tag assignee)*
+
+## Audit Requirements
+
+*Given the failure modes and action items, will this project require an audit? See [OP Labs Audit Framework: When to get external security review and how to prepare for it](https://gov.optimism.io/t/op-labs-audit-framework-when-to-get-external-security-review-and-how-to-prepare-for-it/6864) for a reference decision-making framework. Please explain your reasoning.*
+
+## Appendix
+
+### Appendix A: This is a Placeholder Title
+
+*Appendices must include any additional info, processes, or documentation relevant for verifying and reproducing the above info. Examples:*
+
+- *If you used certain tools, specify their versions or commit hashes.*
+- *If you followed some process/procedure, document the steps in that process or link to somewhere that process is defined.*