Noting that process scheduling in ckb-vm is deterministic #445

Open
wants to merge 7 commits into
base: master

Contributor

@mohanson mohanson commented Oct 31, 2024

Developers may suspect that the implementation of spawn involves randomness, so this sentence is added intentionally.

@mohanson mohanson requested a review from a team as a code owner October 31, 2024 01:08
@mohanson
Contributor Author

cc @xxuejie @XuJiandong

@xxuejie
Contributor

xxuejie commented Oct 31, 2024

I recommend expanding this section a bit:

  • For each hardfork version, process scheduling will be deterministic; any nondeterminism will be treated as a critical / security bug that requires immediate intervention.
  • However, based on real usage on chain, future hardfork versions are expected to improve the process scheduling workflow, so the behavior may differ across versions.

Simply put, the scheduler included in CKB will be deterministic with respect to a particular hardfork version, and will likely change its behavior in different hardfork versions.

@XuJiandong
Contributor

Provide detailed information about the deterministic scheduler, including:

  1. The criteria for selecting the next process to run when one process is blocked.
  2. The conditions under which deadlock can occur.
  3. The specific operations that can cause a process to become blocked.

@xxuejie
Contributor

xxuejie commented Oct 31, 2024

> Provide detailed information about the deterministic scheduler, including:
>
>   1. The criteria for selecting the next process to run when one process is blocked.
>   2. The conditions under which deadlock can occur.
>   3. The specific operations that can cause a process to become blocked.

I'm not entirely sure here: is this information suitable for the RFC, or for separate documentation?

An analogy here could be the details of the transaction pool: those are also highly tied to the actual implementation, yet the RFCs do not fully describe them.

FYI: I'm not against adding this information, it certainly helps, but it is worth discussing where it should go.

@mohanson
Contributor Author

I suggest putting the detailed working principle of the scheduler in a separate document.

@xxuejie
Contributor

xxuejie commented Oct 31, 2024

On second thought: I think it makes sense to document the deadlock conditions for each hardfork version here, but other details fit better in a separate document on Nervos docs.

@mohanson
Contributor Author

> On second thought: I think it makes sense to document the deadlock conditions for each hardfork version here, but other details fit better in a separate document on Nervos docs.

There are already descriptions of when read/write/wait will cause a deadlock:

https://github.com/nervosnetwork/rfcs/blob/master/rfcs/0050-vm-syscalls-3/0050-vm-syscalls-3.md#write
https://github.com/nervosnetwork/rfcs/blob/master/rfcs/0050-vm-syscalls-3/0050-vm-syscalls-3.md#error-code

@xxuejie
Contributor

xxuejie commented Oct 31, 2024

Yes, but the wording is quite vague:

> If ckb-vm detects that all processes are blocked, ckb-vm will return a deadlock error.
> It's possible for read/write/wait operations to wait for each other, leading to a deadlock state.

I think it makes sense to define specifically when all processes will be blocked, and what it means for operations to wait for each other.
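
One way to make "all processes are blocked" precise is as a predicate over process states. The following is a minimal Python sketch under stated assumptions, not ckb-vm's actual implementation: the `State` members mirror the state names used in this PR, while `is_deadlocked` and the pid-to-state mapping are hypothetical names introduced only for illustration.

```python
from enum import Enum, auto
from typing import Dict


class State(Enum):
    RUNNABLE = auto()
    RUNNING = auto()
    TERMINATED = auto()
    WAIT_FOR_EXIT = auto()
    WAIT_FOR_READ = auto()
    WAIT_FOR_WRITE = auto()


# States in which a process cannot make progress on its own.
WAITING = {State.WAIT_FOR_EXIT, State.WAIT_FOR_READ, State.WAIT_FOR_WRITE}


def is_deadlocked(states: Dict[int, State]) -> bool:
    """Hypothetical check: True when at least one process is unterminated
    and every unterminated process is in a waiting state."""
    alive = [s for s in states.values() if s is not State.TERMINATED]
    return bool(alive) and all(s in WAITING for s in alive)
```

For example, `is_deadlocked({0: State.WAIT_FOR_READ, 1: State.WAIT_FOR_READ})` holds, while a mapping that contains any `RUNNABLE` or `RUNNING` process does not.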

@mohanson
Contributor Author

mohanson commented Nov 1, 2024

I added a separate section describing deadlocks; please check @xxuejie @XuJiandong

XuJiandong previously approved these changes Nov 14, 2024
Deadlock is a situation where two or more processes are unable to proceed because they are each waiting for resources or conditions that can only be provided by another waiting process. In the context of this scheduler, where processes communicate via pipes and can enter various states, such as `Runnable`, `Running`, `Terminated`, `WaitForExit`, `WaitForRead`, `WaitForWrite`. In our scheduler, deadlock will occur if all unterminated processes are waiting and no process is in a runnable state.

- The process enters the `Runnable` when a process is created, or it get returned from `wait()`, `write()` and `read()`.
- The process enters the `Running` when a process is running.
Contributor

when a process starts running


Deadlock is a situation where two or more processes are unable to proceed because they are each waiting for resources or conditions that can only be provided by another waiting process. In the context of this scheduler, where processes communicate via pipes and can enter various states, such as `Runnable`, `Running`, `Terminated`, `WaitForExit`, `WaitForRead`, `WaitForWrite`. In our scheduler, deadlock will occur if all unterminated processes are waiting and no process is in a runnable state.

- The process enters the `Runnable` when a process is created, or it get returned from `wait()`, `write()` and `read()`.
Contributor

I would personally rephrase this: it's not that a particular syscall returned, it's that the blocking condition for the process has been resolved, so the process enters the runnable state.

If still confused, I recommend looking into literature for operating systems.
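
The distinction above can be sketched as follows. This is a minimal illustration, not the ckb-vm scheduler: `Process`, `block`, and `wake_if_resolved` are hypothetical names, and the blocking condition is modeled as a plain predicate that the scheduler re-checks.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Process:
    pid: int
    state: str = "Runnable"
    # Hypothetical: a predicate that returns True while the process must
    # keep waiting; None when the process has no blocking condition.
    blocking_condition: Optional[Callable[[], bool]] = None


def block(proc: Process, state: str, condition: Callable[[], bool]) -> None:
    """Park the process in a waiting state together with its condition."""
    proc.state = state
    proc.blocking_condition = condition


def wake_if_resolved(proc: Process) -> bool:
    """Move the process to Runnable once its blocking condition no longer
    holds. The transition is driven by the condition, not by a syscall
    returning."""
    if proc.blocking_condition is not None and not proc.blocking_condition():
        proc.blocking_condition = None
        proc.state = "Runnable"
        return True
    return False
```

A process blocked with `block(p, "WaitForRead", lambda: not data_ready)` stays in `WaitForRead` until the assumed `data_ready` flag flips, at which point `wake_if_resolved(p)` makes it `Runnable`.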

- The process enters the `Runnable` when a process is created, or it get returned from `wait()`, `write()` and `read()`.
- The process enters the `Running` when a process is running.
- The process enters the `Terminated` when a process is terminated.
- The process enters the `WaitForExit` state by calling the `wait()`
Contributor

Simply calling `wait` won't necessarily put the process into the `WaitForExit` state; that only happens when process A calls `wait` on process B while process B has not yet terminated. In other words, process A now has a blocking condition.
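
As a rough sketch of that rule (an assumption-laden illustration, not ckb-vm code; `on_wait` and the name-to-state dictionary are hypothetical):

```python
from typing import Dict


def on_wait(states: Dict[str, str], caller: str, target: str) -> str:
    """Return the caller's state after it calls wait() on target.

    The caller only enters WaitForExit when the target has not terminated
    yet; otherwise there is no blocking condition and it keeps its state.
    """
    if states[target] != "Terminated":
        states[caller] = "WaitForExit"
    return states[caller]
```

With `{"a": "Running", "b": "Runnable"}`, `on_wait(states, "a", "b")` moves `a` to `WaitForExit`; if `b` is already `Terminated`, `a` stays `Running`.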

- The process enters the `Running` when a process is running.
- The process enters the `Terminated` when a process is terminated.
- The process enters the `WaitForExit` state by calling the `wait()`
- The process enters the `WaitForRead` state by calling the `read()`
Contributor

Similar to the above, a process might not actually enter the `WaitForRead` state by calling `read` if data is already available at the other end. It only enters this state when it wants data but the data is not ready; in other words, when it has a blocking condition.

- The process enters the `Terminated` when a process is terminated.
- The process enters the `WaitForExit` state by calling the `wait()`
- The process enters the `WaitForRead` state by calling the `read()`
- The process enters the `WaitForWrite` state by calling the `write()`
Contributor

Similar to the above
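
The same idea for `read()` and `write()` could be sketched like this. It is a deliberately simplified model with hypothetical names (`Pipe`, `on_read`, `on_write`): it ignores partial transfers and does not model waking the peer process, and it is not how ckb-vm implements pipes.

```python
from dataclasses import dataclass


@dataclass
class Pipe:
    # Hypothetical bookkeeping: bytes offered by a parked writer that have
    # not been consumed yet, and whether a reader is parked on the pipe.
    offered: bytes = b""
    reader_waiting: bool = False


def on_read(pipe: Pipe) -> str:
    """State of the reader after read(): it blocks only if no data is ready."""
    if pipe.offered:
        pipe.offered = b""  # data was available, so no blocking condition
        return "Running"
    pipe.reader_waiting = True
    return "WaitForRead"


def on_write(pipe: Pipe, data: bytes) -> str:
    """State of the writer after write(): it blocks only if nobody is reading."""
    if pipe.reader_waiting:
        pipe.reader_waiting = False  # a reader takes the data immediately
        return "Running"
    pipe.offered = data
    return "WaitForWrite"
```

In this model, whichever side arrives first parks in a waiting state, and the side that arrives second proceeds without blocking.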

0. **Circular Waiting**: If multiple processes are in the `Wait`, `WaitForWrite`, or `WaitForRead` states and are waiting on each other in a circular dependency, a deadlock can occur. For example, if:
- Process A is in `WaitForRead` for data from Process B
- Process B is in `WaitForRead` for data from Process A. Both processes will wait indefinitely, as each is waiting for the other to proceed.
0. **Buffer Limits**: Essentially, it's another circular waiting. The pipe in ckb-vm is unbuffered. If one process blocks on a `WaitForWrite` state because the data is not fully read, and the reader process is also blocked in a `WaitForRead` state (but on a different file descriptor), this can create a deadlock if neither can proceed:
Contributor

It really depends on perspective; to me, this is another example of a circular dependency.
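
Seen as circular dependencies, both cases reduce to a cycle in a wait-for graph. The sketch below is illustrative only and assumes each blocked process waits on exactly one other process; `has_wait_cycle` and the `waits_on` mapping are hypothetical names, not part of ckb-vm.

```python
from typing import Dict, Optional


def has_wait_cycle(waits_on: Dict[str, Optional[str]], start: str) -> bool:
    """Follow wait-for edges from start; True if a process is revisited."""
    seen = set()
    current: Optional[str] = start
    while current is not None:
        if current in seen:
            return True
        seen.add(current)
        # A process that is missing or maps to None is not waiting on anyone.
        current = waits_on.get(current)
    return False
```

Both examples from the diff yield the same graph, `{"A": "B", "B": "A"}`: in the first, each process waits to read from the other; in the second, A waits for B to read one file descriptor while B waits for A to write another.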

3 participants