
WIP: Feature: fork kernel #410

Open · wants to merge 7 commits into main

Conversation

maartenbreddels
Contributor

Issue

For dashboarding, it is useful to have a kernel with its final state ready instantly. If, for instance, it takes ~5 minutes to execute a notebook, that does not make for a good user experience.

Solution

Execute a notebook, and after it is done, consider it a template, and give each user a fork() of this template process.

Implementation

The kernel gets a new message type, simply called fork. Combining fork() with asyncio/tornado's ioloop is a no-go; it is essentially unsupported. To avoid this, we stop the current ioloop and use the ioloop object to communicate the fork request up the call stack. The kernel app then checks whether a fork was requested and, if so, forks:

  • The parent resumes the ioloop.
  • The child cleans up the ioloop and starts listening on new ports.
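The fork dance above can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes the ioloop has already been stopped, forks outside any running loop, and has the child signal readiness over a pipe at the point where it would rebind its ZMQ sockets.

```python
import os

def fork_kernel_sketch():
    """Minimal sketch of the parent/child split described above."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherits a copy-on-write snapshot of the parent's state.
        # Here it would clean up the inherited ioloop and bind new ports.
        os.close(read_fd)
        os.write(write_fd, b"child-ready")
        os._exit(0)
    # Parent: here it would resume its ioloop and reply with the fork id.
    os.close(write_fd)
    ack = os.read(read_fd, 32)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return pid, ack
```

The key constraint this encodes is that no event loop may be running on either side of the fork() call itself.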

Usage (although it does not work yet)

It also needs this PR: jupyter/jupyter_client#441

Shell1:

$ python -m ipykernel_launcher -f conn.json --debug

Shell2:

$ python fork_kernel.py ./conn.json # this will print the result of the fork
DEBUG:traitlets:Loading connection file ./conn.json
DEBUG:traitlets:connecting shell channel to tcp://127.0.0.1:56573
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:56573
DEBUG:traitlets:connecting iopub channel to tcp://127.0.0.1:56574
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:56574
DEBUG:traitlets:connecting stdin channel to tcp://127.0.0.1:56575
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:56575
DEBUG:traitlets:connecting heartbeat channel to tcp://127.0.0.1:56577
{'header': {'msg_id': '81e4a9b2-155b535e6541ce736c126d47', 'msg_type': 'execute_reply', 'username': 'maartenbreddels', 'session': 'b5b03194-ba9276d2bbc5a711d5e5d44c', 'date': datetime.datetime(2019, 5, 28, 21, 39, 2, 604627, tzinfo=tzutc()), 'version': '5.3'}, 'msg_id': '81e4a9b2-155b535e6541ce736c126d47', 'msg_type': 'execute_reply', 'parent_header': {'msg_id': '73914843-0da368d574d8c114b8dfd531', 'msg_type': 'fork', 'username': 'maartenbreddels', 'session': '9c61f20c-d8a4e43f945eaf915031a3d4', 'date': datetime.datetime(2019, 5, 28, 21, 39, 2, 596944, tzinfo=tzutc()), 'version': '5.3'}, 'metadata': {'status': 'ok'}, 'content': {'status': 'ok', 'fork_id': 1042}, 'buffers': []}
$ python fork_kernel.py  /Users/maartenbreddels/Library/Jupyter/runtime/conn_fork.json
DEBUG:traitlets:Loading connection file /Users/maartenbreddels/Library/Jupyter/runtime/conn_fork.json
DEBUG:traitlets:connecting shell channel to tcp://127.0.0.1:54452
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:54452
DEBUG:traitlets:connecting iopub channel to tcp://127.0.0.1:54455
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:54455
DEBUG:traitlets:connecting stdin channel to tcp://127.0.0.1:54453
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:54453
DEBUG:traitlets:connecting heartbeat channel to tcp://127.0.0.1:54457
^CTraceback (most recent call last):
  File "fork_kernel.py", line 12, in <module>
    client.wait_for_ready(100)
  File "/Users/maartenbreddels/src/jupyter/jupyter_client/jupyter_client/blocking/client.py", line 111, in wait_for_ready
    msg = self.shell_channel.get_msg(block=True, timeout=1)
  File "/Users/maartenbreddels/src/jupyter/jupyter_client/jupyter_client/blocking/channels.py", line 50, in get_msg
    ready = self.socket.poll(timeout)
  File "/Users/maartenbreddels/miniconda3/envs/kernel_fork/lib/python3.7/site-packages/zmq/sugar/socket.py", line 697, in poll
    evts = dict(p.poll(timeout))
  File "/Users/maartenbreddels/miniconda3/envs/kernel_fork/lib/python3.7/site-packages/zmq/sugar/poll.py", line 99, in poll
    return zmq_poll(self.sockets, timeout=timeout)
  File "zmq/backend/cython/_poll.pyx", line 123, in zmq.backend.cython._poll.zmq_poll
  File "zmq/backend/cython/checkrc.pxd", line 12, in zmq.backend.cython.checkrc._check_rc
KeyboardInterrupt

The last command never produces any output, so I Ctrl-C'ed it; the resulting stack trace is included above.

Problem

It doesn't work 😄. The forked child process doesn't seem to be reachable via zmq, and no debug info is printed, so I'm giving up here and hope that someone else has good ideas or wants to pick this up.

Side uses

Forking the kernel could also be used for 'undoing' the execution of a cell/line, by forking after each execute(..). Note that since fork uses copy-on-write, this will be fairly cheap in resource usage.
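The undo-by-fork idea can be sketched like this. The function name and pipe-based handoff are hypothetical, purely for illustration: the child keeps a copy-on-write snapshot of the state taken before the "cell" runs, and the parent can read that snapshot back to undo.

```python
import os
import pickle

def execute_with_undo(state, mutate):
    """Fork before mutating; recover the pre-mutation state from the child."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: its copy of `state` is untouched by the parent's mutation.
        os.close(read_fd)
        os.write(write_fd, pickle.dumps(state))
        os._exit(0)
    os.close(write_fd)
    mutate(state)  # the parent executes the "cell"
    snapshot = pickle.loads(os.read(read_fd, 65536))  # "undo" data
    os.close(read_fd)
    os.waitpid(pid, 0)
    return state, snapshot
```

A real implementation would keep the forked child alive and switch clients over to it instead of shipping pickled state, but the copy-on-write cheapness is the same.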

@LucianaMarques

Hey @maartenbreddels ! I'm interested in helping you with this.

I don't quite get what the strategy is here... Decreasing run time can indeed be done with more than one process, but you did mention this:

Execute a notebook, and after it is done, consider it a template, and give each user a fork() of this template process.

If you are waiting for it to be done, then we are waiting for the whole run time, right? Could you please be more specific about this idea?

It does sound interesting, though! :)

@edisongustavo
Contributor

Hello @maartenbreddels, I'm very interested in this feature.

Is there a way I can help, so we can implement this together?

@maartenbreddels
Contributor Author

I currently don't have much time to work on this, but I'm happy for you to take it over a bit, or test it out.

@edisongustavo
Contributor

edisongustavo commented Aug 25, 2019

Hey @maartenbreddels, I believe I found out why the child kernel does not receive the messages.

That's due to the check_pid flag of Session in the send() method:

class Session:
    def send(...):
        ...
        if self.check_pid and not os.getpid() == self.pid:
            get_logger().warning("WARNING: attempted to send message from fork\n%s",
                msg
            )
            return        

See here
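A toy reimplementation of that guard makes the failure mode concrete. The attribute names mirror jupyter_client's Session, but the class body is a sketch, not the real code: the recorded pid no longer matches after fork, so send() drops every message without raising.

```python
import os

class ToySession:
    """Toy stand-in for the pid guard in jupyter_client's Session.send()."""
    def __init__(self):
        self.check_pid = True
        self.pid = os.getpid()  # recorded at construction, i.e. pre-fork

    def send(self, msg):
        # In a forked child os.getpid() != self.pid, so the message is
        # silently discarded unless check_pid is disabled (or pid updated).
        if self.check_pid and os.getpid() != self.pid:
            return None
        return msg

session = ToySession()
session.pid -= 1           # simulate running in a forked child
dropped = session.send("hello")   # discarded: guard fires
session.check_pid = False  # the workaround described in this comment
sent = session.send("hello")
```

An alternative to disabling the check would be to update session.pid to the child's pid right after the fork.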

If I set check_pid = False, then the following works (this is a modification of your fork_kernel.py script):

import sys
import logging

from jupyter_client import BlockingKernelClient

logging.basicConfig()

connection_file = sys.argv[1]

client = BlockingKernelClient()
client.log.setLevel('DEBUG')
client.load_connection_file(connection_file)
client.start_channels()
client.wait_for_ready(100)

obj = client.fork()
msg = client.shell_channel.get_msg(timeout=100)

print("=========================================")
print(" Client after fork")
print("=========================================")

client_after_fork = BlockingKernelClient()
client_after_fork.log.setLevel('DEBUG')
client_after_fork.load_connection_info(msg["content"]["conn"])
client_after_fork.start_channels()
client_after_fork.wait_for_ready(100)

client_after_fork.execute(code="foo = 5+5", user_expressions={"my-user-expresssion": "foo"})
msg = client_after_fork.get_shell_msg(timeout=100)
print("Value of last execution is {}".format(msg["content"]["user_expressions"]["my-user-expresssion"]["data"]["text/plain"]))

client_after_fork.execute(code="bar = 1+2", user_expressions={"my-user-expresssion": "foo,bar"})
msg = client_after_fork.get_shell_msg(timeout=100)
print("Value of last execution is {}".format(msg["content"]["user_expressions"]["my-user-expresssion"]["data"]["text/plain"]))

Will print to the console:

DEBUG:traitlets:Loading connection file conn.json
DEBUG:traitlets:connecting shell channel to tcp://127.0.0.1:56573
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:56573
DEBUG:traitlets:connecting iopub channel to tcp://127.0.0.1:56574
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:56574
DEBUG:traitlets:connecting stdin channel to tcp://127.0.0.1:56575
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:56575
DEBUG:traitlets:connecting heartbeat channel to tcp://127.0.0.1:56577
DEBUG:traitlets:connecting shell channel to tcp://127.0.0.1:48419
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:48419
DEBUG:traitlets:connecting iopub channel to tcp://127.0.0.1:52073
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:52073
DEBUG:traitlets:connecting stdin channel to tcp://127.0.0.1:50093
DEBUG:traitlets:Connecting to: tcp://127.0.0.1:50093
DEBUG:traitlets:connecting heartbeat channel to tcp://127.0.0.1:60527
=========================================
 Client after fork
=========================================
Value of last execution is 10
Value of last execution is (10, 3)

@edisongustavo
Contributor

edisongustavo commented Aug 25, 2019

I've also tried connecting to a "forked" kernel and it does work!

@maartenbreddels
Contributor Author

maartenbreddels commented Aug 25, 2019 via email

@edisongustavo
Contributor

edisongustavo commented Aug 25, 2019

Testing shutdown()

I was testing whether the shutdown() message would work, and it doesn't out of the box.

I found out that the shell variable must also be updated, so I modified the block where you reset the kernel to this:

# NOTE: we actually start a new kernel, but once this works
# we can actually think about reusing the kernel object
self.kernel_class.clear_instance()
self.kernel.shell_class.clear_instance()

Then shutdown() does work correctly.
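The fix works because both the kernel and its interactive shell are cached as per-process singletons; after fork the child must clear both caches so fresh instances get built. A toy version of the singleton pattern involved (this is not the actual traitlets SingletonConfigurable code):

```python
class ToySingleton:
    """Toy stand-in for a traitlets-style per-process singleton."""
    _instance = None

    @classmethod
    def instance(cls):
        # Return the cached instance, creating it on first use.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @classmethod
    def clear_instance(cls):
        # Forget the cached instance so the next instance() call builds a
        # fresh one -- what the child must do for the kernel AND the shell.
        cls._instance = None
```

Clearing only the kernel's cache leaves the old shell object wired to pre-fork state, which is why both clear_instance() calls are needed.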

@edisongustavo
Contributor

I'm now working on getting the kernel object reused. I've forked the repo here: https://github.com/edisongustavo/ipykernel/. I believe it will take some time, since I can only work on this in my free time.

But I'm getting there!

edisongustavo and others added 7 commits September 2, 2019 22:24
When subprocesses are spawned with `stdout = open('/dev/null')`, PyCharm is not able to attach to them.

This is especially painful when debugging the tests in test_kernel(), where the `kernel()` methods spawn kernels in subprocesses if required.
This is the directory created by PyCharm