Component state in django-unicorn #599

imankulov · 2023-09-14T13:42:46Z

imankulov
Sep 14, 2023

After exploring issue 530 and the solution proposed in PR 538, and considering possible alternatives, I would like to discuss the current architecture of django-unicorn.

I still have some gaps in understanding how django-unicorn works under the hood, so some of my assumptions may be wrong.

Where is the source of truth for component state?

As we discovered in issue 530, the root cause of the issue is that the server forgets to inform the client about newly rendered components, as it does during the initial load. The workaround provided in PR 538 involves parsing the returned HTML to find missing components. However, the important piece of the puzzle, which is the component state for child components, is missing, and the workaround does not initialize their state.

This made me think about the role of the state on the client and how it interacts with the server.

The server loads the initial state, and then the client is responsible for maintaining it. After the first load, the server acts statelessly: it receives the state and the commands from the client and modifies the state accordingly. As a side effect, the server also re-renders the HTML.

So, the algorithm for updating something on the page is as follows:

1. The client sends the state and the actions that need to be performed to modify it.
2. The server performs the actions to update the state.
3. The server re-renders the template to reflect the state changes.
4. The server sends the new state and the new piece of HTML back to the client.
5. The client updates its internal state and updates the HTML on the page.
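
The round trip above can be sketched in plain Python. This is only an illustration of the described flow, not django-unicorn's actual code: `handle_message`, `render`, `increment`, and the shape of the state dictionary are all hypothetical.

```python
from typing import Any, Callable

State = dict[str, Any]
Action = Callable[[State], None]


def render(state: State) -> str:
    """Stand-in for template rendering: turn the state into an HTML fragment."""
    return f"<span>{state['count']}</span>"


def increment(state: State) -> None:
    """Example action that mutates the state in place."""
    state["count"] += 1


def handle_message(client_state: State, actions: list[Action]) -> dict[str, Any]:
    """Apply the requested actions to the client-provided state and re-render."""
    state = dict(client_state)  # the server keeps nothing between requests
    for action in actions:
        action(state)
    # The response carries both the new state and the new HTML.
    return {"state": state, "html": render(state)}


response = handle_message({"count": 1}, [increment, increment])
```

In this model the server is a pure function of what the client sends, which is why it works for a flat list of independent components.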

Server has the authority:

To provide initial state values.
To change the state in response to actions.
To represent the state as HTML (rendering state).

The client maintains the component state.

What are the flaws?

This model would work well if we considered the component model as a flat list of independent components. However, problems arise when the server needs to re-render nested components. Additionally, any component can use self.parent to update the state of the parent component.

The client does not send the state of child components. Without this information, what should the server use to render them? It appears that the server uses cached values or, if they are not available, uses initial values (I am not sure about this). There is no guarantee that the server accurately reflects the client's state, so the child components can be re-rendered differently from what the client holds. How can we ensure that the parent component does not override the child component's state? LiveWire addresses this by not rendering the HTML for child components (leaving slots for them), and the client-side rendering skips merging these components into the tree. See "Every component is an island".

Rendering parents poses similar challenges. The client sends the parent state, but that's it. If that parent decides to update the state of its own parent (the grandparent of the initial component), the server ends up in a situation where it has no source of truth for the up-to-date state of the grandparent (again, I am not sure it works exactly as I described).

As we can see, the premise "the client has authority over the state" is not always true, and sometimes the server resorts to getting the state from the cache (as when the child component needs to be re-rendered). In another case, such as when new components are added after the first load, the server forgets to communicate the state to the client, and the client does not initialize the component.

All of this results in a situation where we actually have two stores for the component state, and the server uses one or the other depending on which data is available. It's like having a distributed database where each replica can act as the source of truth. Maintaining the distributed database state is a notoriously challenging problem. In our case, it may not be that big of a challenge as long as requests to change the state on the server are serialized, and responses contain an accurate representation of the changed state. However, the question arises: do we need two authorities for component states?

Can we drop the client-side state?

We already maintain the state on the server side, and it is crucial for the correct functioning of the server. In this case, can we always use the server-side state as the source of truth? Additionally, can we stop sharing the state with the client side altogether?

If we implement it like this, what are the drawbacks of this approach?

adamghill · 2023-09-18T15:53:20Z

adamghill
Sep 18, 2023
Maintainer

I think you have accurately described the situation. I think initially I tried to keep the server as stateless as possible -- store the component state client-side. But, I ran into some issues developing nested components and now we are in this weird hybrid state where it's mostly stateless except I use the Django cache as a crutch.

> It appears that the server uses cached values or, if they are not available, uses initial values (I am not sure about this)

Yep, this is correct.
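
The fallback order confirmed here (client payload, then cache, then initial values) can be sketched as follows. The function name `resolve_state`, the component ids, and the dictionary-based cache are assumptions made for illustration; they are not taken from the Unicorn code base.

```python
from typing import Any

State = dict[str, Any]


def resolve_state(
    component_id: str,
    client_payload: dict[str, State],
    cache: dict[str, State],
    initial: State,
) -> tuple[State, str]:
    """Return the state to render with, plus a label saying where it came from."""
    if component_id in client_payload:
        return client_payload[component_id], "client"
    if component_id in cache:
        return cache[component_id], "cache"
    return dict(initial), "initial"


payload = {"parent": {"step": 2}}     # the client only sends the acting component
cache = {"child-a": {"name": "Ada"}}  # child-b was never cached, or was evicted

sources = {
    component_id: resolve_state(component_id, payload, cache, {"name": ""})[1]
    for component_id in ("parent", "child-a", "child-b")
}
```

Three components rendered in one request end up with three different sources of truth, which is the hybrid situation described above.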

> do we need two authorities for component states?

I think the answer is hopefully "no". It makes the system needlessly complicated and reasoning about what the correct state is a real challenge.

> Can we drop the client-side state?

Theoretically, with client-side state a component could re-render client-side without needing the server at all. To be fair, I don't think I actually do this anywhere in Unicorn right now. The only other drawback is that a cache or a database would be a hard requirement. Right now, simple use cases of Unicorn with a simple deployment don't need a cache, and the client side can handle all of it. But, maybe that's really an illusion -- any sufficiently complicated setup will require a cache anyway. It might be better to just say it's a requirement from the outset? The other potential benefit is that a known shared server cache could allow components to stay in sync across browser tabs/windows (this would require websockets and is probably far in the future, though).

From the other side, can we drop server-side state? If everything was client-side then the server is stateless which has a certain appeal. Maybe it's not worth the hassle to implement, but it would mean that cache wouldn't be a requirement at all.


hendi · 2023-09-18T17:40:24Z

hendi
Sep 18, 2023

I'm strongly in favour of server-side state:

Security: The current "let's send all data to the client" approach is a security nightmare, reminding me of Rails' early "opt-in security", which has been disastrous.
Less data: If the client is the source of truth, it needs to know all the data necessary to calculate the displayed HTML. E.g. filtering a huge list down to a single result would still send all the list's data to the client.
Improved wire format: Similar to Elixir's LiveView, server-side state would allow us to minimize the transmitted data (in the future). Instead of sending fully rendered HTML down to the client and having it do the diffing, the server could send small "change instructions" instead.

I also don't think that client-side state is easier to use just because of the absence of Redis etc. With the current architecture, you can't simply pass e.g. User.objects.all() to a component since that would leak the hashed passwords; instead you have to provide some custom (de)serialization logic. So the current architecture is only "easy" if one is okay with security issues (by default).

I'd like to propose making the state server-side and, in the future, exploring opt-in client-side state: enable parts of the context to live client-side, which could allow for some interactivity without any trips to the server (a friend of mine is experimenting with a python2js compiler which is showing promising results for certain things).
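
To illustrate the kind of custom serialization meant here, below is a minimal allow-list sketch. The `User` dataclass stands in for a Django model instance, and `PUBLIC_FIELDS` and `serialize_user` are hypothetical names, not part of django-unicorn.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class User:
    """Stand-in for a Django user model instance."""

    username: str
    email: str
    password: str  # hashed password; must never reach the browser


# Only fields listed here are ever copied into the client payload.
PUBLIC_FIELDS = ("username",)


def serialize_user(user: User) -> dict[str, Any]:
    """Copy only allow-listed fields into the payload sent to the client."""
    return {name: getattr(user, name) for name in PUBLIC_FIELDS}


users = [User("ada", "ada@example.com", "pbkdf2_sha256$<hash>")]
payload = [serialize_user(user) for user in users]
```

An allow-list fails closed: a field added to the model later stays private until someone opts it in, which is the opposite of the "send everything" default criticised above.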

1 reply

adamghill Sep 18, 2023
Maintainer

I tend to agree, but I'm trying to think from both sides to make sure we aren't missing anything.

Personally, I don't think a requirement for persistent cache is too onerous. Another option would be a simple database model with component_id and a JSON field for data. But, if it's just a key/value then cache would just be easier...

imankulov · 2023-09-19T13:13:43Z

imankulov
Sep 19, 2023
Author

Thank you for the response and for clarifying my doubts. Below, I am sharing my thoughts on how server-side state management can be implemented.

Server-side vs. client-side session

I don't have much to add to the server-side vs. client-side storage discussion, other than I am leaning towards keeping the state on the server as it "feels easier". You have the state where you execute commands, and you don't worry about out-of-date or missing data.

Server-side store implementation

Below, I am sharing some ideas on where the django-unicorn state can be stored, assuming that we choose to store it on the server.

I do not take into account the current implementation because I am not that familiar with it. Consider my thoughts as if I were asked to create it from scratch.

State management with Django cache

I assume that the Django cache can be used as long as the documentation makes it clear that you need to use a persistent cache and not local memory or a dummy implementation. Django supports database caching, which can be an option for installations that don't want to depend on external storage.

The flip side of the database cache is that, apparently, django-unicorn will generate tons of keys: essentially, every page reload will generate as many new keys as there are components on the page, and the database cache doesn't evict keys the way Memcached or Redis do.

Another drawback of the cache is that it's often treated as a non-persistent store that is OK with randomly evicting keys due to memory pressure. Since in our scenario it acts as the source of truth, it should be treated differently. Not as a cache, but rather as a key-value store.

Ref: https://docs.djangoproject.com/en/dev/topics/cache/#setting-up-the-cache
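
For reference, a database-backed cache is configured through the `CACHES` setting. The fragment below is a sketch: the table name `unicorn_state` and the `MAX_ENTRIES` value are arbitrary examples, and whether these values suit django-unicorn is an assumption.

```python
# settings.py (fragment)
CACHES = {
    "default": {
        # Stores cache entries in a database table instead of process memory.
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        # Name of the cache table; "unicorn_state" is an arbitrary example.
        "LOCATION": "unicorn_state",
        # None means entries never expire on their own.
        "TIMEOUT": None,
        # Upper bound on stored entries before old ones are culled (example value).
        "OPTIONS": {"MAX_ENTRIES": 100_000},
    }
}
```

The table is created with `python manage.py createcachetable`. Note that this backend honours `MAX_ENTRIES` (300 by default) and culls entries beyond that limit, so a store acting as the source of truth would need the limit sized deliberately.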

State management with sessions

The advantage of sessions over storing the component state in the cache is that the state is naturally scoped by, well, the session. There is less risk of leaking information to other requests due to collisions or bugs in the code. Besides, cookie-based sessions naturally support storage-less state.

However, I believe session-based state management is impractical:

Every request requires serializing, deserializing, and storing in memory the state for all the components across all the pages.
Cookie-based sessions are inherently limited in size and can quickly get out of control.
As Hendi mentioned, implicit serialization and deserialization of models pose a security risk.
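
A rough way to see the cookie size problem is to serialize a plausible amount of component state and compare it with the 4096 bytes that browsers are commonly expected to support per cookie. The component count and state shape below are invented for illustration, and a real signed session cookie would be larger still because of encoding and signature overhead.

```python
import json
from typing import Any

COOKIE_LIMIT_BYTES = 4096  # common per-cookie size limit


def session_payload_size(component_states: dict[str, Any]) -> int:
    """Return the size in bytes of the compact JSON encoding of all component state."""
    return len(json.dumps(component_states, separators=(",", ":")).encode("utf-8"))


# Hypothetical session: 20 components, each holding a list of 100 integers.
states = {
    f"component-{index}": {"items": list(range(100)), "query": ""}
    for index in range(20)
}

size = session_payload_size(states)
fits_in_cookie = size <= COOKIE_LIMIT_BYTES
```

Even this modest example exceeds the limit before any encoding overhead is added.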

State management with pluggable stores

We can take an approach similar to django-storages or to Django sessions themselves and define our own interface and a few implementations for it. We can still use Django cache under the hood, but since it becomes an implementation detail, we're not tied to it. If we find ourselves needing to extend the interface with new operations, we can do so.

At first glance, the interface for the state management could look like this:

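```python
import abc

from django_unicorn.components import UnicornView


class IStateStorage(abc.ABC):
    """Interface for a pluggable component state store."""

    @abc.abstractmethod
    def cache_full_tree(self, component: UnicornView) -> None:
        """Persist the state of the component and the rest of its tree."""

    @abc.abstractmethod
    def restore_from_cache(self, component_cache_key: str) -> UnicornView:
        """Load a previously stored component by its cache key."""
```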

If I were to implement server-side state management, I would go with the pluggable store approach, backed by Django cache as an implementation.
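
As a sketch of what one implementation of this interface could look like, here is a dictionary-backed store. To keep the example self-contained it uses a hypothetical `Component` dataclass in place of `UnicornView`, and it assumes that `cache_full_tree` stores every component in the tree under its own id; neither assumption comes from the proposal itself.

```python
import abc
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Component:
    """Hypothetical stand-in for UnicornView: an id, some state, and children."""

    component_id: str
    state: dict[str, Any] = field(default_factory=dict)
    children: list["Component"] = field(default_factory=list)


class IStateStorage(abc.ABC):
    @abc.abstractmethod
    def cache_full_tree(self, component: Component) -> None: ...

    @abc.abstractmethod
    def restore_from_cache(self, component_cache_key: str) -> Component: ...


class InMemoryStateStorage(IStateStorage):
    """Dictionary-backed store for tests; not shared across processes."""

    def __init__(self) -> None:
        self._data: dict[str, Component] = {}

    def cache_full_tree(self, component: Component) -> None:
        # Store a deep copy so later mutations do not leak into the store.
        self._data[component.component_id] = copy.deepcopy(component)
        for child in component.children:
            self.cache_full_tree(child)

    def restore_from_cache(self, component_cache_key: str) -> Component:
        # Raises KeyError if the key was never stored.
        return copy.deepcopy(self._data[component_cache_key])


storage = InMemoryStateStorage()
wizard = Component("wizard", {"step": 1}, [Component("step1", {"name": "Ada"})])
storage.cache_full_tree(wizard)

wizard.children[0].state["name"] = "changed after caching"
restored = storage.restore_from_cache("step1")
```

A cache-backed implementation would keep the same two methods and swap the dictionary for calls to the configured backend, which is what makes the store an implementation detail.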


imankulov · 2023-09-20T10:51:50Z

imankulov
Sep 20, 2023
Author

Hey @adamghill, just wanted to check in with you. If I were to start moving the context to the server-side, would you be up for helping me with it and eventually merging it with the main branch? Basically, I'm wondering if you're good with us going in that direction or if you're still unsure.

If you're on board with this direction, do you have any specific guidance or feedback before I dive in?

Just want to make sure there aren't any more questions to address before I start tinkering with the code.

2 replies

adamghill Sep 20, 2023
Maintainer

> start moving the context to the server-side, would you be up for helping me with it and eventually merging it with the main branch

Yes! I think your interface for state management and starting with cache implementation makes complete sense.

> Just want to make sure there aren't any more questions

Nope! I do want to bring up something you mentioned earlier above, though:

> as long as requests to change the state on the server are serialized

I have never been able to replicate this reliably, but I think out-of-order requests can cause a problem for Unicorn now. So, I'm not sure this would be a new problem, but it's something to be aware of. I spent a lot of time making serialization work at some point, but it 1) requires a cache (not a problem in this new hypothetical world), 2) makes the server code way more complicated, and 3) is not enabled by default, so it is less battle-tested. But, maybe part of this server-side context work could benefit from changing the default to use serialization?
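
To make the concern concrete, here is a minimal sketch of serializing state changes per component with a lock. It is not how Unicorn's existing serialization option works; `_store`, `_lock_for`, and `apply_serialized` are hypothetical names, and the lock only protects a single process. It guarantees that updates run one at a time, not that they run in the order the client sent them.

```python
import threading
from typing import Any, Callable

# Hypothetical in-process store and lock registry, keyed by component id.
_store: dict[str, dict[str, Any]] = {"counter": {"count": 0}}
_locks: dict[str, threading.Lock] = {}
_registry_lock = threading.Lock()


def _lock_for(component_id: str) -> threading.Lock:
    """Return the lock for a component, creating it on first use."""
    with _registry_lock:
        return _locks.setdefault(component_id, threading.Lock())


def apply_serialized(
    component_id: str, action: Callable[[dict[str, Any]], None]
) -> dict[str, Any]:
    """Run one read-modify-write cycle while holding the component's lock."""
    with _lock_for(component_id):
        state = dict(_store[component_id])  # read
        action(state)                       # modify
        _store[component_id] = state        # write
        return state


def increment(state: dict[str, Any]) -> None:
    state["count"] += 1


threads = [
    threading.Thread(target=apply_serialized, args=("counter", increment))
    for _ in range(50)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

final_count = _store["counter"]["count"]
```

With state held on the server, a multi-process deployment would need the equivalent guarantee from the shared store rather than from an in-process lock.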

There are a few other issues/PRs/discussions floating around and it probably makes sense to determine if they are still valid going in a new direction -- I guess I'm mostly thinking of #600 here to be honest. My plan is to get your Alpine morph PR mergeable first and then figure out what the next priority is.

imankulov Sep 20, 2023
Author

Thanks for the green light, @adamghill. And thank you for pointing out the potential issue with race conditions during state change execution. Although I am unsure about the exact solution, I will keep this in mind. I will keep you posted on the progress. Hopefully, I will have something to share soon.

> My plan is to get your Alpine morph PR mergeable first and then figure out what the next priority is.

Sounds good, thanks!

imankulov · 2023-10-02T16:00:50Z

imankulov
Oct 2, 2023
Author

While exploring the issue, I came across another problem. I am not sure whether it warrants a separate issue, however.

The gist of the problem is that when we render an updated component that has a parent component, we render the updated component first (which works), and then we re-render the parent component. However, the parent component doesn't take the child component's state into account and renders it back with default values.

I experimented in the branch example/wizard2.

https://www.loom.com/share/19031dd5121b48689ce6e028b9b4e295

2 replies

adamghill Oct 4, 2023
Maintainer

Thanks for the video above. So, I added this code very recently because I thought that deleting the dom key would be a way to reduce the payload sent in the response, but I'm pretty sure it creates this subtle bug. One quick solution is to remove that del call and I bet things will work as expected. I can try this and release a patch version tomorrow evening.

But, these are all good examples for testing out server-side component state.

adamghill Oct 4, 2023
Maintainer

So, I just grabbed your branch and tried it out. I think I might have been wrong with the comment about the del call above.

When I removed the id="step1" and id="step2" kwargs from

```html
{% if step == 1 %}
  {% unicorn "wizard/step1" id="step1" %}
{% elif step == 2 %}
  {% unicorn "wizard/step2" id="step2" %}
{% else %}
  UNKNOWN STEP {{ step }}
{% endif %}
```

this is what I see: https://github.com/adamghill/django-unicorn/assets/317045/32526ce9-76f4-4225-a1d6-e192663813e6.

That seems like what you would expect? I'm going to keep looking to see why the id kwarg might be breaking things.

longhotsummer · 2023-11-04T08:34:43Z

longhotsummer
Nov 4, 2023

I've just started playing with Django Unicorn (enjoying it so far) and found this thread when nested components weren't updating as I expected. It does seem that knowing how and when to update nested components is a difficult problem in general, since even Vue and React have specific design guidelines to make it easier. For example in Vue, children can't change props passed in from parents but should use events to do so instead.

I don't know enough about Unicorn (or LiveWire etc.) to contribute concrete proposals, but I do have two thoughts, hopefully they are useful.

Client vs server state

It may be helpful to differentiate between which parts of the system are authoritative for what state. Let's say we have a todo list app. If I filter the list, the client sends the filter state to the server, the server filters the todos from the database (possibly matching new ones added by someone else since the last update), and sends back the HTML.

As I see it, the client is authoritative for the state of the filter input box, but not the actual items that match the filter. The way I implement this is that the filtered todo objects are passed as render context to the templates, they're not attached as state to the component because the client is not authoritative for them, the database is. Other users can update the database and my client should reflect (not overwrite) those changes on the next update.

My approach to Unicorn and state is that it's a glorified version of storing my state in the URL query string. When it changes, instead of redirecting the user to a new URL with the updated todo filter, the same (logical) state is sent to the server with Unicorn and the HTML is updated. Therefore, I only store state that the client is authoritative for, not the server: no todo objects are serialised as state, only as HTML. This implies that the client cannot completely re-render the page without the help from the server, because not all state is available. That's the trade-off that Unicorn, LiveWire etc. make: the server always does rendering because only it has everything needed (the client's state from the client, and the server state from the server).
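
The split described here can be sketched in plain Python. This is not Unicorn code: `TodoList`, its `state` and `render` methods, and the module-level `DATABASE` list are hypothetical stand-ins for a component, its serialized state, its template, and a shared database table.

```python
from dataclasses import dataclass

# Stand-in for a database table that other users can change at any time.
DATABASE = ["buy milk", "buy bread", "call mum"]


@dataclass
class TodoList:
    # Client-authoritative state: the only thing that is serialized.
    filter_text: str = ""

    def state(self) -> dict[str, str]:
        """State exchanged with the client; contains no todo objects."""
        return {"filter_text": self.filter_text}

    def render(self) -> str:
        """Query the database on every render and emit HTML only."""
        matches = [todo for todo in DATABASE if self.filter_text in todo]
        items = "".join(f"<li>{todo}</li>" for todo in matches)
        return f"<ul>{items}</ul>"


component = TodoList(filter_text="buy")
first_render = component.render()

DATABASE.append("buy eggs")  # another user adds a todo in the meantime
second_render = component.render()
```

The component's state is unchanged between the two renders, yet the second render picks up the new todo, because the database rather than the client is authoritative for the items.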

Given this, keeping client-side state on the server seems odd: why should the server be authoritative for the value of my input box? What if the client state and the server state disagree (e.g. a todo has been deleted on the server but not the client)? Caches throw data away sometimes by design -- what if that state is lost? If per-client state is persisted, then how do we know to clean it up? Isn't that why cookie-based sessions are preferred over database-backed sessions (the dreaded "jsessionid" URL param from the past springs to mind!)?

Learning from other libraries

My second thought is, what can be learned from other similar libraries, such as LiveWire? Would following their examples (and lessons learnt) save Unicorn from having to learn the same lessons?

LiveWire seems to solve this with their explicit "all components are an island" approach, which means they don't need to store global state on the server, and don't send ALL client state back to the server when updating a component. Of course the trade-off is that other components don't know to update. They have opt-in workarounds like events to allow for the inefficient, but convenient, alternatives.

TLDR

You can't have your cake and eat it too: either components are islands, or all state is shared and everything re-rendered; both have pros and cons.
My gut and other web frameworks seem to say that storing all component state on the server is a bad idea.

