Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NetworkVariable dictionary write permission and duplicate key issues #3070

Closed
ezoray opened this issue Sep 18, 2024 · 17 comments
Closed

NetworkVariable dictionary write permission and duplicate key issues #3070

ezoray opened this issue Sep 18, 2024 · 17 comments
Assignees
Labels
priority:high stat:imported Status - Issue is tracked internally at Unity type:bug Bug Report

Comments

@ezoray
Copy link

ezoray commented Sep 18, 2024

Description

  1. The network variable dictionary ignores Owner/Server write permission allowing the non-owner/client to write to the dictionary (possibly its own copy but I'm not sure on that).
  2. If a client connects while entries are being added to the dictionary a duplicate key error is reported in the client. Something similar happens on change of ownership.

Reproduce Steps

The first issue can be replicated running the code here.
To see the second issue run the code in the attached unitypackage.

Actual Outcome

  1. Client can write to the dictionary when it doesn't have permission.
  2. On connection the client reports the error: ArgumentException: An item with the same key has already been added. Key: n

Environment

@ezoray ezoray added stat:awaiting triage Status - Awaiting triage from the Netcode team. type:bug Bug Report labels Sep 18, 2024
@NoelStephensUnity NoelStephensUnity added Investigating Issue is currently being investigated priority:high stat:import Status - Issue is going to be saved internally and removed stat:awaiting triage Status - Awaiting triage from the Netcode team. Investigating Issue is currently being investigated labels Sep 27, 2024
@NoelStephensUnity
Copy link
Collaborator

NoelStephensUnity commented Sep 27, 2024

@ezoray
Ok, this was a very good catch and a bit tricky to prevent. I went ahead and pulled down your repo to use your examples (nice btw! 👍 ) and came up with a solution for this that prevents local modification of a collection with still a few minor caveats.

The underlying issue is that default .NET collections have no way to be notified when the collection is modified and since we have already committed to supporting these types it is a bit of a catch 22 scenario:

  • Collection properties are really references to the collection itself
    • Due to this, simple assignment only creates a reference/pointer to the same collection and as such full duplication of the collection is required in order to have two unique collections for comparison purposes.
  • Since we don't want to poll every NetworkVariable property of every NetworkBehaviour component (very bad for performance reasons) and since only the client/server with write permissions can mark something dirty that then adds the NetworkObject to a post late update queue that is then polled for changes to NetworkVariable properties (i.e. clients without write permissions will never have this check), I had to come up with a way to provide developers a mechanism to check for changes on non-authority instances.

The solution (thus far) is to keep one more internal duplicate of the NetworkVariable that is a duplicate of the current value and can be used to check against the current value when:

  • Handling reads (deltas or field), if the m_InternalValue is not equal to the m_InternalOriginalValue then prior to reading the update on non-authority instances the m_InternalOriginalValue is applied first, then the change is applied, then a duplication of the m_InternalValue is applied to the m_InternalOriginalValue.
    • This handles any edge case scenarios where a delta could be processed but the collection on the non-authority side could still have been modified.
  • NetworkVariable.CheckDirtyState will first determine if the client has write permissions and if not it does the same comparison between the m_InternalValue and m_InternalOriginalValue and if there is a delta it applies the m_InternalOriginalValue to the m_InternalValue (i.e. reverts back to the last known current value).
  • NetworkVariable.IsDirty does the same kind of check as NetworkVariable.CheckDirtyState and if the two don't match on a client without write permissions the value is reverted back to the last known current value (which is not the same as the previous value but just a duplicate of the current value).

Due to the underlying issue of standard .NET collections and lack of any notification (without using a modified/alternate version...since we are supporting the standard .NET collections), the best way to handle the scenario where a client could modify a collection without invoking the NetworkVariable.CheckDirtyState was to provide the alternate option of invoking NetworkVariable.IsDirty which reverts any local changes on clients without write permissions and returns true (without actually setting the dirty state) so developers can have some way of detecting if a collection was modified on a client without write permissions. This is only in the event you might want to take some other action and/or generate some other event (if trying to detect cheating or the like), but doesn't require reverting back to the last known current state since it automatically does this for you.

I need to mull over these modifications a bit more and extend the NetworkVariable collections integration tests to validate the modifications... but should have a PR up for this by end of day. Once that is done, I will post the modified version of your project here that will also point to the PR branch so you can test it yourself (in the event you want to do a second pass to make sure it works as you would expect and to see how I was testing using your project).

Also wanted to thank you very much for your efforts in helping us catch these kinds of things and for your awesome project that hopefully others will be able to learn from... super super awesome stuff you have there!
:godmode: 👍 💯

@ezoray
Copy link
Author

ezoray commented Sep 27, 2024

Thanks Noel for the explanation, I think I'm getting it. It does seem that having NetworkVariables handling collections is pretty awkward and outside their original remit. I must admit I'm not really liking having the receiver have to do a comparison of the collections to find out what's changed, but as it's the only option I guess I'll have to get used to it :) Please post the modified code when you're done and I'll be happy to test it.

Thanks for your kind words on the project. I hope it's useful to others, it did motivate me to bring some order to a lot of test code scattered across various projects. I've also learned and am still learning things I missed or wouldn't have come across otherwise, so win win. :)

@NoelStephensUnity
Copy link
Collaborator

NoelStephensUnity commented Sep 27, 2024

@ezoray
Ok, I am attaching the modified project that starts off with the NetworkVariableDictionaryScene scene. I sort of moved a few things around (I prefer to make stand alone builds as a final pass and added my NetworkManagerHelper that has the connect/disconnect simple UI with the generic/default runtime console logging) and added a CheckDirty toggle that only displays on the non-owner side and when disabled prevents NetworkVariable.CheckDirtyState from being invoked within InScene.DictionaryUpdate and uses the NetworkVariable.IsDirty method instead. The manifest points to the PR's branch.

NGOTOGO_CollectionsFix.zip

Let me know if this seems like it works more like you would expect?

Cheers!
😸

@NoelStephensUnity NoelStephensUnity self-assigned this Sep 28, 2024
@ezoray
Copy link
Author

ezoray commented Sep 29, 2024

I had a look and there is an issue, I think down to the first client interaction with the dictionary after switching ownership to the client. There's different ways to produce it, the dictionary just needs to have multiple entries.

One way, as server add a couple of entries to the dictionary:
Dictionary - key: 0 value: 10
Dictionary - key: 1 value: 10

Switch ownership to client, then add another entry to the dictionary:
Clients dictionary, count 3:
Dictionary - key: 0 value: 10
Dictionary - key: 1 value: 10
Dictionary - key: 2 value: 10

Server throws exception ArgumentException: An item with the same key has already been added. Key: 1
I think the key in the exception is the last entry that the server added before switching ownership.
Server dictionary count remains 2.

It may be related to the first call of CheckDirtyState on the client after ownership changes to the client. I noticed if no change has been made to the dictionary and CheckDirtyState is called it triggers OnValueChanged with the previous dictionary being empty.

I should add this looks to be a separate issue and the permission changes seem to work fine. :)

@NoelStephensUnity
Copy link
Collaborator

NoelStephensUnity commented Sep 29, 2024

@ezoray Very good catch!
Using the same modified project provided, delete the packages-lock.json file to get the most recent update. The replication is specific to the scenario where the client is already connected, then the entries are added, then ownership transfers to the client, and finally the new client owner adds an entry. Under this scenario, the client does not update its previous value (as you pointed out) which upon the client making a change it sends the last/previous entry added and the newly added entry causing the key already exists exception.
The most recent push to the PR should resolve this issue. It basically assure that any owner write permission NetworkVariable has its previous and original value updated to mirror the current value.
👍

@ezoray
Copy link
Author

ezoray commented Sep 29, 2024

Looks fixed to me. 👍

The only other issue is one I mentioned previously. While entries are being added constantly to the dictionary by the server, when the client connects they get the contents on spawn but also a duplicate of the last entry resulting in an exception. This was happening on ownership change as well but I haven't checked this with the modified project code.

@NoelStephensUnity
Copy link
Collaborator

NoelStephensUnity commented Sep 29, 2024

While entries are being added constantly to the dictionary by the server, when the client connects they get the contents on spawn but also a duplicate of the last entry resulting in an exception. This was happening on ownership change as well but I haven't checked this with the modified project code.

Yeah, this is indeed a different issue and I have some changes in my local branch for this PR that resolve the dictionary issue. It is looking like it could be an order of operations issue...and my current fix is to handle it deep within the dictionary collection serialization... but that sort of masks the core issue where you could receive an update for a value that is already set... (thinking of lists primarily where you could potentially get duplicate entries since that is valid).

Going to look at this more tomorrow and if it looks like I need to spend a bit more time on it then I will split that one specific issue off to a separate internal ticket and make a new PR for that fix.

@NoelStephensUnity
Copy link
Collaborator

NoelStephensUnity commented Sep 30, 2024

@ezoray
Ok, I narrowed down the root cause of the issue. When initially synchronizing a client it was sending the current value of each NetworkVariable even if there were pending changes. This would then cause the key already exists exception on the newly connected client side when the host would send the pending changes on the next tick.

The proper fix (high level overview) was to check if there were pending changes (per NetworkVariable) when writing the full synchronization information to the client and if there were then writing the previously known value as opposed to the current value would synchronize the client to the same state as the rest of the connected clients. Some short time later (default would be on the next tick) the changes are sent to the newly connected client and applied (at the same time any other connected clients would be updated to the changes).

This adjustment should resolve the 2nd issue for any collection type supported. So, I reverted out the dictionary collection serialization (which is sort of the "canary in the bird cage" since it throws the key already exists exception if there is some form of duplicated sync issue). For HashSets and Lists, they are silent since a HashSet won't add duplicate item and a List will just add a duplicated entry without any exceptions or the like.

Will write an integration test for this update to the existing PR. Your included custom package I imported into the modified version of the NGOToGo project and it works just fine.
👍

@NoelStephensUnity
Copy link
Collaborator

NoelStephensUnity commented Sep 30, 2024

@ezoray
Ok.. I actually ran into several issues along the way and ended up with a handful of fixes.
Here is the updated version of the modified NGOToGo project with the 2nd issue included as well as a bootstrap menu so you can load both issues in one pass without having to change the secenes in build index order and the like:
NGOTOGO_CollectionsFix_II.zip

One of the additions is a notification that will be generated if you set the NetworkManager logging mode to developer and you change ownership of the same NetworkObject under a 6x the tick rate frequency period of time. It still allows the change but I decided that a warning vs a clamp was a better approach... the general idea is that due to the nature of how NetworkVariables are updated and the time it can take to handle processing the change in ownership (client-server network topology) there is a scenario where you can send the message from the server side prior to the client having already processed an in-flight change in ownership message...and you are changing the NetworkVariable (collections) when ownership is gained.

So, the rule of thumb here is don't repeatedly change ownership of the same NetworkObject within a really short period of time if you know there are collections that need to be synchronized properly and you are changing network variable state upon the gain of ownership (an edge case for sure, but it is something that can cause issues especially using a client-server network topology).

So...on issue-2 if you rapidly change ownership with higher latencies then you "could" have a synchronization issue.
You can start two local stand alone builds of the included updated NGOToGo project and click as fast as you can and shouldn't see any issue... but under more latent conditions you could potentially see an invalid synchronization if you are changing the ownership of the same NetworkObject with NetworkVariables that are collections (primarily List and Dictionary).

However, setting that minor edge case aside... you should see both issues are resolved.

👍 👍

@ezoray
Copy link
Author

ezoray commented Oct 1, 2024

Looks good Noel, the connection issue is gone.

I changed the Update loop condition so the spam is continuous and not restricted to the server:

        private void Update()
        {
            if (IsSpawned && IsOwner)
            {
                netDictionary.Value.Add(keyValue.Value, keyValue.Value);
                keyValue.Value++;
                netDictionary.CheckDirtyState();
            }
        }

It looks like changing ownership to the client is fine but switching it back to the server there's errors on both sides. Going by what you said above is this something that can be handled?

Edit: To clarify after further testing, server spamming can be turned off so only the client spams the dictionary, the following errors occur when switching ownership back to server:

Client: multiple ArgumentException: An item with the same key has already been added. Key: n

Server: [Netcode] Client wrote to NetworkVariable`1 without permission. No more variables can be read. This is critical. => NetworkObjectId: 1 - GetNetworkBehaviourOrderIndex(): 0 - VariableIndex: 0

The errors are broadly similar to when the server also spams the dictionary.

@NoelStephensUnity
Copy link
Collaborator

NoelStephensUnity commented Oct 2, 2024

@ezoray
So with owner write permissions and how you have it setup you will need to add some form of change in ownership synchronization into the mix as you are indeed seeing the issue where you can change ownership with "in-flight" state update messages. Since there is a delay (at least 1/2 RTT) between the host/server sending the change in ownership message and the clients while the host/server applies the change immediately (locally), you will run into synchronization issues with owner write permission based NetworkVariable collections for the 2nd issue use case scenario.

Here is a modified version of your project that uses the same fixes (in this issue's assigned PR) but also incorporates a tick delay between changes in ownership as well as it uses RPCs to synchronize any ownership changes that are client to host or to another client:

NGOTOGO_CollectionsFix_III.zip

It works with multiple clients connected and provides some additional properties on the "InScene" NetworkObject:
image
The ownership tick delay is the amount of time the owner waits before allowing a transfer of ownership.
When it is transitioning from the host to a client it handles this locally. When it is transitioning from a client to the host or another client it uses the RPCs to synchronize the transfer of ownership.
The Update Rate is how often a client/host will update the dictionary in the update loop.
I have tried various settings and it works (locally) with the update rate set to 1 and the tick delay set to the minimum of 3 (my recommendation) ... you could probably shift the minimum down to 1 or 2 but if you introduce normal latency then you could see some issues at that range (32-66ms). My recommendation would be to keep it at least at a 3 tick delay (~100ms) when transferring ownership to assure all state updates have been received and processed (or will be before the change in ownership occurs).

It all really depends upon the use case and what you are trying to accomplish with this kind of use case scenario...

Either case, these modifications to your project should provide you with a reasonable example. I also added some comments to make it easier to digest what all is happening.

Let me know if these adjustments make sense to you and if (with the current fixes in place) it resolves your two issue.

@ezoray
Copy link
Author

ezoray commented Oct 2, 2024

This works a treat. I guess the take away is to be cautious when switching ownership and be aware of potential issues with in-flight messages and code around them if necessary.

Thanks for taking the time to create these examples, they've been very useful and informative. :)

@NoelStephensUnity
Copy link
Collaborator

We will continue to make improvements... but for now this would be the recommended way to handle this.

@ezoray
Copy link
Author

ezoray commented Oct 2, 2024

Understood.

I hope you don't mind but can I bring another issue to your attention: #3059 It's been closed but it may be an issue that needs addressing.

@NoelStephensUnity
Copy link
Collaborator

Understood.

I hope you don't mind but can I bring another issue to your attention: #3059 It's been closed but it may be an issue that needs addressing.

Sam is looking at it and has a fix already locally. I will touch base with him to see where he is on that (you can expect that fix to be included in the same update as this PR).

As a side note:
I just pushed some more fixes that further improves the stability for updating collections and includes a modification to NetworkVariableDeltaMessage that now allows a server to immediately forward delta state updates (to valid clients on a per NetworkVariable field basis) without having to queue them for end of frame (i.e. since messages are processed at the beginning of a frame, this shaves off the state replication time to other clients by close to the total frame time...which depending upon your project could mean 10-16ms faster).

@fluong6 fluong6 added stat:imported Status - Issue is tracked internally at Unity and removed stat:import Status - Issue is going to be saved internally labels Oct 7, 2024
@NoelStephensUnity
Copy link
Collaborator

The fixes in #3081 should be in the up-coming (it is being processed) NGO v2.1.0 update.
👍

@ezoray
Copy link
Author

ezoray commented Oct 18, 2024

Awesome, thanks Noel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:high stat:imported Status - Issue is tracked internally at Unity type:bug Bug Report
Projects
None yet
Development

No branches or pull requests

3 participants