-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Suggestion] Thread safety #124
Comments
Beware, Long Read. I don’t know how useful this will be, but maybe it will give you some ideas. It seems to me that command queues for accessing shared resources are best suited here. For example, to access the file system we can write such interface: internal interface IFileAccessor
{
void Read(Int32 offset, Memory<Byte> buffer);
} And manager to store list of accessors: internal sealed class FileManager
{
private readonly ConcurrentDictionary<String, IFileAccessor> _files = new(StringComparer.InvariantCultureIgnoreCase);
public IFileAccessor Access(String filePath)
{
return _files.GetOrAdd(Path.GetFullPath(filePath), path => new FileAccessQueue(path));
}
} When we need to read a part of file we can do it via IFileAccessor: Byte[] Read(String assetPath, Int32 offset, Int32 size)
{
Byte[] buff = new Byte[size];
IFileAccessor accessor = this.FileManager.Access(assetPath);
accessor.Read(offset, buff); // blocking magic here
return buff;
} And very simple implementation of commands queue: internal class FileAccessQueue : IFileAccessor
{
private readonly FileStream _fileStream;
public FileAccessQueue(string path)
{
_fileStream = File.OpenRead(path);
}
public void Read(Int64 offset, Memory<Byte> buffer)
{
lock (_fileStream) // we can replace it with ConcurrentQueue + background thread that will process it to avoid locks
{
if (_fileStream.Position != offset)
_fileStream.Seek(offset, SeekOrigin.Begin);
_fileStream.Read(buffer.Span);
}
}
} The main idea is to isolate data access that requires synchronization from business logic, minimizing competition for the same data between threads. And avoid passing any callbacks to such methods, which will increase the lock acquisition time. In some cases, like type resolving, makes sense to immediately delegate this activity to one single thread; this ensures that different threads that process different Asset Files do not start analyzing the same types from different ends and do not get into a deadlock: internal sealed class TypeResolver
{
private readonly ConcurrentDictionary<Object, Object> _resolved = new();
private readonly ConcurrentQueue<ResolveTypeCommand> _queue = new();
private readonly AutoResetEvent _queued = new(initialState: false);
private readonly Thread _dispatchingThread;
public TypeResolver()
{
_dispatchingThread = new Thread(DispatchQueue) { IsBackground = true };
}
private void DispatchQueue()
{
while (true)
{
_queued.WaitOne();
while (_queue.TryDequeue(out ResolveTypeCommand? command))
command.ResolveNoThrow();
}
}
public Object ResolveTypeInSingleThread(Object input)
{
if (_resolved.TryGetValue(input, out Object? typeDescriptor))
return typeDescriptor;
ResolveTypeCommand command = new(input);
_queue.Enqueue(command);
_queued.Set();
command.Completed.WaitOne();
if (command.Error is null)
return command.Result;
ExceptionDispatchInfo.Throw(command.Error); // to keep original stack
throw command.Error;
}
private sealed class ResolveTypeCommand
{
private readonly Object _input;
public ResolveTypeCommand(Object input)
{
_input = input;
}
public void ResolveNoThrow()
{
try
{
// do something
Result = ...;
}
catch (Exception ex)
{
Error = ex;
}
finally
{
Completed.Set();
}
}
public ManualResetEvent Completed { get; } = new ManualResetEvent(initialState: false);
public Object Result { get; private set; }
public Exception Error { get; private set; }
}
} This way, the only owner of the lock on resolving types can be this background thread, this ensures that we do not encounter deadlocks. At the same time, this thread can read data from files, since IFileAccessor does not contain callbacks, and the blocking time is extremely short. And in the final stage, with asynchronous code and a bunch of wrappers, it might look something like this: public async Task ReadObjectAsync(AssetFile assetFile, ObjectInfo objectInfo, CancellationToken ct)
{
var fileAccessor = FileManager.Access(assetFile);
var header = await fileAccessor.ReadHeaderAsync(objectInfo, ct);
var type = await TypeResolver.ResolveAsync(header.Type, ct);
foreach (var field in type.Fields)
{
///
}
} |
After reading this over a bit, I think I now understand your suggestion. This sounds somewhat similar to what I was kind of suggesting but with async rather than just locking lists and dictionaries and stuff. I think the thing that takes the biggest chunk of time (ignoring disk speed which obviously we can't fix) is asset deserialization. The deserialization code reads individual ints and floats and strings and whatnot from the stream which is very big bad. My first suggestion was that instead of having the deserialization also read the file at the same time, we should just read the entire asset into memory and then we would have the full asset in memory that can be deserialized at the same time as other assets are being deserialized without conflicting. We could do the same with reading the metadata (file listing, typetree, etc.) for AssetsFiles, but usually these read pretty fast, so I don't know if the header needs it. Besides dll loading for the mono.cecil temp generator (which could just be locked), I think generating AssetTypeTemplateFields is quick and thread safe already. The tpk doesn't change once it's loaded and neither does the type tree. Arguably the worst part will be the caching parts which I think will just have to be a lot of locking (unless you know of a better way.) I think the async version of AssetsManager could be a good option, but I still think we can make the original AssetsManager thread safe as well. But, IANAM (I am not a multithreading expert) |
If you are talking about loading only a piece inside the asset that we are currently reading, then this does not really matter IF you read the file sequentially and do not use stream.Seek. When reading ints, FileStream reads not 4 bytes, but a buffer (4096 bytes by default), so sequential reading of a file fragment by parts of 4 bytes is fast as reading by 4096 bytes. But every call to stream.Seek reset this buffer, so you should avoid calls to stream.Seek and try to read sequentially. |
So in this case I'm talking about each asset, not an .assets file (I'll just call an .assets file a serialized file from now on to prevent confusion.) "Loading" a serialized file in at.net is loading its header and metadata (including file table) but not loading any assets. Assets themselves are usually pretty small, usually around the hundreds of bytes. Although yes, sometimes we have a 50mb shader or mesh occasionally so we might have to do partial asset loading in those cases. Loading a serialized file could take a significant amount of time for large files, but I'd need to add an option to skip the serialized file's
This isn't possible since we can load two assets at one time and both would require us to jump to different start locations. |
Currently classes such as AssetsManager and AssetsFileInstance are not thread safe.
The most interesting methods are
LoadAssetsFile
,GetBaseField
andGetExtAsset
.Due to non-thread-safe caches inside, this does not allow parallelization of work with a large number of asset files on fast SSD.
The thread-safe wrapper over AssetManager provides no benefit due to blocking IO operations inside.
Note: if anyone wants to implement this suggestion, it is probably best to write a thread-safe
AsyncAssetManager
from scratch, since it is very easy to get things wrong due to the large number of collections and non-thread-safe operations inside theAssetManager
.The text was updated successfully, but these errors were encountered: