1. Make handling of user tokens atomic:
Loads started through the external-facing API used to perform a two-step setup of the user token. Between the two steps, the mutex was unlocked without the token's reference count having been increased. A non-user-initiated load could therefore destroy the load task when it unreferenced the token.
Those stages now happen atomically so, on the one hand, the described race condition can't happen and the load task's lifetime guarantee no longer has a gap and, on the other hand, the ugliness of a call to load returning `ERR_BUSY` when it happened while another thread was between the two steps is gone.
The code has been refactored so that user token concerns stay outside the inner load start function, which remains agnostic to them, for a cleaner implementation.
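As a minimal sketch of the pattern (plain C++ stand-ins, not the actual engine types), the point is that token creation and registration now share one critical section:

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Simplified model: the token is created and registered under a single lock,
// so no other thread can see (or destroy) a half-initialized load task.
struct LoadToken {
	std::string path;
};

class Loader {
	std::mutex mutex;
	std::map<std::string, std::shared_ptr<LoadToken>> user_tokens;

public:
	std::shared_ptr<LoadToken> start_load(const std::string &p_path) {
		std::lock_guard<std::mutex> lock(mutex); // Both steps in one critical section.
		auto token = std::make_shared<LoadToken>();
		token->path = p_path;
		// Formerly, the lock was dropped between these two steps without the
		// token's reference count having been increased; another thread could
		// unreference the token and destroy the load task in that window.
		user_tokens[p_path] = token;
		return token;
	}
};
```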
2. Clear ambiguity between load operations running on `WorkerThreadPool`:
The two cases are: a single-threaded load directly started by the user from a pool task, and a load started by the system as part of a multi-threaded load.
Since making all the code deal with this distinction would be very complex and error-prone, a different measure is applied instead: one of the cases is simply taken out of the dichotomy. We now ensure every load happening on a pool thread has been initiated by the system.
The way of achieving that is that a single-threaded, user-started load initiated from a pool thread is re-issued as a new task (see the sketch after the benefits list below).
Benefits:
- Simpler code. The main load function is renamed so it's apparent that it's not just a thread entry point anymore.
- Cache and thread modes of the original task are honored. A beautiful consequence of this is that, unlike before, re-issued loads can use the resource cache, which makes this mechanism much more performant.
- The newly added getter for the caller task id in WorkerThreadPool allows removing the custom tracking of that in ResourceLoader.
- The check for whether to replace a cached resource and the replacement itself now happen atomically. That fixes cases where deadlock prevention led to multiple resource instances of the same file on disk. As a side effect, it also makes the regular check for the replace load mode more robust.
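A minimal sketch of the re-issuing idea (the pool query and submission APIs here are hypothetical stand-ins, not the actual WorkerThreadPool interface):

```cpp
#include <functional>
#include <string>

// Hypothetical pool interface; only two capabilities matter for the idea:
// detecting that the caller is a pool thread, and submitting a new task.
struct Pool {
	static bool current_thread_is_pool_thread(); // Assumed query.
	static void add_task(std::function<void()> p_fn); // Assumed submission API.
};

void run_load(const std::string &p_path); // The actual load work.

// A user-started single-threaded load requested from a pool thread is
// re-issued as a fresh task instead of running inline, so every load that
// actually runs on a pool thread is system-initiated.
void request_user_load(const std::string &p_path) {
	if (Pool::current_thread_is_pool_thread()) {
		Pool::add_task([p_path] { run_load(p_path); });
	} else {
		run_load(p_path);
	}
}
```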
This is about no longer letting the resource format loader set the error code directly on the task. Instead, the error is stored locally and assigned to the task only when it is right to do so.
Otherwise, other tasks could see an error code on the current one before its state had transitioned to errored. Besides being a data race in the technical sense, this may not be a problem in practice, but it causes surprising situations during debugging because it breaks assumptions.
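A minimal sketch of the publishing order (generic C++, not the engine's actual types):

```cpp
#include <mutex>

enum class Status { IN_PROGRESS, LOADED, ERRORED };

struct LoadTask {
	std::mutex mutex;
	Status status = Status::IN_PROGRESS;
	int error = 0; // 0 stands for OK in this sketch.
};

// Stand-in for the resource format loader: it reports failure through the
// out-parameter instead of writing to the task directly.
bool do_load(const char *p_path, int *r_error);

// The error code is published together with the status transition, under the
// task mutex, so no observer can see error != OK on a task that still reads
// as IN_PROGRESS.
void run_load_task(LoadTask &p_task, const char *p_path) {
	int local_err = 0;
	bool ok = do_load(p_path, &local_err);

	std::lock_guard<std::mutex> lock(p_task.mutex);
	p_task.status = ok ? Status::LOADED : Status::ERRORED;
	p_task.error = local_err;
}
```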
ResourceLoader:
- Fix invalid tokens being returned.
- Remove no longer written `ThreadLoadTask::dependent_path` and the code reading from it.
- Clear a deadlock hazard by keeping the mutex unlocked during userland polling (sketched below).
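A minimal sketch of the polling pattern (generic C++; `load_finished` and `poll_user_callback` are hypothetical stand-ins):

```cpp
#include <mutex>

std::mutex loader_mutex;

bool load_finished(); // Assumed state check, done under the mutex.
void poll_user_callback(); // Userland code; may re-enter the loader.

// The loader mutex is released before calling back into user code, which may
// itself try to load something; holding it across the callback is the
// deadlock hazard. The condition is re-checked after re-locking.
void wait_for_load() {
	std::unique_lock<std::mutex> lock(loader_mutex);
	while (!load_finished()) {
		lock.unlock();
		poll_user_callback();
		lock.lock();
	}
}
```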
WorkerThreadPool:
- Include the thread call queue override in the thread state reset set, which allows simplifying the code that handled that (imperfectly) in ResourceLoader.
- Handle the mutex type correctly on entering an allowance zone.
CommandQueueMT:
- Handle the additional possibility of command buffer reallocation that the mutex unlock allowance introduces (see the sketch after this list).
- Allow the message queue override to flush after loading each resource, which was the original intent.
- Remove a redundant call to mark the thread as safe-for-nodes.
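A minimal sketch of the reallocation hazard (generic C++; the real CommandQueueMT layout differs):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Command {
	size_t size; // Total size of this command in the buffer.
	void execute(); // May enter an unlock allowance zone.
};

// While a command executes, it may temporarily release the queue lock,
// letting producers push more commands and reallocate the byte buffer; the
// flush loop therefore works with offsets and re-resolves pointers on every
// iteration instead of caching them across execute().
void flush(std::vector<uint8_t> &p_buffer) {
	size_t offset = 0;
	while (offset < p_buffer.size()) {
		Command *cmd = reinterpret_cast<Command *>(p_buffer.data() + offset);
		size_t size = cmd->size; // Read before execute(); cmd may dangle after.
		cmd->execute(); // Buffer may grow and reallocate here.
		offset += size;
	}
}
```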
- `CACHE_MODE_IGNORE_DEEP` is checked in addition to `CACHE_MODE_IGNORE` to determine if a load is uncached. This avoids crashes in uncached loads due to prematurely freed load tasks (see the sketch after this list).
- Uncached load tasks are isolated (never registered in the task map). This prevents regular loads from reusing in-flight uncached loads, which is not correct.
- Unify documentation, hoping to clear misconceptions about propagation of the cache mode across dependent loads.
- Clarify in docs that `CACHE_MODE_REPLACE` now also works on the main resource (from #87008).
- Add two recursive modes, counterparts of `CACHE_MODE_REPLACE` and `CACHE_MODE_IGNORE`, since it seems some use cases need them (see #59669, #82830).
- Let resources get a path even when loaded with one of the ignore-cache modes, which is useful for tools.
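As a minimal sketch (enum values as documented above; the surrounding code is simplified), the uncached check now covers both ignore modes:

```cpp
enum CacheMode {
	CACHE_MODE_IGNORE,
	CACHE_MODE_REUSE,
	CACHE_MODE_REPLACE,
	CACHE_MODE_IGNORE_DEEP, // Recursive counterpart of CACHE_MODE_IGNORE.
	CACHE_MODE_REPLACE_DEEP, // Recursive counterpart of CACHE_MODE_REPLACE.
};

// Uncached loads must be isolated: their tasks are never registered in the
// shared task map, so regular loads can't reuse them and the tasks can't be
// freed prematurely through it.
bool is_load_uncached(CacheMode p_cache_mode) {
	return p_cache_mode == CACHE_MODE_IGNORE || p_cache_mode == CACHE_MODE_IGNORE_DEEP;
}
```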
This change simply extracts `SafeBinaryMutex` from `mutex.h` to
`safe_binary_mutex.h`, in an effort to reduce the compilation
speed impact of including `mutex.h`.
This commit rewrites the sync logic in such a way that the
`use_system_threads_for_low_priority_tasks` setting, which was added due to
the lack of a cross-platform wait-for-multiple-objects functionality, can be
removed (it's as if it were effectively hardcoded to `false`).
With the new implementation, we have the best of both worlds: threads don't
have to poll, plus no bespoke threads are used.
In addition, regarding deadlock prevention, since not every possible case of
wait-deadlock could be avoided, this commit removes the current best-effort
avoidance mechanisms and keeps only a simple, pessimistic form of detection.
It turns out that the only current user of deadlock prevention, ResourceLoader,
works fine with it, so every possible situation in resource loading is now
properly handled, with no possibility of deadlocking. There's a comment in
the code with further details.
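A minimal sketch of the pessimistic detection (generic C++; the helper names are hypothetical):

```cpp
#include <cstdint>

using TaskID = int64_t;
enum Error { OK, ERR_BUSY };

bool caller_is_pool_thread(); // Assumed query.
bool awaited_task_is_safe_to_wait_for(TaskID p_task); // Assumed conservative check.
void block_until_done(TaskID p_task); // The actual wait.

// Instead of trying to avoid every possible wait-deadlock, the wait simply
// refuses (pessimistically) any case it can't prove safe, and the caller,
// such as ResourceLoader, reacts to the error.
Error wait_for_task(TaskID p_task) {
	if (caller_is_pool_thread() && !awaited_task_is_safe_to_wait_for(p_task)) {
		return ERR_BUSY; // Report rather than risk blocking forever.
	}
	block_until_done(p_task);
	return OK;
}
```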
Lastly, a potential for load tasks never being awaited/disposed of is cleared.