Features:
- Debug-only tracking of objects by type. See
get_driver_allocs_by_object_type et al.
- Debug-only Breadcrumb info for debugging GPU crashes and device lost
- Performance report per frame from get_perf_report
- Some VMA calls had to be modified in order to insert the necessary
memory callbacks
Functionality marked as "debug-only" is only available in debug or dev
builds.
Misc fixes:
- Early break optimization in RenderingDevice::uniform_set_create
============================
The work was performed by collaboration of TheForge and Google. I am
merely splitting it up into smaller PRs and cleaning it up.
1. Make handling of user tokens atomic:
Loads started with the external-facing API used to perform a two-step setup of the user token. Between both, the mutex was unlocked without its reference count having been increased. A non-user-initiated load could therefore destroy the load task when it unreferenced the token.
Those stages now happen atomically so in the one hand, the described race condition can't happen so the load task life insurance doesn't have a gap anymore and, on the other hand, the ugliness that the call to load could return `ERR_BUSY` if happening while other thread was between both steps is gone.
The code has been refactored so the user token concerns are still outside the inner load start function, which is agnostic to that for a cleaner implementation.
2. Clear ambiguity between load operations running on `WorkerThreadPool`:
The two cases are: single-loaded thread directly started by a user pool task and a load started by the system as part of a multi-threaded load.
Since ensuring all the code dealing with this distinction would make it very complex, and error-prone, a different measure is applied instead: just take one of the cases out of the dicotomy. We now ensure every load happening on a pool thread has been initiated by the system.
The way of achieving that is that a single-threaded user-started load initiated from a pool thread, is run as another task.
Benefits:
- Simpler code. The main load function is renamed so it's apparent that it's not just a thread entry point anymore.
- Cache and thread modes of the original task are honored. A beautiful consequence of this is that, unlike formerly, re-issued loads can use the resource cache, which makes this mechanism much more performant.
- The newly added getter for caller task id in WorkerThreadPool allows to remove the custom tracking of that in ResourceLoader.
- The check to replace a cached resource and the replacement itself happen atomically. That fixes deadlock prevention leading to multiple resource instances of the same one on disk. As a side effect, it also makes the regular check for replace load mode more robust.
Changes to reduce the latency between changing node selection in the editor and seeing the new node reflected in the Inspector tab
- Use Vector instead of List for ThemeOwner::get_theme_type_dependencies and related functions
- Use Vector instead of List for ThemeContext::themes, set_themes(), and get_themes()
- Add ClassDB:get_inheritance_chain_nocheck to get all parent/ancestor classes at once, to avoid repeated ClassDB locking overhead
- Update BIND_THEME_ITEM macros and ThemeDB::update_class_instance_items to use provided StringNames for call to ThemeItemSetter, instead of creating a new StringName in each call
These changes reduce the time taken by EditorInspector::update_tree by around 30-35%
- Returns an empty list when there's not registered plugins, thus preventing the creation of spurious iterator objects
- Inline `Godot#getRotatedValues(...)` given it only had a single caller. This allows to remove the allocation of a float array on each call and replace it with float variables
- Disable sensor events by default. Sensor events can fired at 10-100s Hz taking cpu and memory resources. Now the use of sensor data is behind a project setting allowing projects that have use of it to enable it, while other projects don't pay the cost for a feature they don't use
- Create a pool of specialized input `Runnable` objects to prevent spurious, unbounded `Runnable` allocations
- Disable showing the boot logo for Android XR projects
- Delete locale references of jni strings
It also updates the documentation to describe positive and negative ranges.
Co-Authored-By: Hugo Locurcio <hugo.locurcio@hugo.pro>
Co-Authored-By: kleonc <9283098+kleonc@users.noreply.github.com>
Fixes#81758
DisplayServerWeb::process_joypads handles buttons 6 and 7 of the
HTML5 Standard Gamepad as a special case by doing:
`input->joy_axis(idx, (JoyAxis)b, s_btns[b]);`
This doesn't work because there is no JoyAxis 6 or 7 in the enum
To fix this we use JoyAxis::TRIGGER_LEFT and TRIGGER_RIGHT for button 6
and 7
However since we are now lying to input->joy_axis we also need to lie in
the mappings for the standard gamepad in godotcontrollersdb.txt,
otherwise input->joy_axis will try to find a mapping to axis 4(LT) and
axis 5(RT) that's not defined.
Therefore we set lefttrigger to +a4 and righttrigger to +a5 in the
mapping, to match what we are actually sending.
A cleaner, and more involved fix to this would be modifying
input->joy_button so that it can handle analog buttons and map them to
axes preserving their value instead of converting to boolean
C# uses `long`s to access many native values. With `PtrToArg<m_enum>` and
`PtrToArg<bitfield<m_enum>>` this isn't a problem, as C++ code converts
through a `*(int64_t*)` cast in assignment, so all 64-bits are initialized.
However, with `PtrToArg<char32_t>`, value assignment happens through an
`*(int *)` cast, leaving 32 bits uninitialized where `int` is 32 bits. On
platforms where `int` is 16 bits, there are presumably 48 bits uninitialized,
though there are very few platforms where this is still the case.
The easiest way to see the practical effects of this is by looking at
`EventInputKey.Unicode`:
```csharp
public override void _Input(InputEvent @event) {
if (@event is InputEventKey keyEvent) {
if (keyEvent.IsPressed() && !keyEvent.Echo) {
var raw = keyEvent.Unicode;
var value = raw & 0xffffffff;
GD.Print($"Key pressed: raw: {raw}; masked: {(char) value} ({value})");
}
}
}
```
Pressing 'a' emits the following line:
```
Key pressed: raw: -3617008645356650399; masked: a (97)
```
Examining execution flow in gdb shows this conversion going through the
following line:
```
PtrToArg<char32_t>::encode (p_ptr=0x7ffcd5bb4b18, p_val=97 U'a') at ./core/variant/binder_common.h:221
221 *(int *)p_ptr = p_val;
```
Here, `p_val` is still 97, which is the value `InputEventKey.Unicode`
is expected to have. After assignment, `p *(int64_t *)0x7ffcd5bb4b18` displays
`-3617008645356650399`, with only the lower 32 bits being properly assigned,
and is the value we see from C#.
With this patch applied, the above testing `_Input` now prints:
```
Key pressed: raw: 97; masked: a (97)
```
Thank you to blujay1269 for asking about an unexpected value they saw in
`EventInputKey.Unicode`, which prompted this investigation.
`GDExtension::open_library` has a check in it to see if the library was loaded
from a temp file, and if it was to restore the original name as that is the one
we actually care about. This check is breaking extension reloading on Mac when
the library path is to a framework folder, as the file inside the framework
will not generally be the same name as the folder.
This check also shouldn't be necessary even on Windows, which is the only
platform that uses `generate_temp_files`, since disposal of the created temp
file is handled within `OS_Windows::open_dynamic_library`, and
`GDExtension::open_library` (which is the only function to call
`open_dynamic_library` with a `p_data` argument) only cares about the original
library file path and has to do extra work to remove the name of the temp file.
Instead, I have removed that check and set `OS_Windows::open_dynamic_library`
to return the name of the original file and not the name of the copy.
This fixes GDExtension reloading on macOS. I do not have a Windows machine
available to test that it still works properly on Windows, so someone should
check that before merging this.
Before this change StringName used regular static field
definitions for its mutex, _table, configured and debug_stringname
fields.
Since in the general case the ordering of the static variable and field
initialization and destruction is undefined, it was possible that
the destruction of StringName's static fields happened prior to
the destruction of statically allocated StringName instances.
By changing the static field definitions to inline in string_name.h,
the C++17 standard guarantees the correct initialization and destruction
ordering.
This switches to 64-bit integers in select locations of the Image
class, so that image resolutions of 16384×16384 (used by
lightmap texture arrays) can be used properly. Values that are larger
should also work.
VRAM compression is also supported, although most VRAM-compressed
formats are limited to individual slices of 16384×16384. WebP
is limited to 16383×16383 due to format limitations.
This is about not letting the resource format loader set the error code directly on the task anymore. Instead, it's stored locally and assigned only when it is right to do so.
Otherwise, other tasks may see an error code in the current one before it's state having transitioned to errored. While this, besides the technically true data race, may not be a problem in practice, it causes surprising situations during debugging as it breaks assumptions.
ResourceLoader:
- Fix invalid tokens being returned.
- Remove no longer written `ThreadLoadTask::dependent_path` and the code reading from it.
- Clear deadlock hazard by keeping the mutex unlocked during userland polling.
WorkerThreadPool:
- Include thread call queue override in the thread state reset set, which allows to simplify the code that handled that (imperfectly) in the ResourceLoader.
- Handle the mutex type correctly on entering an allowance zone.
CommandQueueMT:
- Handle the additional possibility of command buffer reallocation that mutex unlock allowance introduces.
Update CowData::insert to call ptrw() before loop, to avoid calling _copy_on_write for each item in the array, as well as repeated index checks in set and get. For larger Vectors/Arrays, this makes inserts around 10x faster for ints, 3x faster for Strings, and 2x faster for Variants. Less of an impact on smaller Vectors/Arrays, as a larger percentage of the time is spent allocating.
These deferring measures were added to aid threaded resource loading in being safe.
They were removed as seemingly unneeded, but it seems they are needed so resources involved in threaded loading interact with others only after "sync points".
Fixes#73374
As of godot 4 On windows/osx the game window will be frozen and will not
be updated.
In the debugger loop it calls
OS::get_singleton()->process_and_drop_events();
which allows windows/osx to handle system events. If the window doesn't
handle these events then both systems will judge the window to be 'not
responding' (osx beachball cursor)
When the event processing code was migrated from OS to DisplayServer the
process_and_drop_events() logic was moved to DisplayServer, but the call
inside the remote debugger pause loop was not updated to call the
DisplayServer version, there are currently no implementations of
OS::process_and_drop_events() so i removed it and switched to the new
DisplayServer::force_process_and_drop_events() method.