Complete rewrite of spatial partitioning using a bounding volume hierarchy rather than octree.
Switchable in project settings between using octree or BVH for rendering and physics.
It can be enabled in the Project Settings
(`rendering/quality/filters/use_debanding`). It's disabled
by default as it has a small performance impact and can make
PNG screenshots much larger (due to how dithering works).
As a result, it should be enabled only when banding is noticeable enough.
Since debanding requires a HDR viewport to work, it's only supported
in the GLES3 backend.
Another bug in the octree has been discovered which can cause flickering in rare circumstances : #42895
For safety until this is fixed properly this PR reverts the default state of the octree to match the old behaviour, which doesn't appear exhibit the bug (or at least not as readily).
Batching is mostly separated into a common template which can be used with multiple backends (GLES2 and GLES3 here). Only necessary specifics are in the backend files.
Batching is extended to cover more primitives.
Option in MeshInstance to enable software skinning, in order to test
against the current USE_SKELETON_SOFTWARE path which causes problems
with bad performance.
Co-authored-by: lawnjelly <lawnjelly@gmail.com>
Prevents adding new octants until a limiting number of elements have been added to the current octant. This enables balancing the benefits of brute force against the benefits of spatial partitioning. The limit can be set per octree.
Project settings are added for rendering octree to set the best balance per project depending on number of tests per frame / tick, and the amount of editing of the octree.
Fixes octants being leaked when removing elements.
Optimize octree with cached linear lists
Storing elements in octants using linked lists is efficient for housekeeping but very slow for testing. This optimization stores additional local_vectors with Element pointers and AABBs which are cached and only updated when a dirty flag is set on the octant.
This is selectable with 2 versions of Octree : Octree and Octree_CL, Octree being the old behaviour. At present the cached list version is only used for the visual server octree (rendering) as it has only been demonstrated to be faster there so far.
This uses slightly more memory (probably a few kb in most cases) but can be significantly faster during testing (culling etc).
Co-authored-by: Sergey Minakov <naithar@icloud.com>
- Use the `.log` file extension (recognized on Windows out of the box)
to better hint that generated files are logs. Some editors provide
dedicated syntax highlighting for those files.
- Use an underscore to separate the basename from the date and
the date from the time in log filenames. This makes the filename
easier to read.
- Keep only 5 log files by default to decrease disk usage in case
messages are spammed.
(cherry picked from commit 20af28ec06)
Scaling tilemaps can cause border artifacts around the edges of tiles. This has been traced to precision issues in the GPU. This PR adds an adjustment to allow a minor contraction of the UVs of rects in order to compensate for the incorrect classification of texels across the UV border.
As it now seems like we will soon have GLES3 batching working using the same intermediate layer as GLES2, it makes more sense to reuse the same batching settings for both renderers rather than duplicate project settings for GLES2 and GLES3.
Although 2D draws in painters order with strict ordering, in certain circumstances items can be reordered to increase batching / decrease state changes, without affecting the end result. This can be determined by an overlap test.
In situation with item:
A-B-A
providing the third item does not overlap the second, they can be reordered:
A-A-B
Items already contain an AABB which can be used for this overlap test.
1)
To utilise this, I have implemented item reordering (only for single rects for now), with the lookahead adjustable in project settings. This can increase performance in situations where items may not be grouped in the scene tree by texture. It can also be switched off (by setting lookahead to 0).
2)
This same trick can be used to help join items that are lit. Lit items previously would prevent joining completely, thus missing out on performance gains other than multi-command items such as tilemaps.
In this PR, lights are assigned as bits in a bitfield (up to 64, the optimization is disabled above this), and on each try_item (for joining), the bitfield for lights and shadows is constructed and compared with the previous items. If these match the 2 items can potentially be joined. However, this can only be done without changing the rendered result if an overlap test is successful.
This overlap test can be adjusted to join items up to a specific number of item references, selectable in project settings, or turned off.
3)
The legacy uniform single rect drawing routine seems to have been identified as the source of flicker, particularly on nvidia. However, it can also be up to 2x as fast. Because of the speed the batching contains a fallback where it can use the legacy single rect method, but I have now added a project setting to make this switchable. In most cases with batching it should not be necessary (as single rects are drawn less frequently) and thus the flickering can be totally avoided.
4)
This PR also fixes a color modulate bug when drawing light passes, in certain situations (particularly custom _draw routines with multiple rects).
5)
This PR also fixes#38291, a bug in the legacy renderer where light passes could draw rects in wrong position.
Added project setting to enable / disable print frame diagnostics every 10 seconds. This prints out a list of batches and info, which is useful to optimize games and identify performance problems.
2d rendering is currently bottlenecked by drawing primitives one at a time, limiting OpenGL efficiency. This PR batches primitives and renders in fewer drawcalls, resulting in significant performance improvements. This also speeds up text rendering.
This PR batches across canvas items as well as within items.
The code dynamically chooses between a vertex format with and without color, depending on the input data for a frame, in order to optimize throughput and maximize batch size. It also adds an option to use glScissor to reduce fillrate in light passes.
We already removed it from the online docs with #35132.
Currently it can only be "Built-In Types" (Variant types) or "Core"
(everything else), which is of limited use.
We might also want to consider dropping it from `ClassDB` altogether
in Godot 4.0.
On iOS devices without a physical home button iOS
shows a home indicator instead. This is often in the
way of the UI or the game.
Added a project setting to disable hidden home indicator.
The default value is to hide the home indicator
See https://github.com/godotengine/godot/pull/23658#issuecomment-562706669
The method was implemented back when Dictionary.get(key, default) did not
exist, but now that it does we do not need a custom method in CharFXTransform.
It's a new feature in 3.2, so does not break compat with 3.1.x.
The description is displayed as a tooltip when hovering the project
in the Project Manager. It can span multiple lines.
This partially addresses #8167.
This reproduces the behavior used for printing when using the remote
debugger. The default limit is 100 errors and 100 warnings per second,
which makes it possible to display much more GDScript warnings
before overflowing.
This also adds a "Too many warnings" message, so that warnings
don't look like errors when overflowing anymore.
This closes#21896.