Although 2D draws in painter's order with strict ordering, in certain circumstances items can be reordered to increase batching / decrease state changes without affecting the end result. Whether a reorder is safe can be determined by an overlap test.
In a situation with the items:
A-B-A
providing the third item does not overlap the second, they can be reordered:
A-A-B
Items already contain an AABB which can be used for this overlap test.
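A minimal sketch of the reordering idea, assuming hypothetical `Item` and `Rect` types and a `reorder_items` helper (none of these are the engine's actual structures). An item is only pulled forward if its AABB overlaps none of the items it jumps over:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the renderer's item data, not the engine's types.
struct Rect {
    float x, y, w, h;
    // True when the two axis-aligned rectangles share any area.
    bool overlaps(const Rect &o) const {
        return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
    }
};

struct Item {
    int texture; // state that forces a batch break when it changes
    Rect aabb;
};

// For each item, search up to `lookahead` positions past its neighbour for an
// item with the same texture, and pull it forward if it overlaps none of the
// items it would jump over. A lookahead of 0 leaves the order untouched.
void reorder_items(std::vector<Item> &items, std::size_t lookahead) {
    for (std::size_t i = 0; i + 2 < items.size(); i++) {
        if (items[i + 1].texture == items[i].texture) {
            continue; // already adjacent to a matching item
        }
        for (std::size_t j = i + 2; j < items.size() && j <= i + 1 + lookahead; j++) {
            if (items[j].texture != items[i].texture) {
                continue;
            }
            bool blocked = false;
            for (std::size_t k = i + 1; k < j; k++) {
                blocked = blocked || items[j].aabb.overlaps(items[k].aabb);
            }
            if (!blocked) {
                // Move item j to slot i + 1, preserving the order of the rest.
                std::rotate(items.begin() + i + 1, items.begin() + j, items.begin() + j + 1);
                break;
            }
        }
    }
}
```

With textures A-B-A and non-overlapping rects, a lookahead of 1 yields A-A-B; if the third rect overlaps the second, the order is left alone.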
1)
To utilise this, I have implemented item reordering (only for single rects for now), with the lookahead adjustable in project settings. This can increase performance in situations where items may not be grouped in the scene tree by texture. It can also be switched off (by setting lookahead to 0).
2)
This same trick can be used to help join items that are lit. Lit items previously would prevent joining completely, thus missing out on performance gains other than multi-command items such as tilemaps.
In this PR, lights are assigned as bits in a bitfield (up to 64; the optimization is disabled above this), and on each try_item (for joining), the bitfield for lights and shadows is constructed and compared with that of the previous item. If these match, the 2 items can potentially be joined. However, this can only be done without changing the rendered result if an overlap test is successful.
This overlap test can be adjusted to join items up to a specific number of item references, selectable in project settings, or turned off.
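The bitfield comparison can be sketched roughly as follows. `LightRef`, `LightBits`, `build_light_bits` and `light_bits_match` are hypothetical names for illustration, and the assumption that a light's list index doubles as its bit position is mine, not taken from the engine code:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-item light record; `index` is assumed to be the light's
// position in the frame's light list and doubles as its bit position.
struct LightRef {
    int index;
    bool casts_shadow;
};

struct LightBits {
    std::uint64_t lights = 0;
    std::uint64_t shadows = 0;
    bool valid = true; // false when an index does not fit in 64 bits
};

LightBits build_light_bits(const std::vector<LightRef> &affecting) {
    LightBits bits;
    for (const LightRef &l : affecting) {
        if (l.index < 0 || l.index >= 64) {
            bits.valid = false; // too many lights: give up on joining
            return bits;
        }
        const std::uint64_t bit = std::uint64_t(1) << l.index;
        bits.lights |= bit;
        if (l.casts_shadow) {
            bits.shadows |= bit;
        }
    }
    return bits;
}

// Two items are join candidates only if both bitfields are valid and equal.
// The overlap test described above still has to pass before actually joining.
bool light_bits_match(const LightBits &a, const LightBits &b) {
    return a.valid && b.valid && a.lights == b.lights && a.shadows == b.shadows;
}
```

Comparing two 64-bit words per item keeps the per-item cost of the check small, which is presumably why the optimization is simply disabled beyond 64 lights rather than extended.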
3)
The legacy uniform single rect drawing routine seems to have been identified as the source of flicker, particularly on NVIDIA. However, it can also be up to 2x as fast. Because of this speed, the batching contains a fallback where it can use the legacy single rect method, and I have now added a project setting to make this switchable. In most cases with batching it should not be necessary (as single rects are drawn less frequently), and thus the flickering can be totally avoided.
4)
This PR also fixes a color modulate bug when drawing light passes, in certain situations (particularly custom _draw routines with multiple rects).
5)
This PR also fixes #38291, a bug in the legacy renderer where light passes could draw rects in the wrong position.
- Resurrect it for GL ES 2
- Apply roll over with `fmod()` instead of resetting it to 0
- Expose the setting from the `VisualServer`, since it does not belong in any specific rasterizer
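To illustrate the difference between the two rollover strategies, here is a small sketch. The function names and the `rollover` parameter (the wrap period in seconds) are hypothetical, and the "reset" variant is only my reading of the old behaviour described above:

```cpp
#include <cmath>

// `rollover` is a hypothetical name for the configured wrap period in seconds.

// Old behaviour as described: snap back to 0, discarding the overshoot.
double advance_time_reset(double time, double delta, double rollover) {
    const double t = time + delta;
    return t >= rollover ? 0.0 : t;
}

// New behaviour: wrap with fmod so the overshoot carries into the next cycle.
double advance_time_fmod(double time, double delta, double rollover) {
    return std::fmod(time + delta, rollover);
}
```

With a period of 3600 and a time of 3599.5, advancing by 1.0 gives 0.0 with the reset variant and 0.5 with `fmod()`, so the wrapped time no longer loses the part of the frame delta that crossed the boundary.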
This should make headless exporting work in projects using textures
in any format.
Error messages should no longer appear when running a project
that used image formats that were previously not present in the list.
(cherry picked from commit 3007c7e2a3)
When reading SCREEN_TEXTURE in a shader, this previously only worked successfully for the first read of the screen, because state.canvas_texscreen_used was never getting reset. This PR resets state.canvas_texscreen_used at the beginning of each joined item, so that further screen reads can happen.
Joining items across z_indices can interfere with light culling for lights which only affect certain z ranges. This PR disables joining across z_indices when lights are present, except specifically for lights with both z_min set to the global minimum (-4096) and z_max set to the global maximum (4096).
In addition, the z_index is now stored on the joined_item for accurate light culling. The z_index is also displayed in frame diagnostics.
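The joining rule above can be sketched as a small predicate. `LightRange` and `can_join_across_z` are hypothetical names, and the sketch only models the z range check, not the rest of the joining logic:

```cpp
#include <vector>

// Global z limits quoted in the text above.
constexpr int Z_MIN = -4096;
constexpr int Z_MAX = 4096;

// Hypothetical minimal light record holding only its z range.
struct LightRange {
    int z_min;
    int z_max;
};

// Joining within one z_index is unaffected. Across z_indices it is allowed
// only if every light covers the full z range, so culling cannot differ.
bool can_join_across_z(int prev_z, int next_z, const std::vector<LightRange> &lights) {
    if (prev_z == next_z) {
        return true;
    }
    for (const LightRange &l : lights) {
        if (l.z_min != Z_MIN || l.z_max != Z_MAX) {
            return false;
        }
    }
    return true; // no lights, or all lights span every layer
}
```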
In rare circumstances default batches were being joined incorrectly, causing visual regressions. This logic has been fixed.
In addition slightly more output information has been added to frame diagnosis mode.
Batching across z_index layers was not preserving the batch_break flag, which determines whether joining with the previous item is prevented. This is fixed by storing the flag in RenderItemState and preserving it across canvas_render_items calls.
Added a project setting to enable / disable printing frame diagnostics every 10 seconds. This prints out a list of batches and info, which is useful to optimize games and identify performance problems.
This adds 2 new values (items and draw calls) to the performance monitor in a '2d' section, rather than reusing the 3d values in the 'raster' section.
This makes it far easier to optimize games to minimize drawcalls.
Defers sending 'transform' commands within a RasterizerCanvas::Item until they are needed for default batches. Instead locally caches the extra matrix and applies it using software transform, preventing unnecessary batch breaks.
The logic is relatively complex, and the whole 'extra matrix' of the legacy renderer in addition to the final_transform is not ideal. However this is required to accelerate some user drawing techniques, and later the lines in the IDE.
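As a rough illustration of the deferred transform idea, the sketch below caches the extra matrix and applies it to rect corners on the CPU instead of sending it to the GPU. `Vec2`, `Transform2D` and `ItemState` here are simplified, hypothetical types, not the engine's classes:

```cpp
#include <vector>

struct Vec2 {
    float x, y;
};

// 2D affine transform stored as two basis columns plus an origin.
struct Transform2D {
    Vec2 x_axis{1.0f, 0.0f};
    Vec2 y_axis{0.0f, 1.0f};
    Vec2 origin{0.0f, 0.0f};

    Vec2 xform(const Vec2 &v) const {
        return {x_axis.x * v.x + y_axis.x * v.y + origin.x,
                x_axis.y * v.x + y_axis.y * v.y + origin.y};
    }
};

// Hypothetical per-item state: a 'transform' command only updates the cache.
struct ItemState {
    Transform2D extra_matrix;
    std::vector<Vec2> batch_vertices;

    void on_transform_command(const Transform2D &t) { extra_matrix = t; }

    // Rect corners are transformed on the CPU and appended to the open batch,
    // so the transform change does not force a batch break.
    void on_rect_command(float x, float y, float w, float h) {
        const Vec2 corners[4] = {{x, y}, {x + w, y}, {x + w, y + h}, {x, y + h}};
        for (const Vec2 &c : corners) {
            batch_vertices.push_back(extra_matrix.xform(c));
        }
    }
};
```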
Extra functions canvas_render_items_begin and canvas_render_items_end are added to RasterizerCanvas, with noop stubs for non-GLES2 renderers. This enables batching to be spread over multiple z_indices, and multiple calls to canvas_render_items.
It does this by only performing item joining within canvas_render_items, and deferring rendering until canvas_render_items_end().
Determined that a large reason for the decrease in performance in unbatchable scenes was due to the new routine being analogous to the 'nvidia workaround' code, that is about half the speed. So this simply uses the old routine in the case of single unbatchable rects. Hopefully we will be able to remove the old path at a later stage.
Where the final_modulate color varies between render_items this can prevent batching. This PR solves this by baking final_modulate into the vertex colors, and setting the uniform 'final_modulate' to white, and allowing the joining of items that have different final_modulate values. The previous batching system can then cope with vertex color changes as normal.
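A minimal sketch of baking a modulate color into vertex colors, assuming hypothetical `Color` and `Vertex` structs and a `bake_modulate` helper (the real vertex layout is different):

```cpp
#include <vector>

struct Color {
    float r, g, b, a;
};

// Hypothetical batch vertex: position plus per-vertex color.
struct Vertex {
    float x, y;
    Color color;
};

// Multiply every vertex color by the item's final_modulate. After this the
// shader uniform can stay white for the whole batch, so items with different
// final_modulate values no longer need separate draw calls.
void bake_modulate(std::vector<Vertex> &verts, const Color &final_modulate) {
    for (Vertex &v : verts) {
        v.color.r *= final_modulate.r;
        v.color.g *= final_modulate.g;
        v.color.b *= final_modulate.b;
        v.color.a *= final_modulate.a;
    }
}
```

The trade-off is a few extra multiplies per vertex on the CPU in exchange for fewer uniform changes and batch breaks.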
2D rendering is currently bottlenecked by drawing primitives one at a time, limiting OpenGL efficiency. This PR batches primitives and renders in fewer drawcalls, resulting in significant performance improvements. This also speeds up text rendering.
This PR batches across canvas items as well as within items.
The code dynamically chooses between a vertex format with and without color, depending on the input data for a frame, in order to optimize throughput and maximize batch size. It also adds an option to use glScissor to reduce fillrate in light passes.
Namely, move the drive dropdown to just the left of the path text box and don't include the former
in the latter.
This improves the UX on Windows.
In the UNIX case, since its concept of drives is (ab)used to provide shortcuts to useful paths, its
dropdown is kept at the original location.
Pitch bend message now has correct size (was 2 bytes instead of 3).
Recognized (but not implemented) 0xF? messages. SysEx messages will be recognized as such, but their contents will be ignored.
Fixes #26637.
Fixes #19900.
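For reference, the standard MIDI channel message sizes can be sketched as below. `midi_message_size` and `pitch_bend_value` are hypothetical helper names, not the driver's functions; the byte counts follow the MIDI 1.0 channel voice message layout:

```cpp
#include <cstdint>

// Total message length in bytes (status byte included) for channel voice
// messages. Returns 0 for anything else, including 0xF? system messages,
// whose length is not determined by the status nibble alone.
int midi_message_size(std::uint8_t status) {
    switch (status & 0xF0) {
        case 0x80: // note off
        case 0x90: // note on
        case 0xA0: // polyphonic aftertouch
        case 0xB0: // control change
        case 0xE0: // pitch bend
            return 3;
        case 0xC0: // program change
        case 0xD0: // channel pressure
            return 2;
        default: // data bytes, or 0xF0..0xFF system messages such as SysEx
            return 0;
    }
}

// Pitch bend carries a 14-bit value: LSB first, then MSB, 7 bits each.
int pitch_bend_value(std::uint8_t lsb, std::uint8_t msb) {
    return ((msb & 0x7F) << 7) | (lsb & 0x7F);
}
```

A pitch bend with data bytes 0x00 and 0x40 decodes to 8192, the centre position.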
The viewport_size returned by get_viewport_size was previously incorrect, being half the correct value. The function is renamed to get_viewport_half_extents, and now returns a Vector2.
Code which called this function has also been modified accordingly.
This PR also fixes shadow culling when using ortho cameras, because the correct input for CameraMatrix::set_orthogonal should be the full HEIGHT from get_viewport_half_extents, and not half the width.
It also fixes state.ubo_data.viewport_size in rasterizer_scene_gles3.cpp to be the width and the height of the viewport in pixels as stated in the documentation, rather than the current value which is half the viewport extents in worldspace, presumed to be a bug.
Reverts the following commits:
- c81ec6f26d:
"Exposes capture methods to AudioServer, variable renames for
consistency, added documentation."
- 47c558b98a:
"Expose audio callbacks as signals."
- dabaa11b3c:
"Fix to make sure the capture buffers are deallocated at shutdown.
Silences warnings."
Some documentation improvements were kept for pre-existing methods.
See rationale for reverting these changes in #30468.
All the calculations leading up to `mipLevel` are only relevant for
Panorama mode. Similarly, the `source_resolution` uniform is only
needed for that mode.
`ERROR: _display_error_with_code: CanvasShaderGLES3: Fragment Program Compilation Failed:
0:166(2): error: `return' with wrong type int, in function `map_ninepatch_axis' returning float` caused by #34704
Some cases were not handled properly for Polygon2D after making changes in common code to fix Line2D antialiasing. Added an option for drawing polygons to differentiate the two use cases.
Fixes #34568
Happy new year to the wonderful Godot community!
We're starting a new decade with a well-established, non-profit, free
and open source game engine, and tons of further improvements in the
pipeline from hundreds of contributors.
Godot will keep getting better, and we're looking forward to all the
games that the community will keep developing and releasing with it.
The new 'split_libmodules=yes' option is useful to work around linker
command line size limitations when linking a huge number of objects.
We're currently over 64k chars when linking libmodules.a on Windows
with MinGW, which triggers issues as seen in #30892.
Even on Linux, we can also reach linker command line size limitations
by adding more custom modules.
We force this option to True for MinGW on Windows, which fixes #30892.
Additional changes to lib splitting:
- Fix linking of the split module libs with interdependent symbols,
hacking our way into LINKCOM and SHLINKCOM to set the `--start-group`
and `--end-group` flags.
- Fix Python 3 compatibility in `methods.split_lib()`.
- Drop seemingly obsolete condition for 'msys' on 'posix'.
- Drop the unnecessary 'split_drivers' as the drivers lib is no longer
too big since we moved all thirdparty builds to modules.
Co-authored-by: Hein-Pieter van Braam-Stewart <hp@tmm.cx>
Polygon2D:
The property wasn't used anymore after switching from canvas_item_add_polygon() to canvas_item_add_triangle_array() for drawing.
Line2D:
Added the same property as for Polygon2D & fixed smooth line drawing to use indices correctly.
Fixes #26823
This changes the code path so that `glRenderBufferStorage*` always uses
values appropriate for renderbuffers and `glTexImage2D` never uses an
internalformat meant for buffers.
Fixes #33825.
As discussed in #32657, this can't be done here as lines can be used
with a canvas scale, and this breaks them.
A suggestion is to do the pixel shifting at matrix level instead.
Fixes #33393.
Fixes #33421.
When rendering to an external texture and MSAA was active (as happened
in the Oculus Mobile ARVR plugin) no MSAA was rendered as the correct
depth buffer and multisample texture target was not used.
This also fixes https://github.com/GodotVR/godot_oculus_mobile/issues/54
The mysterious Windows networking stack...
Using connect instead of WSAConnect causes socket error 10022 under
certain conditions.
See: https://github.com/godotengine/webrtc-native/ (issue 6)
If I had to guess, the code path for connect is different than that of
WSAConnect with NULL extra parameters.
The only reference to this odd error with this code mentions something
called "Windows Filtering Platform", but Windows internals are, as
always, obscure.
This might be something to try and report to Microsoft if anyone has the
time to spare with the likely outcome of being ignored.
While OpenGL ES 3.0 and WebGL 2.0 both support non power-of-2 (NPOT)
textures in their specification, the situation seems to be less clear
about *compressed* NPOT textures using repeat or mipmap flags.
At least Chrome on Linux doesn't seem to support this combination,
and a variety of mobile hardware have similar limitations.
As a workaround, we force decompressing such textures when running on
WebGL 2.0, at the cost of loading time and memory usage.
Fixes #33058.
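The condition for the workaround can be sketched as a simple predicate. `TextureDesc`, `is_power_of_two` and `must_decompress` are hypothetical names used only for illustration:

```cpp
// Hypothetical texture description, not the engine's texture struct.
struct TextureDesc {
    unsigned width;
    unsigned height;
    bool compressed;
    bool repeat;
    bool mipmaps;
};

bool is_power_of_two(unsigned v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Decompress only the problematic combination: a compressed NPOT texture
// that uses repeat or mipmaps, and only when running on WebGL 2.0.
bool must_decompress(const TextureDesc &t, bool running_on_webgl2) {
    const bool npot = !is_power_of_two(t.width) || !is_power_of_two(t.height);
    return running_on_webgl2 && t.compressed && npot && (t.repeat || t.mipmaps);
}
```

Keeping the check this narrow limits the loading time and memory cost to the textures that would otherwise fail.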
OpenGL uses the diamond exit rule to rasterize lines. If we don't shift
the points down and to the right by 0.5, the line can sometimes miss a
pixel when it shouldn't. The final fragment of a line isn't drawn. By
drawing the lines clockwise, we can avoid a missing pixel in the rectangle.
See section 3.4.1 in the OpenGL 1.5 specification.
Fixes #32279
On Unix systems, file descriptors are usually shared among child
processes.
This means that if we spawn a subprocess (or fork), as we do in
the editor, any open file descriptor will leak to the new process.
This PR sets the close-on-exec flag when opening a file, which causes
the file descriptor to not be shared with the child process.
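A minimal POSIX sketch of setting the close-on-exec flag after opening a file. `open_cloexec` and `is_cloexec` are hypothetical helpers, not the functions changed in this PR; only `open`, `fcntl` and `close` are real system calls:

```cpp
#include <fcntl.h>
#include <unistd.h>

// Opens a file read-only and marks the descriptor close-on-exec, so it is
// closed automatically in any process started with exec. Returns -1 on failure.
int open_cloexec(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Reports whether the close-on-exec flag is set on a descriptor.
bool is_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

Where available, passing `O_CLOEXEC` directly to `open` sets the flag atomically and avoids a window in which another thread could fork between the two calls.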
On Unix systems, sockets are like file descriptors, and file descriptors
are usually shared among child processes.
This means that if we spawn a subprocess (or fork), as we do in the
editor, open file descriptors will leak to the new process.
This causes issues with sockets, as they might remain open and bound
(listening) when the original process closes.
Fixes this error:
```
drivers\unix\ip_unix.cpp(155): error C2593: 'operator =' is ambiguous
.\core/ustring.h(177): note: could be 'void String::operator =(const CharType *)'
.\core/ustring.h(176): note: or 'void String::operator =(const char *)'
drivers\unix\ip_unix.cpp(155): note: while trying to match the argument list '(String, int)'
```
This fixes an issue that was fixed for gles3 in #31419 but not applied
to gles2. The fix consists of using a constant scale for cube_normal of -1.0
instead of -1000000. The issue resulted in broken panorama rendering on the
Oculus Quest (see https://github.com/GodotVR/godot_oculus_mobile/issues/29).
Although the backup USE_SKELETON_SOFTWARE skinning path is currently used when float texture is not supported, the default skinning path still fails when float texture is supported but GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS is 0, i.e. the device cannot read from texture during vertex shader. This PR adds the logic to activate the SKELETON_SOFTWARE path if either of these cases occur, preventing crashes on devices which have this combination of features.
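The selection logic described above reduces to a two-part check. `GpuCaps` and `use_software_skinning` are hypothetical names; in a real renderer the second field would typically be filled from `glGetIntegerv(GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS, ...)`:

```cpp
// Hypothetical capability record filled in once at renderer start-up.
struct GpuCaps {
    bool float_texture_supported;
    int max_vertex_texture_image_units; // GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS
};

// Fall back to software skinning when bone data cannot be stored in a float
// texture, or when the vertex shader cannot sample textures at all.
bool use_software_skinning(const GpuCaps &caps) {
    return !caps.float_texture_supported || caps.max_vertex_texture_image_units == 0;
}
```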
In 2.1 and 3.0, light_vec could be modified to alter shadow computations,
but doing so broke shadows when rotating the light. shadow_vec does the same, but without breaking
shadows in rotated lights when it is not used.
Added the inverse light transformation to shadow_vec, so it is not affected when rotating lights.
Added a usage define for shadow_vec.
For shadow_vec to work properly when rotating a light, it needs to be multiplied by the normalized light_matrix. The usage define was added so that this is skipped when shadow_vec is not used.
Changed the behaviour of the Linear tonemapping operator to not clamp to [0, 1] range
in the case when KEEP_3D_LINEAR is defined. This allows rendering values > 1.0 in
floating point texture targets (via Viewport) for further processing or saving high
dynamic range data into files. This only works when no color conversion is active.
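The behaviour change can be sketched on the CPU as follows. The real change lives in shader code behind the KEEP_3D_LINEAR define; `Color3`, `saturate` and `tonemap_linear` are hypothetical stand-ins, with a bool parameter in place of the define:

```cpp
#include <algorithm>

struct Color3 {
    float r, g, b;
};

// Clamp a single channel to the [0, 1] range.
static float saturate(float v) {
    return std::min(std::max(v, 0.0f), 1.0f);
}

// Linear tonemapping: clamp to [0, 1] by default, but pass values through
// untouched when linear output is kept, so HDR data above 1.0 survives.
Color3 tonemap_linear(const Color3 &c, bool keep_3d_linear) {
    if (keep_3d_linear) {
        return c;
    }
    return {saturate(c.r), saturate(c.g), saturate(c.b)};
}
```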
The last remaining ERR_EXPLAIN call is in FreeType code and makes sense as is
(conditionally defines the error message).
There are a few ERR_EXPLAINC calls for C-strings where String is not included
which can stay as is to avoid adding additional _MSGC macros just for that.
Part of #31244.
Condensed some if and ERR statements. Added dots to the end of error messages.
Couldn't figure out EXPLAINC. These files gave me trouble: core/error_macros.h, core/io/file_access_buffered_fa.h (where is it?),
core/os/memory.cpp,
drivers/png/png_driver_common.cpp,
drivers/xaudio2/audio_driver_xaudio2.cpp (where is it?)
This makes height fog appear at the bottom of the scene
(instead of the top), which is generally the expected result.
This also tweaks the fog height setting hint to be more flexible.
This closes #30709.