As a result of the GLES specifications being vague about best practice for how buffers should be used dynamically, different GPUs / platforms appear to have different preferences.
Mac in particular seems to have a number of problems in this area, and none of the rendering team uses Macs. So far we have relied on guesswork to choose the best usage, but in an attempt to pin this down, this PR begins to introduce manual selection of options for users to test their configurations.
In small batches using hardware transform, vertices would be drawn in incorrect positions due to the item transform being applied twice - once in the transform uniform, and once from the transform passed as a vertex attribute.
This PR alters the shader to ignore uniform transforms when using large FVF.
Due to my less than eagle-like view over these functions I had assumed they were passing in a single buffer input for the changes to make buffer uploading more efficient. They aren't, which is less than ideal.
So these particular changes should be reverted. When I have some more time I'll see whether the API for these calls can be changed, because as is the multiple glSubBufferData calls could be causing stalls on some hardware.
It can be enabled in the Project Settings
(`rendering/quality/filters/use_debanding`). It's disabled
by default as it has a small performance impact and can make
PNG screenshots much larger (due to how dithering works).
As a result, it should be enabled only when banding is noticeable enough.
Since debanding requires a HDR viewport to work, it's only supported
in the GLES3 backend.
This is part of effort to make more efficient use of the API for devices with poor drivers. This eliminates multiple calls to glBufferSubData per update.
Batching is mostly separated into a common template which can be used with multiple backends (GLES2 and GLES3 here). Only necessary specifics are in the backend files.
Batching is extended to cover more primitives.
Don't apply lighting to objects when they have a lightmap texture and
the light is set to BAKE_ALL. This prevents applying the same direct
light twice on the same object and makes setting up scenes with mixed
lighting much easier.
Option in MeshInstance to enable software skinning, in order to test
against the current USE_SKELETON_SOFTWARE path which causes problems
with bad performance.
Co-authored-by: lawnjelly <lawnjelly@gmail.com>
- Fade reflection towards inner margin and clip it at screen edges instead of external margin.
- Round edges of the fade margin if both are being cut off to prevent sharp corners.
When not using TEXTURE_RECT path, flips have to sent via another method to the shader, to ensure that normal maps are correctly adjusted for direction. This PR adds an extra vertex attribute, LIGHT_ANGLE.
For nvidia workarounds, where the shader still has access to the final transform and extra matrix, the LIGHT_ANGLE can be 0 (no adjustment), 180 degrees for a horizontal flip, and negative indicates a vertical flip.
For batching path, the LIGHT_ANGLE can be used to directly specify the light angle for normal mapping, even when the final transform and extra matrix have been baked into vertex positions, so the same shader can be used for both.
Configured for a max line length of 120 characters.
psf/black is very opinionated and purposely doesn't leave much room for
configuration. The output is mostly OK so that should be fine for us,
but some things worth noting:
- Manually wrapped strings will be reflowed, so by using a line length
of 120 for the sake of preserving readability for our long command
calls, it also means that some manually wrapped strings are back on
the same line and should be manually merged again.
- Code generators using string concatenation extensively look awful,
since black puts each operand on a single line. We need to refactor
these generators to use more pythonic string formatting, for which
many options are available (`%`, `format` or f-strings).
- CI checks and a pre-commit hook will be added to ensure that future
buildsystem changes are well-formatted.
(cherry picked from commit cd4e46ee65)
Affects per-pixel transparency
The current method renders to the screen by copying the GLES output to a
DIB for transparency using the CPU instead of rendering directly to the
window via the GPU. This is slower and also forces the window to be borderless
as WS_EX_LAYERED affects the non-client region as well.
This change uses DWMEnableBlurBehindWindow which allows using the standard
glClearColor() background alpha and is also performed through the GPU,
eliminating CPU bottlenecks
- Resurrect it for GL ES 2
- Apply roll over with `fmod()` instead of resetting it to 0
- Expose the setting from the `VisualServer`, since it does not belong in any specific rasterizer