With the octahedral compression, we had attributes of a size of 2 bytes
which potentially caused performance regressions on iOS/Mac
Now add padding to the normal/tangent buffer
For octahedral, normal will always be oct32 encoded
UNLESS tangent exists and is also compressed
then both will be oct16 encoded and packed into a vec4<GL_BYTE>
attribute
Initial octahedral compression incorrectly wrote tangent to the buffer
using an offset of 3 rather than 4, losing the sign of the tangent
vector needed for things like tangent space for texturing mapping
GLES3 renderer used remove_custom_define rather than set_conditional to
change back to the default conditional state the scene shader should be
in
Implement Octahedral Compression for normal/tangent vectors
*Oct32 for uncompressed vectors
*Oct16 for compressed vectors
Reduces vertex size for each attribute by
*Uncompressed: 12 bytes, vec4<float32> -> vec2<unorm16>
*Compressed: 2 bytes, vec4<unorm8> -> vec2<unorm8>
Binormal sign is encoded in the y coordinate of the encoded tangent
Added conversion functions to go from octahedral mapping to cartesian
for normal and tangent vectors
sprite_3d and soft_body meshes write to their vertex buffer memory
directly and need to convert their normals and tangents to the new oct
format before writing
Created a new mesh flag to specify whether a mesh is using octahedral
compression or not
Updated documentation to discuss new flag/defaults
Created shader flags to specify whether octahedral or cartesian vectors
are being used
Updated importers to use octahedral representation as the default format
for importing meshes
Updated ShaderGLES2 to support 64 bit version codes as we hit the limit
of the 32-bit integer that was previously used as a bitset to store
enabled/disabled flags
Implemented splitting of vertex positions and attributes in the vertex
buffer
Positions are sequential at the start of the buffer, followed by the
additional attributes which are interleaved
Made a project setting which enables/disabled the buffer formatting
throughout the project
Implemented in both GLES2 and GLES3
This improves performance particularly on tile-based GPUs as well as
cache performance for something like shadow mapping which only needs
position data
Updated Docs and Project Setting
This is an older, easier to implement variant of CAS as a pure
fragment shader. It doesn't support upscaling, but we won't make
use of it (at least for now).
The sharpening intensity can be adjusted on a per-Viewport basis.
For the root viewport, it can be adjusted in the Project Settings.
Since `textureLodOffset()` isn't available in GLES2, there is no
way to support contrast-adaptive sharpening in GLES2.
We've been using standard C library functions `memcpy`/`memset` for these since
2016 with 67f65f6639.
There was still the possibility for third-party platform ports to override the
definitions with a custom header, but this doesn't seem useful anymore.
Backport of #48239.
Allows users to override default API usage, in order to get best performance on different platforms.
Also changes the default legacy flags to use STREAM rather than DYNAMIC.
- Fix objects with no material being considered as fully transparent by the lightmapper.
- Added "environment_min_light" property: gives artistic control over the shadow color.
- Fixed "Custom Color" environment mode, it was ignored before.
- Added "interior" property to BakedLightmapData: controls whether dynamic capture objects receive environment light or not.
- Automatically update dynamic capture objects when the capture data changes (also works for "energy" which used to require object movement to trigger the update).
- Added "use_in_baked_light" property to GridMap: controls whether the GridMap will be included in BakedLightmap bakes.
- Set "flush zero" and "denormal zero" mode for SSE2 instructions in the Embree raycaster. According to Embree docs it should give a performance improvement.
Changes default ninepatch mode to preserve compatibility, and renames default mode to 'fixed'.
Also adds an editor restart to changing ninepatch mode and software skinning, which will be more user friendly.
The rendering/quality/2d section of project settings is becoming considerably expanded in 3.2.4, and arguably was not the correct place for settings that were not really to do with quality.
3.2.4 is the last sensible opportunity we will have to move these settings, as the only existing one likely to break compatibility in a small way is `pixel_snap`, and given that the whole snapping area is being overhauled we can draw attention to the fact it has changed in the release notes.
Class reference is also updated and slightly improved.
`pixel_snap` is renamed to `gpu_pixel_snap` in the project settings and code to help differentiate from CPU side transform snapping.
Move definition of rendering/quality/filters/anisotropic_filter_level to
servers/visual_server.cpp, since both GLES2 and GLES3 now use it
rasterizer_storage_gles3.cpp: Remove a spurious variable write (the
value gets overwritten soon after)
- Fix Embree runtime when using MinGW (patch by @RandomShaper).
- Fix baking of lightmaps on GridMaps.
- Fix some GLSL errors.
- Fix overflow in the number of shader variants (GLES2).
Completely re-write the lightmap generation code:
- Follow the general lightmapper code structure from 4.0.
- Use proper path tracing to compute the global illumination.
- Use atlassing to merge all lightmaps into a single texture (done by @RandomShaper)
- Use OpenImageDenoiser to improve the generated lightmaps.
- Take into account alpha transparency in material textures.
- Allow baking environment lighting.
- Add bicubic lightmap filtering.
There is some minor compatibility breakage in some properties and methods
in BakedLightmap, but lightmaps generated in previous engine versions
should work fine out of the box.
The scene importer has been changed to generate `.unwrap_cache` files
next to the imported scene files. These files *SHOULD* be added to any
version control system as they guarantee there won't be differences when
re-importing the scene from other OSes or engine versions.
This work started as a Google Summer of Code project; Was later funded by IMVU for a good amount of progress;
Was then finished and polished by me on my free time.
Co-authored-by: Pedro J. Estébanez <pedrojrulez@gmail.com>
Happy new year to the wonderful Godot community!
2020 has been a tough year for most of us personally, but a good year for
Godot development nonetheless with a huge amount of work done towards Godot
4.0 and great improvements backported to the long-lived 3.2 branch.
We've had close to 400 contributors to engine code this year, authoring near
7,000 commit! (And that's only for the `master` branch and for the engine code,
there's a lot more when counting docs, demos and other first-party repos.)
Here's to a great year 2021 for all Godot users 🎆
(cherry picked from commit b5334d14f7)
These were only put in for the betas, in order to test hypotheses for stalling on Macs. It seems that most of the problems in the Mac editor have been solved by fixing the excessive redraw_requests.
As a result no one has reported any results from these options, but in future we will be able to refer users to try the beta versions, so there is no need to include them in the stable release. Indeed they are only likely to cause confusion.
As a result of the GLES specifications being vague about best practice for how buffers should be used dynamically, different GPUs / platforms appear to have different preferences.
Mac in particular seems to have a number of problems in this area, and none of the rendering team uses Macs. So far we have relied on guesswork to choose the best usage, but in an attempt to pin this down, this PR begins to introduce manual selection of options for users to test their configurations.
It can be enabled in the Project Settings
(`rendering/quality/filters/use_debanding`). It's disabled
by default as it has a small performance impact and can make
PNG screenshots much larger (due to how dithering works).
As a result, it should be enabled only when banding is noticeable enough.
Since debanding requires a HDR viewport to work, it's only supported
in the GLES3 backend.
Another bug in the octree has been discovered which can cause flickering in rare circumstances : #42895
For safety until this is fixed properly this PR reverts the default state of the octree to match the old behaviour, which doesn't appear exhibit the bug (or at least not as readily).
Batching is mostly separated into a common template which can be used with multiple backends (GLES2 and GLES3 here). Only necessary specifics are in the backend files.
Batching is extended to cover more primitives.
Don't apply lighting to objects when they have a lightmap texture and
the light is set to BAKE_ALL. This prevents applying the same direct
light twice on the same object and makes setting up scenes with mixed
lighting much easier.
Option in MeshInstance to enable software skinning, in order to test
against the current USE_SKELETON_SOFTWARE path which causes problems
with bad performance.
Co-authored-by: lawnjelly <lawnjelly@gmail.com>
Prevents adding new octants until a limiting number of elements have been added to the current octant. This enables balancing the benefits of brute force against the benefits of spatial partitioning. The limit can be set per octree.
Project settings are added for rendering octree to set the best balance per project depending on number of tests per frame / tick, and the amount of editing of the octree.
Fixes octants being leaked when removing elements.
Optimize octree with cached linear lists
Storing elements in octants using linked lists is efficient for housekeeping but very slow for testing. This optimization stores additional local_vectors with Element pointers and AABBs which are cached and only updated when a dirty flag is set on the octant.
This is selectable with 2 versions of Octree : Octree and Octree_CL, Octree being the old behaviour. At present the cached list version is only used for the visual server octree (rendering) as it has only been demonstrated to be faster there so far.
This uses slightly more memory (probably a few kb in most cases) but can be significantly faster during testing (culling etc).
Co-authored-by: Sergey Minakov <naithar@icloud.com>
Scaling tilemaps can cause border artifacts around the edges of tiles. This has been traced to precision issues in the GPU. This PR adds an adjustment to allow a minor contraction of the UVs of rects in order to compensate for the incorrect classification of texels across the UV border.
As it now seems like we will soon have GLES3 batching working using the same intermediate layer as GLES2, it makes more sense to reuse the same batching settings for both renderers rather than duplicate project settings for GLES2 and GLES3.
It seems that particles (and some other features) do not work correctly on iOS in GLES2 because either many of the devices do not support half float compression, or the GL constant used to reference it from Godot is incorrect.
This PR adds a project setting in rendering/gles2/ to disable half-float compression on iOS.
Although 2D draws in painters order with strict ordering, in certain circumstances items can be reordered to increase batching / decrease state changes, without affecting the end result. This can be determined by an overlap test.
In situation with item:
A-B-A
providing the third item does not overlap the second, they can be reordered:
A-A-B
Items already contain an AABB which can be used for this overlap test.
1)
To utilise this, I have implemented item reordering (only for single rects for now), with the lookahead adjustable in project settings. This can increase performance in situations where items may not be grouped in the scene tree by texture. It can also be switched off (by setting lookahead to 0).
2)
This same trick can be used to help join items that are lit. Lit items previously would prevent joining completely, thus missing out on performance gains other than multi-command items such as tilemaps.
In this PR, lights are assigned as bits in a bitfield (up to 64, the optimization is disabled above this), and on each try_item (for joining), the bitfield for lights and shadows is constructed and compared with the previous items. If these match the 2 items can potentially be joined. However, this can only be done without changing the rendered result if an overlap test is successful.
This overlap test can be adjusted to join items up to a specific number of item references, selectable in project settings, or turned off.
3)
The legacy uniform single rect drawing routine seems to have been identified as the source of flicker, particularly on nvidia. However, it can also be up to 2x as fast. Because of the speed the batching contains a fallback where it can use the legacy single rect method, but I have now added a project setting to make this switchable. In most cases with batching it should not be necessary (as single rects are drawn less frequently) and thus the flickering can be totally avoided.
4)
This PR also fixes a color modulate bug when drawing light passes, in certain situations (particularly custom _draw routines with multiple rects).
5)
This PR also fixes#38291, a bug in the legacy renderer where light passes could draw rects in wrong position.
- Resurrect it for GL ES 2
- Apply roll over with `fmod()` instead of resetting it to 0
- Expose the setting from the `VisualServer`, since it does not belong in any specific rasterizer
Added project setting to enable / disable print frame diagnostics every 10 seconds. This prints out a list of batches and info, which is useful to optimize games and identify performance problems.
This adds 2 new values (items and draw calls) to the performance monitor in a '2d' section, rather than reusing the 3d values in the 'raster' section.
This makes it far easier to optimize games to minimize drawcalls.
2d rendering is currently bottlenecked by drawing primitives one at a time, limiting OpenGL efficiency. This PR batches primitives and renders in fewer drawcalls, resulting in significant performance improvements. This also speeds up text rendering.
This PR batches across canvas items as well as within items.
The code dynamically chooses between a vertex format with and without color, depending on the input data for a frame, in order to optimize throughput and maximize batch size. It also adds an option to use glScissor to reduce fillrate in light passes.
Some cases were not handled properly for Polygon2D after making changes in common code to fix Line2D antialiasing. Added an option for drawing polygons to differentiate the two use cases.
Fixes#34568
Happy new year to the wonderful Godot community!
We're starting a new decade with a well-established, non-profit, free
and open source game engine, and tons of further improvements in the
pipeline from hundreds of contributors.
Godot will keep getting better, and we're looking forward to all the
games that the community will keep developing and releasing with it.
Polygon2D:
The property wasn't used anymore after switching from canvas_item_add_polygon() to canvas_item_add_triangle_array() for drawing.
Line2D:
Added the same property as for Polygon2D & fixed smooth line drawing to use indices correctly.
Fixes#26823
Make sure particles are processed during the same frame when visibility is set to on, in case they are still active from before and need to be restarted.
Fixed#33476
This is a new singleton where camera sources such as webcams or cameras on a mobile phone can register themselves with the Server.
Other parts of Godot can interact with this to obtain images from the camera as textures.
This work includes additions to the Visual Server to use this functionality to present the camera image in the background. This is specifically targetted at AR applications.
The bake mode property of lights previously didn't affect GI probes.
This change makes the GI probe ignore lights that have their bake mode
set to disabled.
There was a bug that could result in most bone aabb boxes ending up with
tiny size upon import and mess up with culling of skeletal meshes. This
fixes it.
Remove animation loop from ParticlesMaterial and move it to
SpatialMaterial for 3D particles and Particles2D for the 2D case.
Added animation to CPUParticles2D as well as the "Convert to
CPUParticles2D" to the PAarticles2D menu.
Initialise keep_original_textures and use_fast_texture_filter in storage
config. Removed any other variables from storage config that were both unused
and uninitialised to avoid future confusion (if they're needed it's
easier to spot an uninitialised variable problem in a PR that adds the
variable again rather than just uses it).
Copied storage Texture struct constructor from GLES3 implementation
(except where variables were already initialised with different values).
Gives us sensible tested defaults for previously uninitialised vars.
Added assignments for state.current_main_tex based on same in GLES3.
This allows more consistency in the manner we include core headers,
where previously there would be a mix of absolute, relative and
include path-dependent includes.