With render target HLE, directly store linear values as R16G16B16A16_UNORM
without gamma conversion, as this format provides more than enough bits
(need at least 11 per component due to the maximum scale being 2^3 in the
piecewise linear gamma curve) to represent linear values without precision
loss.
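
The bit count above can be checked with a few lines of arithmetic. This is
only an illustrative sketch: `LinearBitsNeeded` and the constant names are
hypothetical and are not taken from the Xenia code; the only inputs are the
8-bit gamma component width and the 2^3 maximum scale stated above.

```cpp
#include <cstdint>

// Hypothetical helper: bits needed to hold an `input_bits`-wide code after
// it has been multiplied by at most 2^max_scale_log2.
constexpr uint32_t LinearBitsNeeded(uint32_t input_bits,
                                    uint32_t max_scale_log2) {
  return input_bits + max_scale_log2;
}

constexpr uint32_t kGammaBits = 8;        // 8_8_8_8_GAMMA component width.
constexpr uint32_t kMaxPwlScaleLog2 = 3;  // Steepest PWL segment: 2^3.
constexpr uint32_t kUnorm16Bits = 16;     // R16G16B16A16_UNORM component.

// The largest 8-bit code scaled by 8 still fits in 11 bits.
static_assert((255u << kMaxPwlScaleLog2) < (1u << 11),
              "255 * 8 must be below 2^11");
static_assert(LinearBitsNeeded(kGammaBits, kMaxPwlScaleLog2) <= kUnorm16Bits,
              "16-bit UNORM has headroom over the 11 bits required");
```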
This makes blending work correctly in linear space, improving the quality
of transparency and lighting passes, and fixing issues such as the
transparent parts of impact and footstep decals in 4D5307E6 being bright.
The new behavior is enabled by default, as it hugely improves the accuracy
of emulation of this format, which is pretty commonplace in Xbox 360 games,
with likely just a small increase in GPU memory and bandwidth usage compared
to the alternatives that were previously available on the HLE RB path.
It's currently implemented only on Direct3D 12, as most of the current GPU
emulation code is planned to be phased out and redone, and no methods other
than 8-bit with pre-conversion were implemented on Vulkan previously.
To implement on Vulkan later, the same conversion as in the Direct3D 12
implementation will need to be done in ownership transfer and resolve
shaders. Currently it's somewhat inconvenient to decouple the conversion
functions in `SpirvShaderTranslator` from an instance of the translator due
to vector constant usage. Later, simpler SPIR-V generation functions may be
added (`spv::Builder` usage in general is overly verbose).
The previously default method (8-bit storage with pre-conversion in shaders
and incorrect blending) can be re-enabled by setting the
"gamma_render_target_as_unorm16" configuration option to `false`. This may
be useful if the game, for instance, switches between 8_8_8_8_GAMMA and
8_8_8_8 formats for the same data frequently, as each switch now causes the
data to be copied in an EDRAM range ownership transfer. Also, the old path is
preserved for Vulkan devices not supporting R16G16B16A16_UNORM with
blending.
The other workaround that was available previously, replacing the PWL
encoding with host hardware sRGB with linear-space blending in render
target management and in texture fetching, was also inherently inaccurate
in many ways (especially when games have their own PWL encoding math, like
4541080F that displayed incorrect colors on the loading screen), and
required tracking of the encoding needed for ranges in the memory.
The sRGB workaround therefore was deleted in this commit, greatly
simplifying the code in the parts of render target, texture and memory
management and shader generation that were involved in it.
The 32_32_32_FLOAT format seems to be vertex-only, so it looks like there
can't be storage elements smaller than a single texel.
So, use a more precise name that can't be confused with "picture element"
(pixel) or "texture element" (texel), both of which refer to a single
logical pixel rather than a storage block of pixels.
Switch between even and odd 16-byte element sequences along X by simply
flipping a bit rather than going to a different resolution-scaled group of
pixels, by increasing the size of the group within the constraints imposed
by tiling.
`spirv-remap` is not present in modern Vulkan SDK versions; it was replaced
with the `--canonicalize-ids` pass in `spirv-opt`.
Overall, canonicalization provides a significant compression improvement,
which is important considering that currently Xenia is distributed in a ZIP
archive and contains many very similar shaders.
With normal DEFLATE compression, canonicalization reduced the size of a ZIP
with `xenia.exe` from 3.54 MB to 3.45 MB in a test done before committing.
Also disable stripping of debug information from shaders, which apparently
was among the things `spirv-remap` was doing with `--do-everything`, as
binding and uniform buffer member names heavily aid in debugging in
RenderDoc.
Partially integrated from #2329.
Co-authored-by: Herman S. <429230+has207@users.noreply.github.com>
Co-authored-by: Gliniak <Gliniak93@gmail.com>
Replace the `SubmissionTracker`s with new `GPUCompletionTimeline`s with a
more unified interface (using a base class), and without the internal logic
for queue ownership transfers since that idea was scrapped during the
development of the `Presenter`.
Also use this fence management logic for GPU emulation, though without
architectural reworks for now, just on the bottom level.
Still very messy, but can be cleaned up in further GPU command processor
and presenter reworks.
Use the _xe suffix instead of the xesl_ prefix for quicker visual
recognition of identifiers, and switch to snake_case for consistency.
Also add the f suffix to float32 literals because the Metal Shading
Language is based on C++, where unsuffixed floating-point literals are
double.
Enable portability subset physical device enumeration.
Don't use Vulkan 1.1+ logical devices on Vulkan 1.0 instances due to the
VkApplicationInfo::apiVersion specification.
Make sure all extension dependencies are enabled when creating a device.
Prefer exposing feature support over extension support via the device
interface to avoid causing confusion with regard to promoted extensions
(especially those that required some features as extensions, but had those
features made optional when they were promoted).
Allow creating presentation-only devices, not demanding any optional
features beyond the basic Vulkan 1.0, for use cases such as internal tools
or CPU rendering.
Require the independentBlend feature for GPU emulation, as working around
its absence is complicated, while support is almost ubiquitous.
Move the graphics system initialization fatal error message to xenia_main
after attempting to initialize all implementations, for automatic fallback
to other implementations in the future.
Log Vulkan driver info.
Improve Vulkan debug message logging, enabled by default.
Refactor code, with simplified logic for enabling extensions and layers.
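
The Vulkan 1.0 instance rule mentioned above can be sketched roughly as
follows. This is a standalone illustration that does not include the Vulkan
headers: `MakeApiVersion` mirrors the bit layout of `VK_MAKE_API_VERSION`
with variant 0, and `UsableDeviceApiVersion` is a hypothetical helper, not a
function from this commit.

```cpp
#include <algorithm>
#include <cstdint>

// Same packing as VK_MAKE_API_VERSION(0, major, minor, patch).
constexpr uint32_t MakeApiVersion(uint32_t major, uint32_t minor,
                                  uint32_t patch) {
  return (major << 22) | (minor << 12) | patch;
}

constexpr uint32_t kApiVersion1_0 = MakeApiVersion(1, 0, 0);
constexpr uint32_t kApiVersion1_1 = MakeApiVersion(1, 1, 0);

// Hypothetical helper: the highest API version a logical device may be used
// with, given the VkApplicationInfo::apiVersion the instance was created
// with and the apiVersion reported in the physical device properties.
constexpr uint32_t UsableDeviceApiVersion(uint32_t instance_api_version,
                                          uint32_t physical_device_version) {
  if (instance_api_version < kApiVersion1_1) {
    // A Vulkan 1.0 instance only accepts apiVersion 1.0, so devices under it
    // are limited to 1.0 regardless of what they report.
    return kApiVersion1_0;
  }
  return std::min(instance_api_version, physical_device_version);
}
```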
VS/PS_NUM_REG is 6-bit on Adreno 200, and games aren't seen using bit 7 to
indicate that no GPRs are used. It's not clear why Freedreno configures it
this way.
Some texture fetch fields were deprecated or moved during the development
of the Xenos; reflect that in the comments.
Add definitions of the registers configuring the conversion of vertex
positions to fixed-point. Although there isn't much that can be done with
it when emulating using PC GPU APIs, there are some places in Xenia that
wrongly (though sometimes deliberately, for results closer to the behavior
of the host GPU) assume that the conversion works like in Direct3D 10+.
The Xenos, however, supports only up to 4 subpixel bits rather than 8. The
effects of this difference are largely negligible, though.
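
To give a feel for the size of the subpixel difference, here is a minimal
sketch that snaps a coordinate to a grid with a given number of fractional
bits. `SnapToSubpixelGrid` is a hypothetical helper, and round-to-nearest is
an assumption made for the illustration; the rounding mode the hardware
actually uses depends on the registers in question.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: quantizes a position to a fixed-point grid with
// `subpixel_bits` fractional bits (round-to-nearest is an assumption).
inline float SnapToSubpixelGrid(float position, uint32_t subpixel_bits) {
  const float scale = static_cast<float>(1u << subpixel_bits);
  return std::round(position * scale) / scale;
}
```

With 4 subpixel bits the grid step is 1/16 of a pixel, and with 8 bits it is
1/256, so the two results for the same input differ by at most about 1/32 of
a pixel.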
Also add more detailed info about register references and differences from
other ATI/AMD GPUs for potential future contributors.
Simplifies emission of the blocks themselves (including inserting blocks
into the function's block list in the correct order), as well as phi after
the branching.
Also fixes 64bpp storing with blending in the fragment shader interlock
render backend implementation (had a typo that caused the high 32 bits to
overwrite the low ones).
C++ relational operators are supposed to raise FE_INVALID if an argument is
NaN, so use std::isless/greater[equal] instead where they were easy to
locate (though there are possibly other places; mostly min/max and clamp
usage was checked).
Also fixes a copy-paste error making the CPU shader interpreter execute
MINs as MAXs instead.
Hopefully prevents some potential #1971-like situations.
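
A minimal sketch of the quiet comparison pattern described above. `QuietMin`
and `QuietMax` are hypothetical names, and the choice of which operand is
returned when one of them is NaN is only an assumption for this example, not
necessarily what the interpreter does.

```cpp
#include <cmath>

// std::isless and std::isgreater return false if either argument is NaN and,
// unlike the built-in < and >, do not raise FE_INVALID for quiet NaNs.
inline float QuietMin(float a, float b) {
  return std::isless(a, b) ? a : b;
}

inline float QuietMax(float a, float b) {
  return std::isgreater(a, b) ? a : b;
}
```

Keeping the two helpers side by side also makes a MIN/MAX mix-up like the
one fixed here easier to spot.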
WAIT_REG_MEM's implementation also allowed the compiler to load the value
only once, which caused an infinite loop with the other changes in the
commit (even in debug builds), so it's now accessed as volatile. Possibly
it would be even better to replace it with some (acquire/release?) atomic
load/store some day at least for the registers actually seen as
participating in those waits.
Also fixes the endianness being handled only on the first wait iteration in
WAIT_REG_MEM.
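
The two WAIT_REG_MEM fixes can be illustrated with a small polling loop.
This is a sketch of the atomic-load alternative speculated about above, not
the code from the commit (which uses a volatile access); `PollUntilEqual`,
`ByteSwap32` and the `max_polls` bound are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

constexpr uint32_t ByteSwap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Hypothetical polling loop: returns true once (value & mask) == ref, or
// false after `max_polls` reads.
inline bool PollUntilEqual(const std::atomic<uint32_t>& reg, bool swap_endian,
                           uint32_t mask, uint32_t ref, uint32_t max_polls) {
  for (uint32_t i = 0; i < max_polls; ++i) {
    // The load is inside the loop, so the compiler can't hoist it and spin
    // on a stale value.
    uint32_t value = reg.load(std::memory_order_acquire);
    // The endianness is handled on every iteration, not just the first one.
    if (swap_endian) {
      value = ByteSwap32(value);
    }
    if ((value & mask) == ref) {
      return true;
    }
  }
  return false;
}
```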
Accessing the same memory as different types (other than char) using
reinterpret_cast or a union is undefined behavior that has already caused
issues like #1971.
Also adds a XE_RESTRICT_VAR definition for declaring non-aliasing pointers
in performance-critical areas in the future.
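
One well-defined way to reinterpret the bits of a value is to copy them with
`std::memcpy`, which compilers typically optimize into a plain register
move. The sketch below only illustrates that general technique; `FloatBits`
and `BitsToFloat` are hypothetical helpers, and the example assumes a 32-bit
IEEE 754 `float`.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: reads the bit pattern of a float without aliasing it
// through a pointer or union of a different type.
inline uint32_t FloatBits(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Hypothetical helper: the inverse of FloatBits.
inline float BitsToFloat(uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```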
Functional changes:
- Enable only actually used features, as drivers may take more optimal
paths when certain features are disabled.
- Support VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE.
- Fix the separateStencilMaskRef check being inverted.
- Support shaderRoundingModeRTEFloat32.
- Fix vkGetDeviceBufferMemoryRequirements pointer not passed to the Vulkan
Memory Allocator.
Stylistic changes:
- Move all device extensions, properties and features to one structure,
especially simplifying portability subset feature checks, and also making
it easier to request new extension functionality in the future.
- Remove extension suffixes from usage of promoted extensions.