- Handles all LODs per layer, meaning cubemaps are now fully handled in 6 passes instead of 6 * (log2(width)) passes.
- Handles all LODs of a 3D texture in one pass as well.
- The improvements do warrant lowering the number of allowed compute invocations a bit.
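
To make the pass-count claim above concrete, here is a rough sketch (not the actual implementation) that counts dispatches for a mipmapped cubemap under both schemes. It assumes a full mip chain, i.e. floor(log2(width)) + 1 levels, so the old figure comes out slightly above the 6 * log2(width) estimate. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of levels in a full mip chain for a square face:
// floor(log2(width)) + 1. ASSUMPTION: the chain goes all the way down to 1x1.
inline std::uint32_t mip_levels(std::uint32_t width)
{
    assert(width > 0);
    std::uint32_t levels = 0;
    while (width != 0)
    {
        ++levels;
        width >>= 1;
    }
    return levels;
}

// Hypothetical "old" scheme: one pass per (layer, LOD) pair.
inline std::uint32_t passes_per_lod(std::uint32_t layers, std::uint32_t width)
{
    return layers * mip_levels(width);
}

// Hypothetical "new" scheme: a single pass per layer covers every LOD.
inline std::uint32_t passes_per_layer(std::uint32_t layers)
{
    return layers;
}
```

For a 256-wide cubemap, passes_per_lod(6, 256) gives 54 while passes_per_layer(6) stays at 6 regardless of width.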
subresource_layout::dim_in_texel
- These two are not always linked when working with compressed textures.
The actual texels extend past the actual size of the image if the size
is not aligned. e.g. if height is 1, the real height is 4, but it's not
possible to determine this from the aligned size. It could be 1, 2, 3 or
4 for example.
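
The ambiguity described above can be sketched as follows. This assumes a 4x4 block-compressed format; the block size and the helper names are illustrative and not taken from the actual code.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: 4x4 texel blocks, as used by the DXT/BC family of formats.
constexpr std::uint32_t block_dim = 4;

// Number of blocks needed to cover `texels` texels along one axis.
inline std::uint32_t texels_to_blocks(std::uint32_t texels)
{
    assert(texels > 0);
    return (texels + block_dim - 1) / block_dim;
}

// Texels actually stored along that axis once padded to whole blocks.
inline std::uint32_t padded_texels(std::uint32_t texels)
{
    return texels_to_blocks(texels) * block_dim;
}
```

Heights of 1, 2, 3 and 4 all map to padded_texels(...) == 4, so the logical size cannot be recovered from the padded one and has to be tracked separately.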
- Fixes image out-of-bounds writes when uploading from CPU
- Calculate exact sizes when doing hit tests to avoid false negatives
- Defer page checking until it is actually required for memory setup
- Introduce align2 helper to do non-pow2 alignments
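
A minimal sketch of what a non-power-of-two alignment helper like align2 could look like. The signature and body here are assumptions for illustration; the source only names the helper. The usual mask trick `(v + a - 1) & ~(a - 1)` only works when the alignment is a power of two, so this version divides and multiplies instead.

```cpp
#include <cassert>
#include <cstdint>

// Rounds `value` up to the next multiple of `alignment`.
// Works for any non-zero alignment, not just powers of two.
// ASSUMPTION: value + alignment - 1 does not overflow 32 bits.
inline std::uint32_t align2(std::uint32_t value, std::uint32_t alignment)
{
    assert(alignment != 0);
    return ((value + alignment - 1) / alignment) * alignment;
}
```

For example, align2(25, 12) yields 36, whereas the power-of-two mask trick would give a wrong answer for an alignment of 12.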
- Load into memory as straightforward BGRA
- Fixes a bug in vulkan caused by byte shuffling in blit engine vs shader access
- Removes the need for memory shuffling when transferring into a rendertarget
- Further improve aliased data preservation by unconditionally scanning.
It is possible for cache aliasing to occur when doing a memory split.
- Also sets up for RCB/RDB implementation
- Removes CPU-only transforms that broke GPU-side code.
-- Channels in GPU compute are laid out in cell-order, but CPU was uploading in favorable order and compensating with swizzles.
-- This leads to 2 different layouts depending on the location of the data (CPU vs GPU)
- Implement R8G8_R8B8 interleaved format decode
- General improvements
formats
- Allows D24S8 and D32S8 transport via typeless channels
- Allows uploading and downloading D24S8 data easily
- TODO: Implement optional byteswapping to fix flushed readbacks with
the same method
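
As an illustration of moving D24S8 data through a typeless 32-bit channel, the sketch below splits and rejoins a packed word and shows the kind of byteswap the TODO refers to. The bit layout (depth in the upper 24 bits, stencil in the low 8) is an assumption, as are all names; the real layout depends on the API and vendor.

```cpp
#include <cstdint>

struct depth_stencil
{
    std::uint32_t depth;   // 24 significant bits
    std::uint8_t  stencil; // 8 bits
};

// ASSUMPTION: depth occupies bits 31..8, stencil bits 7..0.
inline depth_stencil unpack_d24s8(std::uint32_t packed)
{
    return { packed >> 8, static_cast<std::uint8_t>(packed & 0xFF) };
}

inline std::uint32_t pack_d24s8(std::uint32_t depth, std::uint8_t stencil)
{
    return ((depth & 0xFFFFFF) << 8) | stencil;
}

// Plain 32-bit endian swap, e.g. for data read back to big-endian guest memory.
inline std::uint32_t byteswap32(std::uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00) |
           ((v << 8) & 0x00FF0000) | (v << 24);
}
```

Because the data is treated as an opaque 32-bit word, pack_d24s8 and unpack_d24s8 round-trip without any format conversion getting in the way.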
- Blit operations do format conversion automatically which is NOT what we want!
- Scale onto temp buffer with similar format before performing data cast.
- Disable DEPTH<->RGBA typeless transfers for now as they need a lot more work to function on all vendors
- Do not allow switching to the UNDEFINED/PREINITIALIZED layouts
- gl: Include an execution state wrapper to ensure state changes are consistent. Also removes a lot of required 'cleanup' for helper methods
- texture_cache: Make execution context a mandatory field as it is required for all operations. Also removes a lot of situations where a duplicate argument is added in for both fixed and vararg fields
- Explicit read/write barrier for framebuffer resources depending on
usage. Allows for operations like optional memory initialization before
reading
- Implicitly invoke a memory barrier if actively reading from an unsynchronized texture
- Simplify memory transfer operations
- Should allow more games to work without strict mode
- Unroll main compute queue loop
- Do NOT run GPU cores on mappable memory! This has a dreadful impact on performance for obvious reasons
- Enable dynamic SSBO indexing (affects AMD)
- Make loop unrolling and loop length variable depending on hardware and find optimum
- Used to transfer D32S8 data where it makes sense to use this variant
- On nvidia cards, it is very slow to move aspects from D24S8 probably due to the format being faked.
For this reason, the unsafe variant is used for both D16 and D24S8 to avoid the heavy performance loss
- Compute is now used to assist in some parts of blit operations, since Vulkan does not do format conversions the way OGL does
- TODO: Integrate this into all types of GPU memory conversion operations instead of downloading to CPU then converting
- Removes the old depth scaling using an overlay.
It was never going to work properly due to per-pixel stencil writes being unavailable
- TODO: Preserve stencil buffer during ARGB8->D32S8 shader conversion pass
- vk: Clear dirty textures before copying 'old contents' in case the old data does not fill the new region
- rsx: Properly decode border color - seems to be in BGRA format
- vk: better approximation of border color to better choose between the presets
- vk: Individually clear color images outside render pass and without scissor
- vk: Fix renderpass selection for clear overlay pass
- vk: Include scissor region when emulating clear mask
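
The border color items above can be sketched as follows: decode a packed 32-bit value as BGRA, then pick the closest of the three fixed border color presets Vulkan offers (transparent black, opaque black, opaque white). The byte order (B in the most significant byte), the distance metric, and all names are assumptions made for illustration only.

```cpp
#include <array>
#include <cstdint>

struct rgba { float r, g, b, a; };

enum class border_preset { transparent_black, opaque_black, opaque_white };

// ASSUMPTION: B is the most significant byte and A the least significant.
inline rgba decode_bgra(std::uint32_t v)
{
    const float b = ((v >> 24) & 0xFF) / 255.f;
    const float g = ((v >> 16) & 0xFF) / 255.f;
    const float r = ((v >> 8) & 0xFF) / 255.f;
    const float a = (v & 0xFF) / 255.f;
    return { r, g, b, a };
}

// Picks the preset with the smallest squared distance in RGBA space.
// On a tie, the preset listed first wins.
inline border_preset nearest_preset(const rgba& c)
{
    const std::array<rgba, 3> presets = {{
        { 0.f, 0.f, 0.f, 0.f }, // transparent black
        { 0.f, 0.f, 0.f, 1.f }, // opaque black
        { 1.f, 1.f, 1.f, 1.f }  // opaque white
    }};

    std::size_t best = 0;
    float best_dist = 1e9f;
    for (std::size_t i = 0; i < presets.size(); ++i)
    {
        const rgba& p = presets[i];
        const float dist = (c.r - p.r) * (c.r - p.r) + (c.g - p.g) * (c.g - p.g) +
                           (c.b - p.b) * (c.b - p.b) + (c.a - p.a) * (c.a - p.a);
        if (dist < best_dist)
        {
            best_dist = dist;
            best = i;
        }
    }
    return static_cast<border_preset>(best);
}
```

A nearest-match like this is only an approximation; colors far from all three presets will still be rendered incorrectly unless custom border colors are available.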
NOTES:
- vk: Completely avoid using vkClearXXXXimage - it's 'broken' on nvidia drivers
Spec is vague about the function so it's not an actual bug
ClearAttachment is clearly defined as bypassing bound state which works correctly
- TODO: Implement memory sampling to simulate loading precleared memory if cell used memset to preinitialize the framebuffer
Autoclear depth to 1|255 and color to 0 is hacky!
- gl/vk: Fix subresource copy/blit
- gl/vk: Fix default_component_map reading
- vk: Reimplement cell readback path and improve software channel decoder
- Properly name the subresource layout field - it's in blocks, not bytes!
- Implement d24s8 upload from memory correctly
- Do not ignore DEPTH_FLOAT textures - they are depth textures and abide by the depth compare rules
- NOTE: Redirection of 16-bit textures is not implemented yet