- Handle typeless src and dst with aliased typeless format
- Optimize typeless transfers by only dealing with affected texels.
* Eliminates redundant dst->typeless transfer of full image (very expensive)
* Replaces the full-image src->typeless transfer with a transfer of only the affected region
* Requires significantly smaller output buffers, saving on VRAM cost
- Adds the same optimization/simplification steps to complex image
transfer routines. Whenever possible, multi-step transfers are collapsed
into a single operation.
- Interpolating floats is not the same as interpolating their bits!
Use an integer format to interpolate linearly for D32F formats instead of using R32F as an intermediary.
- Handles all LODs per layer, meaning cubemaps are now fully handled in 6 passes instead of 6 * log2(width) passes.
- Handles all LODs of a 3D texture in one pass as well.
- The improvements do warrant dropping down the number of allowed compute invocations a bit
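
The note above about interpolating floats versus their bits can be illustrated with a minimal sketch. This is not the code from the change itself; the helper names (float_bits, bits_float, average_as_float, average_as_bits) are hypothetical and exist only for this example. It only shows that the two operations produce different values for the same inputs.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers: reinterpret a 32-bit float as its raw bit pattern and back.
inline std::uint32_t float_bits(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float bits_float(std::uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Midpoint computed on the float values.
inline float average_as_float(float a, float b)
{
    return (a + b) * 0.5f;
}

// Midpoint computed on the raw bit patterns, treated as unsigned integers.
// The sum is done in 64 bits so it cannot overflow.
inline float average_as_bits(float a, float b)
{
    const std::uint64_t sum = std::uint64_t{float_bits(a)} + float_bits(b);
    return bits_float(static_cast<std::uint32_t>(sum / 2));
}
```

For inputs 1.0f (0x3F800000) and 4.0f (0x40800000), average_as_float gives 2.5f while average_as_bits gives 2.0f (0x40000000), so the format chosen for the intermediary image changes the filtered result.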
subresource_layout::dim_in_texel
- These two are not always linked when working with compressed textures.
The stored texels extend past the nominal size of the image if the size
is not block-aligned. E.g. if the height is 1, the real height is 4, but
it is not possible to recover the original height from the aligned size:
it could be 1, 2, 3 or 4.
- Fixes image out-of-bounds writes when uploading from CPU
- Calculate exact sizes when doing hit tests to avoid false negatives
- Defer page checking until it is actually required for memory setup
- Introduce align2 helper to do non-pow2 alignments
- Load into memory as straightforward BGRA
- Fixes a bug in Vulkan caused by byte shuffling in blit engine vs shader access
- Removes the need for memory shuffling when transferring into a rendertarget
- Further improve aliased data preservation by unconditionally scanning.
It is possible for cache aliasing to occur when doing a memory split.
- Also sets up for RCB/RDB implementation
- Removes CPU-only transforms that broke GPU-side code.
-- Channels in GPU compute are laid out in cell-order, but CPU was uploading in favorable order and compensating with swizzles.
-- This leads to 2 different layouts depending on the location of the data (CPU vs GPU)
- Implement R8G8_R8B8 interleaved format decode
- General improvements
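
The alignment notes earlier in this log (the compressed-texture size mismatch and the align2 helper for non-pow2 alignments) can be sketched as follows. The function bodies are an assumption made for illustration, not the code from the change: align2 is written here as a plain round-up-to-multiple, align_pow2 is a hypothetical mask-based variant shown for contrast, and the 4x4 block size is assumed from the "real height is 4" example.

```cpp
#include <cstdint>

// Assumed shape of an align2-style helper: round value up to the next
// multiple of alignment. Works for any non-zero alignment, pow2 or not.
constexpr std::uint32_t align2(std::uint32_t value, std::uint32_t alignment)
{
    return ((value + alignment - 1) / alignment) * alignment;
}

// Hypothetical mask-based variant, shown for contrast.
// It is only correct when alignment is a power of two.
constexpr std::uint32_t align_pow2(std::uint32_t value, std::uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Assumed block edge length for a block-compressed format (4x4 texels).
constexpr std::uint32_t block_dim = 4;

// Height actually covered by stored blocks for a given height in texels.
constexpr std::uint32_t aligned_height(std::uint32_t height_in_texels)
{
    return align2(height_in_texels, block_dim);
}

// Number of block rows needed for a given height in texels.
constexpr std::uint32_t height_in_blocks(std::uint32_t height_in_texels)
{
    return aligned_height(height_in_texels) / block_dim;
}
```

With these definitions, heights of 1, 2, 3 and 4 texels all produce an aligned height of 4, which is why the texel size cannot be recovered from the aligned size and has to be tracked separately. For a non-pow2 alignment such as 6, align2(10, 6) gives 12, while the mask-based align_pow2(10, 6) gives 10, which is not a multiple of 6.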
formats
- Allows D24S8 and D32S8 transport via typeless channels
- Allows uploading and downloading D24S8 data easily
- TODO: Implement optional byteswapping to fix flushed readbacks with
the same method