Round-to-nearest integral based division, optimized for unsigned integral.
Used in sceNpTrophyGetGameProgress.
Do not allow signed values for aligned_div(), align().
- Adhere to workgroup count limits as exposed by the GPU vendor.
They already execute properly even when going beyond the limits but this removes validation noise.
- Fix invocation counts for deswizzle kernel. The count was incorrect if blocksize was not 4, causing a bunch of useless work to be done.
- Handles all LODs per layer meaning cubemaps are now fully handled in 6 passes instead of 6 * (log2(width)) passes.
- Handles all LODs of a 3D texture in one pass as well.
- The improvements do warrant dropping down the number of allowed compute invocations a bit
- Remove use of uniform buffers for compute static data. Use push
constants instead.
- Minor touchups to the deswizzle code to avoid redundant data copies.
- Add double-buffered descriptor pools to avoid use-after-free situations
- Make descriptor pools more configurable
- Also adds in a hack to allow renderdoc to capture properly
formats
- Allows D24S8 and D32S8 transport via typeless channels
- Allows uploading and downloading D24S8 data easily
- TODO: Implement optional byteswapping to fix flushed readbacks with
the same method
- Unroll main compute queue loop
- Do NOT run GPU cores on mappable memory! This has a dreadful impact on performance for obvious reasons
- Enable dynamic SSBO indexing (affects AMD)
- Make loop unrolling and loop length variable depending on hardware and find optimum
- Used to transfer D32S8 data where it makes sense to use this variant
- On nvidia cards, it is very slow to move aspects from D24S8 probably due to the format being faked.
For this reason, the unsafe variant is used for both D16 and D24S8 to avoid the heavy performance loss
- Compute is now used to assist in some parts of blit operations, since there are no format conversions with vulkan like OGL does
- TODO: Integrate this into all types of GPU memory conversion operations instead of downloading to CPU then converting