- gl: Do not call makeCurrent every flip - it is already called in set_current()
- gl: Improve ring buffer behaviour; use sliding window to view buffers larger than maximum viewable hardware range
NV hardware can only view 128M at a time
- gl/vk: Bump transform constant heap size. When lots of draw calls are issued, the heap is exhausted very quickly (8k per draw)
- gl: Remove CLIENT_STORAGE_BIT from ring buffers. Performance is marginally better without this flag (at least on Windows)
- Do not bother rechecking the dirty sampler pool for hits. It's faster to create a new sampler than to search the pool
- Reserve some memory on vertex layout struct to reduce reallocation penalty
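
The sliding-window approach mentioned for the gl ring buffers can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual implementation: `window_view`, `select_window`, and the `alignment` parameter are hypothetical names, and the caller is assumed to pass the hardware limit (128M on NV, per the note above) as `max_view`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the part of a large buffer that is currently viewable.
struct window_view
{
    std::uint64_t base; // offset of the window within the whole buffer
    std::uint64_t size; // number of bytes covered by the window
};

// Returns a window of at most max_view bytes that contains [offset, offset + length).
// The current window is reused when the request already fits inside it;
// otherwise the window slides so that it starts at the aligned-down offset.
// alignment must be a power of two (an assumption of this sketch).
inline window_view select_window(window_view current, std::uint64_t offset, std::uint64_t length,
                                 std::uint64_t buffer_size, std::uint64_t max_view,
                                 std::uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    assert(offset + length <= buffer_size);

    // Fast path: the request is already covered by the current window.
    if (offset >= current.base && offset + length <= current.base + current.size)
        return current;

    // Slide the window to the aligned-down start of the request.
    const std::uint64_t base = offset & ~(alignment - 1);
    assert((offset - base) + length <= max_view);

    // Clamp the window so that it never extends past the end of the buffer.
    const std::uint64_t remaining = buffer_size - base;
    return { base, remaining < max_view ? remaining : max_view };
}
```

Reusing the current window when possible keeps rebinds rare; the window only moves when a request falls outside the range that is currently viewable.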
- Adds support for abstract implementations
- Adds native windowing implementations for WIN32 and X11 as fallbacks
when present support is lacking (headless configs)
- For some reason the hardware forgets that primitive restart is enabled and tries to actually read vertex index 65535
- Works correctly if uint32 vertex indices are used instead of uint16 for cases where primitive restart is active
- Fix for texture barriers
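
One way to apply the uint32 workaround for primitive restart is to widen the index data on upload, as sketched below. This is an assumption-laden illustration, not the code from this change: `widen_indices` is a hypothetical helper, and the sketch assumes the restart marker is the all-ones value for each index width (0xFFFF for uint16, 0xFFFFFFFF for uint32).

```cpp
#include <cstdint>
#include <vector>

// Widens 16-bit indices to 32-bit. When primitive restart is active, the 16-bit
// restart marker 0xFFFF is remapped to the 32-bit marker 0xFFFFFFFF so that it is
// not treated as a reference to vertex 65535.
inline std::vector<std::uint32_t> widen_indices(const std::vector<std::uint16_t>& src,
                                                bool primitive_restart)
{
    std::vector<std::uint32_t> dst;
    dst.reserve(src.size()); // single allocation for the whole output

    for (const std::uint16_t index : src)
    {
        const bool is_restart = primitive_restart && index == 0xFFFF;
        dst.push_back(is_restart ? 0xFFFFFFFFu : static_cast<std::uint32_t>(index));
    }
    return dst;
}
```

When primitive restart is disabled, 0xFFFF is kept as an ordinary vertex index, so the helper does not change the meaning of existing index buffers.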
- vulkan: Rework texture cache handling of depth surfaces
- Support for scaled depth blit using overlay pass
- Support proper readback of D24S8 in both D32F_S8 and D24U_S8 variants
- Optimize the depth conversion routines with SSE
- vulkan: Replace slow single element copy with std::memcpy
- Check heap status before attempting blit operations
- Bump guard size on upload buffer as well
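
The std::memcpy replacement above follows a common pattern, sketched here in generic form. `copy_block` is a hypothetical helper for illustration only; the real call sites and element types are not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copies count elements from src to dst with one std::memcpy call instead of
// an element-by-element loop. Only valid for trivially copyable types and
// non-overlapping ranges.
template <typename T>
void copy_block(T* dst, const T* src, std::size_t count)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "std::memcpy requires a trivially copyable type");

    if (count == 0)
        return; // nothing to copy; also avoids passing null pointers to memcpy

    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, count * sizeof(T));
}
```

The `static_assert` guards against accidentally using the helper on types with non-trivial copy semantics, where a per-element copy would still be required.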
- Implement flush-always behaviour to partially fix readback from a currently bound fbo
- Without this, only the first read is correct; as more draws are added, the results become 'wrong'
- Fixes WCB and cpublit behaviour
- Synchronize blit_dst surfaces to avoid data loss when gpu texture scaling is used
- It is still faster in such cases to disable gpu texture scaling, but some types cannot be disabled without force cpu blit (e.g. framebuffer transfers)
- Memory management tuning
- rsx: on-demand texture cache rescanning for unprotected sections
- rsx: Only framebuffer resources are upscaled
- Do not resize regular blit engine resources
- Lazily initialize readback buffer when using OpenGL
-- These measures should help minimize VRAM usage
Six instructions changed to use xmm registers instead of gpr.
ROTQBII, ROTQMBII, SHLQBII look better (shifts by imm)
ROTQBI, ROTQMBI, SHLQBI changed for consistency (shifts by variable)