- For some reason the hardware forgets that primitive restart is enabled and tries to actually read vertex index 65535
- Works correctly if uint32 vertex indices are used instead of uint16 for cases where primitive restart is active
- Fix for texture barriers
- vulkan: Rework texture cache handling of depth surfaces
- Support for scaled depth blit using overlay pass
- Support proper readback of D24S8 in both D32F_S8 and D24U_S8 variants
- Optimize the depth conversion routines with SSE
- vulkan: Replace slow single element copy with std::memcpy
- Check heap status before attempting blit operations
- Bump guard size on upload buffer as well
- Implement flush-always behaviour to partially fix readback from a currently bound fbo
- Without this, only the first read is correct, as more draws are added the results become 'wrong'
- Fixes WCB and cpublit behviour
- Synchronize blit_dst surfaces to avoid data loss when gpu texture scaling is used
- Its still faster in such cases to disable gpu texture scaling but some types cannot be disabled without force cpu blit (e.g framebuffer transfers)
- Memory management tuning
- rsx: on-demand texture cache rescanning for unprotected sections
- rsx: Only framebuffer resources are upscaled
- Do not resize regular blit engine resources
- Lazy initialize readback buffer when using opengl
-- These measures should help minimize vram usage
Six instructions changed to use xmm registers instead of gpr.
ROTQBII, ROTQMBII, SHLQBII look better (shifts by imm)
ROTQBI, ROTQMBI, SHLQBI changed for consistency (shifts by variable)
Only AVX-512 path is changed (third version).
This instruction is extremely rare.
And the code is probably not optimal.
So this commit is pretty useless.
* Fix gs_frame spawning on a screen other than the one the RPCS3 window is on for multi-monitor setups
* Cleaned up code & refactored it into a utility function for reuse
* Qt: take gs_frame's framemargins into account by using showEvent
- For some surfaces, dimensions are passed via the log2 bits rather than surface pitch
-- This is similar to the setup for nv406e and probably means the surfaces are padded and swizzled
- Do not assume texture2D when creating new textures
- Flag invalid texture cache if readonly texture is trampled by fbo memory.
Avoids binding a stale handle to the pipeline and is rare enough that it should not hurt performance
- Do not add unused subroutines in shaders unless necessary
-- makes shaders easier to read and disassembled spir-v has less clutter
- glsl: Replace switch block with lookup table
- vulkan: Do not assume an aux frame context must exist in a well defined state as set in init_buffers() since the request might be external (via overlays path)
- gl: Do not bother waiting for idle before servicing external flip requests
- gl: Queue overlay cleanup requests to ensure only glthread attempts touching the context
- overlays: Do not compute size metrics for invalid/unsupported glyphs
- opengl driver optimization for nvidia. On nvidia glTextureBufferRange performance is horrendous
-- Initialize texture buffer to whole buffer at startup and use absolute offsets to read data instead
-- Over 2x performance in some cases (Resogun, TNT racers)
- gl/vk: Do not flip non-existent display buffers. Fixes spec violation at boot in TNT racers demo
- whitespace fixes for sys_rsx
- Implement low bit decode override flags for 2-component textures
- Properly implement alot of texture remaps according to the autotest results
rsx: Do not unnecessarily shuffle WZYX->RGBA unless we have proof
- From looking at format swizzles, this is incorrect
- Use edges of depth range to map clamped stuff
Disable range compression on regular draws vs extended range draws
- Some applications require full 0-1 usage without compromises.
-- TODO: This leaves the extended range z values to fight with regular draws in the .99 - 1.0 range
- The scale offset matrix is fine but on real hardware the z results seem to be independent of near/far clipping distances
-- If depth falls within near/far, clamp depth value to [0,1]
- Always flush the primary queue and wait if not involking readback from rsx thread
-- Should fix some instances of device_lost when using WCB
-- Marked remaining case as TODO
-- TODO: optimize amount of time rsx waits for external threads trying to read