- Tags framebuffer resources on first use (when on_write is called to verify memory)
- Texture cache now selects the best match and even sorts atlas writes with memory write order to avoid older data showing over newer one
- Some formats are proven to ignore swizzle flag
- DXT compressed textures
- COMPRESSED_BG_GB class textures
- Some applications are using swizzled wide integer formats so those are confirmed to swizzle
invalidate_range_impl_base does not mark all textures that will only be
unprotected as dirty when doing a deferred flush, since that is done by
flush_all.
However, if there are no sections to flush, the deferred flush will
use the same code path as non-deferred flushes for unprotecting textures
and forget to mark them as dirty.
This commit fixes this bug.
The existing implementation restarts the loop immediately after
finding a range_data instance that updates the trampled_range.
This commit refactors this method to continue the loop with the updated
trampled_range, and then repeat only those range_data instances that
were iterated through before the trampled_range was last updated.
As a result, the number of total iterations required is reduced.
When the trampled range changes, get_intersecting_set restarts the
outer loop. However, due to an off-by-one error, it skips the first
cache entry when doing so. This can cause a texture not to be
correctly unlocked, which could lead to issues or even deadlocks.
This commit fixes this off-by-one error.
- Defer compilation process to worker threads
- vulkan: Fixup for graphics_pipeline_state.
Never use struct assignment operator on vk** structs due to padding after sType member (4 bytes)
- Adds proper support for vertex textures, including dimensions other than 2D textures
- Minor analyser fixup, removes spurious 'analyser failed' errors
- Minor optimizations for program state tracking
- Adds dead code elimination
- Fix absolute branch target addresses to take base address into account
- Patch branch targets relative to base address to improve hash matching
- Bumps shader cache version
- Enables shader logging option to write out vertex program binary,
helpful when debugging problems.
- Fix program abort logic to never abort before resolving later label addresses
Fixes jumping over broken code and jumping over END markers
- TEXTURE_CONTROL2 has indexing range of [0..15] without stride skipping!
This register does not have interleaving with other texture registers
- Track shader address poke as it seems to invalidate programs as well
- Avoid re-locking memory if there is no reason to do so (no draws issued)
- Actively bound regions should always get written to the backing cache
- Forcefully read memory during download if writes to the target have occured since last sync event
- Compute is now used to assist in some parts of blit operations, since there are no format conversions with vulkan like OGL does
- TODO: Integrate this into all types of GPU memory conversion operations instead of downloading to CPU then converting
- Some applications (e.g Backbreaker) use an evil hack to resolve MSAA.
The application respecifies a formerly AA region as a region with no AA then performs a framebuffer feedback lookup.
The old memory keeps AA during read, but writes back to itself with AA resolved.
This is evil on several levels but it just happens to work on PS3
1. rsx: Rework section synchronization using the new memory mirrors
2. rsx: Tweaks
- Simplify peeking into the current rsx::thread instance.
Use a simple rsx::get_current_renderer instead of asking fxm for the same
- Fix global rsx super memory shm block management
3. rsx: Improve memory validation. test_framebuffer() and
tag_framebuffer() are simplified due to mirror support
4. rsx: Only write back confirmed memory range to avoid overapproximation errors in blit engine
5. rsx: Explicitly mark clobbered flushable sections as dirty to have them
removed
6. rsx: Cumulative fixes
- Reimplement rsx::buffered_section management routines
- blit engine subsections are not hit-tested against confirmed/committed memory range
Not all applications are 'honest' about region bounds, making the real cpu range useless for blit ops
- Improve dirty state tracking affecting program state
- vk: Refactor out transform constants upload into a separate channel to avoid if possible
transform data uploads are quite expensive
- Introduces a gpu program analyser step to examine shader contents before attempting compilation or cache search
- Avoids detecting shader as being different because of unused textures having state changes
- Adds better program size detection for vertex programs
- Improved vertex program decompiler
- Properly support CAL type instructions
- Support jumping over instructions marked with a termination marker with BRA/CAL class opcodes
- Fix SRC checks and abort
- Fix CC register initialization
- NOTE: Even unused SRC registers have to be valid (usually referencing in.POS)
- gl/vk: Fix subresource copy/blit
- gl/vk: Fix default_component_map reading
- vk: Reimplement cell readback path and improve software channel decoder
- Properly name the subresource layout field - its in blocks not bytes!
- Implement d24s8 upload from memory correctly
- Do not ignore DEPTH_FLOAT textures - they are depth textures and abide by the depth compare rules
- NOTE: Redirection of 16-bit textures is not implemented yet
- Ignore unlocked blit sections [TODO]
- Do not attempt blit on hw if bytesize is unsupported
- gl: Implement typeless memory transfers
Uses pbo to handle type-agnostic memory transfer
- Reimplements render target views used for sampling
- Optimizes access using an encoded control token
- Adds proper encoding for 24-bit textures (DRGB8 -> ORGB/OBGR)
- Adds proper encoding for ABGR textures (ABGR8 -> ARGB8)
- Silence some compiler warnings as well
- TODO: Real texture views for OGL current method is a hack
- Separate TXB from TXL: They are completely different!
- Properly perform TMU emulation in the fragment shader. Implemens SRGB conversion and alphakill at the moment
- Properly perform ROP emulation in the fragment shader. Implements FRAMEBUFFER_SRGB. While support on the chip looks to be incomplete (and wierd), it does work
- Document some more bits in SHADER_CONTROL register
- Export some debug information in the free texture register space components zw
Very useful when analysing renderdoc captures
- Enable shadow comparison on depth as long as compare function is active and texture is uploaded for depth read
Some engines (UE3) read all the components in the shader and use mul/mad with the result
- According to NV_fragment_program spec, registers are zero initialized always
- A program even without writing to these registers will have black (0, 0, 0, 0) output
Confirmed behaviour with MotorStorm games. Their engine uses this quirk to clear color buffers when doing depth replace
Might be an unfixed game bug
- Forces Bitcast of texture data if input format cannot possibly be the
same as the existing texture format
- rsx: Other minor improvements to texture cache :-
- remove obsolete blit engine incompatibility warning. The texture will be re-uploaded if it is indeed incompatible
- Implement warn_once and err_once to avoid spamming the log with systemic errors
- Track mispredicted flushes
- Reswizzle bitcasted texture data to native layout
TODO: Also needs reshuffle according to input remap vector
- A new step is added between decompilation and pipeline object creation allowing for properties to be updated based on shader contents
- Allos masking off attachment writes that are unmodified in the shader
- Identify depth textures reaching the gpu via shader_read upload path
- Use correct timestamp counter for opengl
- inline draw_state::test_property because msvc doesnt do it for us
- Fix for texture barriers
- vulkan: Rework texture cache handling of depth surfaces
- Support for scaled depth blit using overlay pass
- Support proper readback of D24S8 in both D32F_S8 and D24U_S8 variants
- Optimize the depth conversion routines with SSE
- vulkan: Replace slow single element copy with std::memcpy
- Check heap status before attempting blit operations
- Bump guard size on upload buffer as well
- Implement flush-always behaviour to partially fix readback from a currently bound fbo
- Without this, only the first read is correct, as more draws are added the results become 'wrong'
- Fixes WCB and cpublit behviour
- Synchronize blit_dst surfaces to avoid data loss when gpu texture scaling is used
- Its still faster in such cases to disable gpu texture scaling but some types cannot be disabled without force cpu blit (e.g framebuffer transfers)
- Memory management tuning
- rsx: on-demand texture cache rescanning for unprotected sections
- rsx: Only framebuffer resources are upscaled
- Do not resize regular blit engine resources
- Lazy initialize readback buffer when using opengl
-- These measures should help minimize vram usage
- Do not assume texture2D when creating new textures
- Flag invalid texture cache if readonly texture is trampled by fbo memory.
Avoids binding a stale handle to the pipeline and is rare enough that it should not hurt performance
- Do not add unused subroutines in shaders unless necessary
-- makes shaders easier to read and disassembled spir-v has less clutter
- glsl: Replace switch block with lookup table
- Use edges of depth range to map clamped stuff
Disable range compression on regular draws vs extended range draws
- Some applications require full 0-1 usage without compromises.
-- TODO: This leaves the extended range z values to fight with regular draws in the .99 - 1.0 range
- The scale offset matrix is fine but on real hardware the z results seem to be independent of near/far clipping distances
-- If depth falls within near/far, clamp depth value to [0,1]
- Force mipmap count to 1 if sampling from an RTV/DSV
- TODO: Better wcb flush detection, it should be better to re-upload the texture after it has been dwnloaded if expected mipmaps are > 1
- Sometimes square renders are done to surfaces with pitch=64 and re-uploaded with swizzle scanning
-- This setup avoids discarding targets if they are square and pitch == 64
- Abort nv406e semaphore acquire if the rsx thread stalls/crashes
- Fix texture size approximation to take mipmaps into account. Fixes some games hanging with WCB
- Avoid unprotecting memory until just before we have to write the data
- Avoids race conditions where the caller thread takes too long to enter the second phase and another thread accesses the "bad" memory
- Reorganize storage hash vs ucode hash
- Scan for actual fragment program start in case leading NOPed code precedes the actual instructions
-- e.g FEAR2 Demo has over 32k of padding before actual program code that messes up hashes
- Discard intentionally invalidated framebuffer resources. These are created after a flush has happened, forcing reupload since contents cannot be guaranteed (strict mode only)
- Fix for blits using vulkan; dont use the copy method if formats do not match, use generic blit instead
- Handle aliased depth + color target by disabling depth writes. This looks to be the correct way
- Add support for generic passes that cannot be done using general imaging operations. Lays the framework for tons of features and effects
- Implement RGBA->D24D8 casting. Sometimes games will split depth texture into RGBA8 then use the new RGBA8 as a depth texture directly
-- This happens alot in ps3 games and I'm not sure why. Its likely the ps3 did not sample fp values with linear filtering so this is a workaround
-- Only implemented for openGL at the moment
-- Requires a workaround for an AMD driver bug
- Fix texture cache blit behaviour when src has AA enabled and dst is a blit dst texture with or without AA
-- This requires handling AA resolve by removing a half downscale on multisampled axes
- Return all ones when a vertex attribute is disabled.
-- Some games forget to enable vertex attributes actually needed by the fs
- texture_cache: Fix internal size calculation for subresources
- vk: Delay dynamic state updates until just about to draw to ensure no flush has discarded the cb state
- Support for raster offsets in surface descriptors (looks to be unused)
- Do not tag disabled render targets when using MRT (pitch = 64)
- Add missing notify_surface_changed() call for openGL
- Use AA mode to predict surface compression. Compression mode is useless without AA activated
- Rewrites most image subresource fetch routines to use the new heuristic
- Fix rsx:🧵:find_tile. FEED000(X) can be substituted for (X) in the code
-- Fixes alot of failures when looking for tiled regions
rsx: Fix antialiased unnormalized coords
- scaling factors are inverse to allow proper coordinates to be computed in fs