- With harmonization between all texture types implemented, there is no difference between blit_engine_src and shader_read for supported formats
- Adds extra format filtering to ensure no conflicts when copying data
- This allows creating buffers with no MAP bits set which should ensure they are created for VRAM usage only
- TODO: Implement compute kernels to avoid software fallback mode for pack/unpack operations
- Properly commit orphaned blocks not invalidating existing cache structures
- Do not ignore overwritten objects when commiting as unprotected fbo. Avoids stale references to invalidated surface objects.
- Implements render target data load (aka Read Color Buffer/Read Depth Buffer)
- Refactors vulkan surface barrier to be much cleaner.
- Removes redundant surface barrier invocations after doing a merged load
from surface cache.
- Adds explicit access modes when gathering surfaces from cache.
- Further improve aliased data preservation by unconditionally scanning.
Its is possible for cache aliasing to occur when doing memory split.
- Also sets up for RCB/RDB implementation
- Texel borders are no longer actually supported in modern APIs
- Removes the border texels and uses border color instead which is incorrect but should work fine
vm::spu max address was overflowing resulting in issues, so cast to u64 where needed. Fixes#6145.
Use vm::get_addr instead of manually substructing vm::base(0) from pointer in texture cache code.
Prefer std::atomic_thread_fence over _mm_?fence(), adjust usage to be more correct.
Used sequantially consistent ordering in semaphore_release for TSX path as well.
Improved memory ordering for sys_rsx_context_iounmap/map.
Fixed sync bugs in HLE gcm because of not using atomic instructions.
Use release memory barrier in lwsync for PPU LLVM, according to this xbox360 programming guide lwsync is a hw release memory barrier.
Also use release barrier where lwsync was originally used in liblv2 sys_lwmutex and cellSync.
Use acquire barrier for isync instruction, see https://devblogs.microsoft.com/oldnewthing/20180814-00/?p=99485
- Ensures the current renderpass matches the image properties even when a cyclic reference is detected
- Solves SDK debug output error spam due to mismatching layouts and renderpasses
- Transition attachments to LAYOUT_GENERAL in case of a feedback loop
- Fixes appearance of garbage along polygon edges in some
post-processing passes.
- Also reverse this transition when rendering goes back to normal
- Allows render targets to behave like stacked 3D views same as shader inputs are resolved
- Basically implements most of 'Read Color/Depth Buffers" option for 'free'.
- Allows splitting RTV/DSV resources if they are superceded by a partial surface
- Also allows intersecting new resources through the surface cache for proper inheritance from other scattered data
- TODO: Refactor bind_surface_as_rtt and bind_surface_as_ds to reduce asinine code duplication
- When reverse scanning, offsets are inverted and offset value of 0 is logically equivalent to an offset of -1
- Add an explicit message if clipping happens to avoid silent errors/bugs
- Removes CPU-only transforms that broke GPU-side code.
-- Channels in GPU compute are laid out in cell-order, but CPU was uploading in favorable order and compensating with swizzles.
-- This leads to 2 different layouts depending on the location of the data (CPU vs GPU)
- Implement R8G8_R8B8 interleaved format decode
- General improvements
- Do not round up sub-pixel offsets, round down instead
- Do not allow incomplete sources for hw blit transfer
- Reimplement src clipping (slice_h)
- Check 'area' of incoming texels and correct for them before RTT lookup/transfer
- Filter out incomplete targets when performing RTT lookup (1 texel or less contribution)
- If a transfer writes to a RTT and depth mismatch happens, create a local target and the upload function will likely resolve between the two
- If a surface is rejected, reset the target region!
- Also refactors some bpp handling code
- Simplify texture intersection test to use a normalized/uniform coordinate space
- Fix broken bounds checking as well
- Batch dma transfers whenever possible and do them in one go
- vk: Always ensure that queued dma transfers are visible to the GPU before they are needed by the host
Requires a little refactoring to allow proper communication of the commandbuffer state
- vk: Code cleanup, the simplified mechanism makes it so that its not necessary to pass tons of args to methods
- vk: Fixup - do not forcefully do dma transfers on sections in an invalidation zone! They may have been speculated correctly already
- Properly wait for the buffer transfer operation to finish before map/readback!
- Change vkFence to vkEvent which works more like a GL fence which is what is needed.
- Implement supporting methods and functions
- Do not destroy fence by immediately waiting after copying to dma buffer
- Avoids blindly reusing blit dst sections as they may contain garbage.
If a section was unlocked for a flush, just discard it as its reuse introduces potential data corruption.
Since the data needs to be reuploaded anyway (for now), its better to start afresh
- In case of format mismatch, reset the calculated dst block
- Add a bounds check to determine if data contained in an atlas is good enough for sampling the cache.
If not enough data is provided, fall back to full upload
- Avoid tagging and rely on read/write barriers and the dirty flag mechanism. Testing is done with a weak 8-byte memory test
- Introducing new data when tagging breaks applications with race conditions where tags can overwrite flushed data
- gl: Include an execution state wrapper to ensure state changes are consistent. Also removes a lot of required 'cleanup' for helper methods
- texture_cache: Make execition context a mandatory field as it is required for all operations. Also removes a lot of situations where duplicate argument is added in for both fixed and vararg fields
- Explicit read/write barrier for framebuffer resources depending on
usage. Allows for operations like optional memory initialization before
reading
- If draw call resources consume memory that intersects with NA parts of the texture cache, we get a framebuffer test mismatch.
This mismatch is false and happens because the thread has not yet reached the point of relocking the pages
- Implicitly invoke a memory barrier if actively reading from an unsynchronized texture
- Simplify memory transfer operations
- Should allow more games to work without strict mode
- Do not bind companion framebuffer when clearing single aspect; let the
contest mechanism sort it out instead
- Do not prematurely tag framebuffers, instead only do so at
write-confirmation time. Should avoid false tagging if setup does not
allow a render to occur.
To avoid the need (and performance hit) of Read Color/Depth Buffers, we
may not invalidate overlapping fbos inside lock_memory_region unless
they are guaranteed to be superseded by the new one.
This avoids e.g. issues with overblooming, among others.
Fixes VRAM leaks and incorrect destruction of resources, which could
lead to drivers crashes.
Additionally, lock_memory_region is now able to flush superseded
sections. However, due to the potential performance impact of this
for little gain, a new debug setting ("Strict Flushing") has been
added to config.yaml
- Orders flushing to preserve memory at all cost
- Avoids false positive where flushing overlapping sections can falsely invalidate another with head/tail test
- Forcefully downloads and reuploads data from the CPU in case of unexpected overlaps
- Properly detect correct size of newly created blit targets
- Remember to clear any existing views when changing the default component map!
- Retag resources reprotected under flush_always rules
- Properly check for blit resource fitting taking into account format
mismatch, pitch mismatch and typeless transfers
- Tags framebuffer resources on first use (when on_write is called to verify memory)
- Texture cache now selects the best match and even sorts atlas writes with memory write order to avoid older data showing over newer one
invalidate_range_impl_base does not mark all textures that will only be
unprotected as dirty when doing a deferred flush, since that is done by
flush_all.
However, if there are no sections to flush, the deferred flush will
use the same code path as non-deferred flushes for unprotecting textures
and forget to mark them as dirty.
This commit fixes this bug.
The existing implementation restarts the loop immediately after
finding a range_data instance that updates the trampled_range.
This commit refactors this method to continue the loop with the updated
trampled_range, and then repeat only those range_data instances that
were iterated through before the trampled_range was last updated.
As a result, the number of total iterations required is reduced.
When the trampled range changes, get_intersecting_set restarts the
outer loop. However, due to an off-by-one error, it skips the first
cache entry when doing so. This can cause a texture not to be
correctly unlocked, which could lead to issues or even deadlocks.
This commit fixes this off-by-one error.
- Avoid re-locking memory if there is no reason to do so (no draws issued)
- Actively bound regions should always get written to the backing cache
- Forcefully read memory during download if writes to the target have occured since last sync event
- Some applications (e.g Backbreaker) use an evil hack to resolve MSAA.
The application respecifies a formerly AA region as a region with no AA then performs a framebuffer feedback lookup.
The old memory keeps AA during read, but writes back to itself with AA resolved.
This is evil on several levels but it just happens to work on PS3
1. rsx: Rework section synchronization using the new memory mirrors
2. rsx: Tweaks
- Simplify peeking into the current rsx::thread instance.
Use a simple rsx::get_current_renderer instead of asking fxm for the same
- Fix global rsx super memory shm block management
3. rsx: Improve memory validation. test_framebuffer() and
tag_framebuffer() are simplified due to mirror support
4. rsx: Only write back confirmed memory range to avoid overapproximation errors in blit engine
5. rsx: Explicitly mark clobbered flushable sections as dirty to have them
removed
6. rsx: Cumulative fixes
- Reimplement rsx::buffered_section management routines
- blit engine subsections are not hit-tested against confirmed/committed memory range
Not all applications are 'honest' about region bounds, making the real cpu range useless for blit ops
- gl/vk: Fix subresource copy/blit
- gl/vk: Fix default_component_map reading
- vk: Reimplement cell readback path and improve software channel decoder
- Properly name the subresource layout field - its in blocks not bytes!
- Implement d24s8 upload from memory correctly
- Do not ignore DEPTH_FLOAT textures - they are depth textures and abide by the depth compare rules
- NOTE: Redirection of 16-bit textures is not implemented yet
- Ignore unlocked blit sections [TODO]
- Do not attempt blit on hw if bytesize is unsupported
- gl: Implement typeless memory transfers
Uses pbo to handle type-agnostic memory transfer
- Reimplements render target views used for sampling
- Optimizes access using an encoded control token
- Adds proper encoding for 24-bit textures (DRGB8 -> ORGB/OBGR)
- Adds proper encoding for ABGR textures (ABGR8 -> ARGB8)
- Silence some compiler warnings as well
- TODO: Real texture views for OGL current method is a hack
- Separate TXB from TXL: They are completely different!
- Properly perform TMU emulation in the fragment shader. Implemens SRGB conversion and alphakill at the moment
- Properly perform ROP emulation in the fragment shader. Implements FRAMEBUFFER_SRGB. While support on the chip looks to be incomplete (and wierd), it does work
- Document some more bits in SHADER_CONTROL register
- Forces Bitcast of texture data if input format cannot possibly be the
same as the existing texture format
- rsx: Other minor improvements to texture cache :-
- remove obsolete blit engine incompatibility warning. The texture will be re-uploaded if it is indeed incompatible
- Implement warn_once and err_once to avoid spamming the log with systemic errors
- Track mispredicted flushes
- Reswizzle bitcasted texture data to native layout
TODO: Also needs reshuffle according to input remap vector
- Identify depth textures reaching the gpu via shader_read upload path
- Use correct timestamp counter for opengl
- inline draw_state::test_property because msvc doesnt do it for us
- Implement flush-always behaviour to partially fix readback from a currently bound fbo
- Without this, only the first read is correct, as more draws are added the results become 'wrong'
- Fixes WCB and cpublit behviour
- Synchronize blit_dst surfaces to avoid data loss when gpu texture scaling is used
- Its still faster in such cases to disable gpu texture scaling but some types cannot be disabled without force cpu blit (e.g framebuffer transfers)
- Memory management tuning
- rsx: on-demand texture cache rescanning for unprotected sections
- rsx: Only framebuffer resources are upscaled
- Do not resize regular blit engine resources
- Lazy initialize readback buffer when using opengl
-- These measures should help minimize vram usage
- Do not assume texture2D when creating new textures
- Flag invalid texture cache if readonly texture is trampled by fbo memory.
Avoids binding a stale handle to the pipeline and is rare enough that it should not hurt performance
- Force mipmap count to 1 if sampling from an RTV/DSV
- TODO: Better wcb flush detection, it should be better to re-upload the texture after it has been dwnloaded if expected mipmaps are > 1
- Sometimes square renders are done to surfaces with pitch=64 and re-uploaded with swizzle scanning
-- This setup avoids discarding targets if they are square and pitch == 64
- Abort nv406e semaphore acquire if the rsx thread stalls/crashes
- Fix texture size approximation to take mipmaps into account. Fixes some games hanging with WCB
- Avoid unprotecting memory until just before we have to write the data
- Avoids race conditions where the caller thread takes too long to enter the second phase and another thread accesses the "bad" memory
- Discard intentionally invalidated framebuffer resources. These are created after a flush has happened, forcing reupload since contents cannot be guaranteed (strict mode only)
- Fix for blits using vulkan; dont use the copy method if formats do not match, use generic blit instead
- Fix texture cache blit behaviour when src has AA enabled and dst is a blit dst texture with or without AA
-- This requires handling AA resolve by removing a half downscale on multisampled axes
- Return all ones when a vertex attribute is disabled.
-- Some games forget to enable vertex attributes actually needed by the fs
- texture_cache: Fix internal size calculation for subresources
- vk: Delay dynamic state updates until just about to draw to ensure no flush has discarded the cb state
- Support for raster offsets in surface descriptors (looks to be unused)
- Do not tag disabled render targets when using MRT (pitch = 64)
- Add missing notify_surface_changed() call for openGL
- Use AA mode to predict surface compression. Compression mode is useless without AA activated
- Rewrites most image subresource fetch routines to use the new heuristic
- Fix rsx:🧵:find_tile. FEED000(X) can be substituted for (X) in the code
-- Fixes alot of failures when looking for tiled regions
rsx: Fix antialiased unnormalized coords
- scaling factors are inverse to allow proper coordinates to be computed in fs
- Reimplement fragment program fetch and rewrite texture upload mechanism
-- All of these steps should only be done at most once per draw call
-- Eliminates continously checking the surface store for overlapping addresses as well
addenda - critical fixes
- gl: Bind TIU before starting texture operations as they will affect the currently bound texture
- vk: Reuse sampler objects if possible
- rsx: Support for depth resampling for depth textures obtained via blit engine
vk/rsx: Minor fixes
- Fix accidental imageview dereference when using WCB if texture memory occupies FB memory
- Invalidate dirty framebuffers (strict mode only)
- Normalize line endings because VS is dumb
to flush_all is up to date. Should prevent recursive exceptions
Partially revert Jarves' fix to invalidate cache on tile unbind. This will
need alot more work. Fixes hangs
- This allows for better handling of deferred flushes.
-- There's still no guarantee that cache contents will have changed between the set acquisition and following flush operation
-- Hopefully this is rare enough that it doesnt cause serious issues for now
- vk: Always reopen primary command buffers. They should only be closed in flush_command_queue
- If uploading a texture and there are collisions with protected buffers, do not rebuild the cache
- Perform writes via flush before reprotecting pages that were not trampled
- Only flush no pages once
- Refactor invalidate memory functions into one function
- Add cached object rebuilding functionality to avoid throwing away useful memory on an invalidate
- Added debug monitoring of texture unit VRAM usage