- Matching attachments with resource id fails because drivers are reusing
handles!
- Properly sets up stale fbo ref counting and removal
- Properly sets up resource reference test with subsequent removal to
avoid using a broken fbo entry
- Orders flushing to preserve memory at all cost
- Avoids false positive where flushing overlapping sections can falsely invalidate another with head/tail test
- Do not do a full sync on a texture read barrier
- Avoid calling zcull sync in FIFO spin wait
- Do not flush memory to cache from the renderer side; this method is now obsolete
- Defer compilation process to worker threads
- vulkan: Fixup for graphics_pipeline_state.
Never use struct assignment operator on vk** structs due to padding after sType member (4 bytes)
- Adds proper support for vertex textures, including dimensions other than 2D textures
- Minor analyser fixup, removes spurious 'analyser failed' errors
- Minor optimizations for program state tracking
- Adds dead code elimination
- Fix absolute branch target addresses to take base address into account
- Patch branch targets relative to base address to improve hash matching
- Bumps shader cache version
- Enables shader logging option to write out vertex program binary,
helpful when debugging problems.
- Avoid re-locking memory if there is no reason to do so (no draws issued)
- Actively bound regions should always get written to the backing cache
- Forcefully read memory during download if writes to the target have occured since last sync event
- Removes the old depth scaling using an overlay.
It was never going to work properly due to per-pixel stencil writes being unavailable
- TODO: Preserve stencil buffer during ARGB8->D32S8 shader conversion pass
- Some applications (e.g Backbreaker) use an evil hack to resolve MSAA.
The application respecifies a formerly AA region as a region with no AA then performs a framebuffer feedback lookup.
The old memory keeps AA during read, but writes back to itself with AA resolved.
This is evil on several levels but it just happens to work on PS3
- The benefits of FIFO optimizations are huge in some cases.
The optimizations also do not break any tested applications so no need to disable with strict mode
- A debug option is provided to disable this behaviour for testing
- Improves cleanup code to consist of 2 parts, remove then dispose. Remove
does not deallocate the item until dispose is called on it, allowing the
backends to first deallocate external references.
- Caller is responsible for managing list locking and tracking disposable list
of items when external references have been cleaned up before using
dispose method.
- Reimplements the AMD workaround using an identity buffer to avoid the performance hit of doing multiple glDrawArrays for every single compiled set
- Reimplements first/count allocation using a scratch buffer to reduce allocation overhead when large number of draw calls is used
- Improve dirty state tracking affecting program state
- vk: Refactor out transform constants upload into a separate channel to avoid if possible
transform data uploads are quite expensive
- Introduces a gpu program analyser step to examine shader contents before attempting compilation or cache search
- Avoids detecting shader as being different because of unused textures having state changes
- Adds better program size detection for vertex programs
- Improved vertex program decompiler
- Properly support CAL type instructions
- Support jumping over instructions marked with a termination marker with BRA/CAL class opcodes
- Fix SRC checks and abort
- Fix CC register initialization
- NOTE: Even unused SRC registers have to be valid (usually referencing in.POS)
- Readback does not work at all with float textures on AMD openGL
Driver throws a bogus OUT_OF_MEMORY error regardless of amount of VRAM and system RAM available
- vk: Clear dirty textures before copying 'old contents' in case the old data does not fill the new region
- rsx: Properly decode border color - seems to be in BGRA format
- vk: better approximation of border color to better choose between the presets
- vk: Individually clear color images outside render pass and without scissor
- vk: Fix renderpass selection for clear overlay pass
- vk: Include scissor region when emulating clear mask
NOTES:
- vk: Completely avoid using vkClearXXXXimage - its 'broken' on nvidia drivers
Spec is vague about the function so its not an actual bug
ClearAttachment is clearly defined as bypassing bound state which works correctly
- TODO: Implement memory sampling to simulate loading precleared memory if cell used memset to preinitialize the framebuffer
Autoclear depth to 1|255 and color to 0 is hacky!
Primary:
- Fix SET_SURFACE_CLEAR channel mask - it has been wrong for all these
years! Layout is RGBA not ARGB/BGRA like other registers
Other Fixes:
- vk: Implement subchannel clears using overla pass
- vk: Simplify and clean up state management
- gl: Fix nullptr deref in case of failed subresource copy
- vk/gl: Ignore float buffer clears as hardware seems to do
- Mainly affected are colormasks and read swizzles
NOTES:
- Writes to G write to the second and fourth component (YW)
- Writes to B write to first and third component (XZ)
- This means the actual format layout is BGBG (RGBA) making RG mapping actually GR
- Clear does not seem to have any intended effect on this format (TLOU)
Fixes the following issues on Tales of Vesperia which requires SRM.
- Blacked out scene after the sleeping dog now renders correctly
- Ghosting effect. The ghosting was most noticeable as a delay between the character rendering and the cell shading around the character. This appears to be gone with this change.
- GL queries share the target binding (not asynchronous!)
- Discard active queries by closing them, leave closed queries alone (nothing to be done for discard op)
- Reimplements render target views used for sampling
- Optimizes access using an encoded control token
- Adds proper encoding for 24-bit textures (DRGB8 -> ORGB/OBGR)
- Adds proper encoding for ABGR textures (ABGR8 -> ARGB8)
- Silence some compiler warnings as well
- TODO: Real texture views for OGL current method is a hack
gl/vk: Bump shader cache version
gl/vk: Disable anisotropic override when strict mode enabled as it is proven to alter some games negatively
gl: Clamp buffer view range to not exceed the backing buffer size. Also add assert for the same condition
- Track asynchronous operations in RSX core
- Add read barriers to force pending writes to finish.
Fixes zcull delay flicker in all UE3 titles without forcing hard stall
- Increase zcull latency as all writes should be synchronized now
- ZCULL unit emulation rewritten
- ZCULL reports are now deferred avoiding pipeline stalls
- Minor optimizations; replaced std::mutex with shared_mutex where contention is rare
- Silence unnecessary error message
- Small improvement to out of memory handling for vulkan and slightly bump vertex buffer heap
- Forces Bitcast of texture data if input format cannot possibly be the
same as the existing texture format
- rsx: Other minor improvements to texture cache :-
- remove obsolete blit engine incompatibility warning. The texture will be re-uploaded if it is indeed incompatible
- Implement warn_once and err_once to avoid spamming the log with systemic errors
- Track mispredicted flushes
- Reswizzle bitcasted texture data to native layout
TODO: Also needs reshuffle according to input remap vector
- gl: Do not call makeCurrent every flip - it is already called in set_current()
- gl: Improve ring buffer behaviour; use sliding window to view buffers larger than maximum viewable hardware range
NV hardware can only view 128M at a time
- gl/vk: Bump transform constant heap size When lots of draw calls are issued, the heap is exhaused very fast (8k per draw)
- gl: Remove CLIENT_STORAGE_BIT from ring buffers. Performance is marginally better without this flag (at least on windows)
- Identify depth textures reaching the gpu via shader_read upload path
- Use correct timestamp counter for opengl
- inline draw_state::test_property because msvc doesnt do it for us
- Implement flush-always behaviour to partially fix readback from a currently bound fbo
- Without this, only the first read is correct, as more draws are added the results become 'wrong'
- Fixes WCB and cpublit behviour
- Synchronize blit_dst surfaces to avoid data loss when gpu texture scaling is used
- Its still faster in such cases to disable gpu texture scaling but some types cannot be disabled without force cpu blit (e.g framebuffer transfers)
- Memory management tuning
- rsx: on-demand texture cache rescanning for unprotected sections
- rsx: Only framebuffer resources are upscaled
- Do not resize regular blit engine resources
- Lazy initialize readback buffer when using opengl
-- These measures should help minimize vram usage
- Do not assume texture2D when creating new textures
- Flag invalid texture cache if readonly texture is trampled by fbo memory.
Avoids binding a stale handle to the pipeline and is rare enough that it should not hurt performance
- vulkan: Do not assume an aux frame context must exist in a well defined state as set in init_buffers() since the request might be external (via overlays path)
- gl: Do not bother waiting for idle before servicing external flip requests
- gl: Queue overlay cleanup requests to ensure only glthread attempts touching the context
- overlays: Do not compute size metrics for invalid/unsupported glyphs
- opengl driver optimization for nvidia. On nvidia glTextureBufferRange performance is horrendous
-- Initialize texture buffer to whole buffer at startup and use absolute offsets to read data instead
-- Over 2x performance in some cases (Resogun, TNT racers)
- gl/vk: Do not flip non-existent display buffers. Fixes spec violation at boot in TNT racers demo
- whitespace fixes for sys_rsx
- The scale offset matrix is fine but on real hardware the z results seem to be independent of near/far clipping distances
-- If depth falls within near/far, clamp depth value to [0,1]
- Force mipmap count to 1 if sampling from an RTV/DSV
- TODO: Better wcb flush detection, it should be better to re-upload the texture after it has been dwnloaded if expected mipmaps are > 1
- Handle aliased depth + color target by disabling depth writes. This looks to be the correct way
- Add support for generic passes that cannot be done using general imaging operations. Lays the framework for tons of features and effects
- Implement RGBA->D24D8 casting. Sometimes games will split depth texture into RGBA8 then use the new RGBA8 as a depth texture directly
-- This happens alot in ps3 games and I'm not sure why. Its likely the ps3 did not sample fp values with linear filtering so this is a workaround
-- Only implemented for openGL at the moment
-- Requires a workaround for an AMD driver bug
- In rare cases the driver derefs a nullptr and dies, taking the emulator with it
- From testing, it seems the vram is indeed freed when this happens so its "safe" to continue
- Remove generic throws from the rsx pipeline. Stops the rsx thread from silently dying leaving the emulator in a hung state
- Hackplement add_signed and reverse_subtract_signed blend modes
- Reimplement fragment program fetch and rewrite texture upload mechanism
-- All of these steps should only be done at most once per draw call
-- Eliminates continously checking the surface store for overlapping addresses as well
addenda - critical fixes
- gl: Bind TIU before starting texture operations as they will affect the currently bound texture
- vk: Reuse sampler objects if possible
- rsx: Support for depth resampling for depth textures obtained via blit engine
vk/rsx: Minor fixes
- Fix accidental imageview dereference when using WCB if texture memory occupies FB memory
- Invalidate dirty framebuffers (strict mode only)
- Normalize line endings because VS is dumb
to flush_all is up to date. Should prevent recursive exceptions
Partially revert Jarves' fix to invalidate cache on tile unbind. This will
need alot more work. Fixes hangs
- Identify minimize/restore events as separate from regular resize and do not react to them
- Enable message queue consumption after loading the shaders cache. Also hides the frame in this step
-- This fixes the 'start fullscreen' bug when running vulkan
- This allows for better handling of deferred flushes.
-- There's still no guarantee that cache contents will have changed between the set acquisition and following flush operation
-- Hopefully this is rare enough that it doesnt cause serious issues for now
- Refactor invalidate memory functions into one function
- Add cached object rebuilding functionality to avoid throwing away useful memory on an invalidate
- Added debug monitoring of texture unit VRAM usage
- Emulate primitive restart in software whenever we get the chance
- Ensure PRIMITIVE_RESTART is never active when LIST topologies are active
- Reimplement TRIANGLE_FAN, POLYGON and QUAD expansion
- This communication is important in communicating window events. Helps properly synchronize swapchain management on vulkan and stops nvidia crashing
- Do not block the message queue lest the driver detect window as not responding
rsx: Conditional lock hack removed
vulkan - Fixes
- Remove unused texture class
- Fix native pitch calculation (WCB)
rsx: Catch hanging begin/end pairs when flushing deferred draw calls
vulkan: Register DXT compressed formats
vulkan: Register depth formats
gl: Workaround for 'texture stitching' when gathering flip surface
- TODO: Add a proper flip hack option
rsx: Fix texture memory size calculation
- DXT textures dont have real pitch. Since pitch is used to calculate memory size, make sure it always evaluates to rsx_size
rsx: Fix cpu copy detection
rsx: Validate blit dst surface and dont make assumptions about region blit order
- Also relax restrictions on memory owned by the blit engine if strict rendering is not enabled
rsx: Fix depth texture detection
rsx: Do not manually offset into dst. The overlapped range check does so automatically
rsx: Minor optimizations
rsx: Minor fixes
- Fix to detect incompatible formats when using GPU texture scaling and show message
- Better 'is_depth_texture' algorithm to eliminate false positives
gl/vk/rsx: Refactoring; unify texture cache code
gl: Fixups
- Removes rsx::gl::texture class and leave gl::texture intact
- Simplify texture create and upload mechanisms
- Re-enable texture uploads with the new texture cache mechanism
rsx: texture cache - check if bit region fits into dst texture before attempting to copy
gl/vk: Cleanup
- Set initial texture layout to DST_OPTIMAL since it has no data in it anyway at the start
- Move structs outside of classes to avoid clutter
gl: Fix multidraw [WIP]
rsx: Ignore vertex base when data source is generated using arithmetic
vk: Check pending flag before doing fence poke
vk/gl: Fix for inlined array and immediate draws
rsx: Collapse joined draws when batching
rsx: Blit engine improvements
- Always handle blits to and from framebuffers through the GPU
- Handle depth surfaces properly when using GL
- Check for format mismatches when blitting to the surface store [WIP]
Fix rsx offscreen-render-to-display-buffer-blit surface reads
- Also, properly scale display output height if reading from compressed tile
gl: Fix broken dst height computation
- The extra padding is only there to force power-of-2 sizes and isnt used
gl: Ignore compression scaling if output is rendered to in a renderpass
rsx/gl/vk: Cleanup for GPU texture scaling. Initial impl [WIP]
- TODO: Refactor more shared code into RSX/common
rsx/vk/shaders_cache: Move vp control mask to dynamic state
rsx/vk/gl: adds a shader cache for GL. Also Separates pipeline storage for each backend
rsx: Add more texture state variables to the cache
- Significant gains from greatly reduced CPU work
- Also reorders command submission in end() to improve throughput
- Refactors most of the vertex buffer handling
- All vertex processing is moved GPU side
- Improvements to framebuffer usage; Avoid creating new resources every frame
- Handle null fragment program properly
- Collect vertex upload statistics
- vk: Pre-initialize 'unused' varying registers in the vertex shader in case it gets matched with a fs that consumes it
-- Fixes a crash about fog_c not being declared
gl/dx12/vk: Handle null fragment program
- cleanup - use yield semantic instead of sleep(0) as yield is more cross-platform
-- sleep(0) is a windows specific scheduler hint
- vk: Do not select first available format when choosing a swapchain format
- gl/vk: Ignore rendering zero sized framebuffers/scissors
- fp: Re-enable range clamp on fp16 registers; fix fx12 clamping [-2, 2]