- Removes the old depth scaling using an overlay.
It was never going to work properly due to per-pixel stencil writes being unavailable
- TODO: Preserve stencil buffer during ARGB8->D32S8 shader conversion pass
- Some applications (e.g Backbreaker) use an evil hack to resolve MSAA.
The application respecifies a formerly AA region as a region with no AA then performs a framebuffer feedback lookup.
The old memory keeps AA during read, but writes back to itself with AA resolved.
This is evil on several levels but it just happens to work on PS3
- The benefits of FIFO optimizations are huge in some cases.
The optimizations also do not break any tested applications so no need to disable with strict mode
- A debug option is provided to disable this behaviour for testing
- Improves cleanup code to consist of 2 parts, remove then dispose. Remove
does not deallocate the item until dispose is called on it, allowing the
backends to first deallocate external references.
- Caller is responsible for managing list locking and tracking disposable list
of items when external references have been cleaned up before using
dispose method.
1. rsx: Rework section synchronization using the new memory mirrors
2. rsx: Tweaks
- Simplify peeking into the current rsx::thread instance.
Use a simple rsx::get_current_renderer instead of asking fxm for the same
- Fix global rsx super memory shm block management
3. rsx: Improve memory validation. test_framebuffer() and
tag_framebuffer() are simplified due to mirror support
4. rsx: Only write back confirmed memory range to avoid overapproximation errors in blit engine
5. rsx: Explicitly mark clobbered flushable sections as dirty to have them
removed
6. rsx: Cumulative fixes
- Reimplement rsx::buffered_section management routines
- blit engine subsections are not hit-tested against confirmed/committed memory range
Not all applications are 'honest' about region bounds, making the real cpu range useless for blit ops
- Reimplements the AMD workaround using an identity buffer to avoid the performance hit of doing multiple glDrawArrays for every single compiled set
- Reimplements first/count allocation using a scratch buffer to reduce allocation overhead when large number of draw calls is used
- Improve dirty state tracking affecting program state
- vk: Refactor out transform constants upload into a separate channel to avoid if possible
transform data uploads are quite expensive
- Introduces a gpu program analyser step to examine shader contents before attempting compilation or cache search
- Avoids detecting shader as being different because of unused textures having state changes
- Adds better program size detection for vertex programs
- Improved vertex program decompiler
- Properly support CAL type instructions
- Support jumping over instructions marked with a termination marker with BRA/CAL class opcodes
- Fix SRC checks and abort
- Fix CC register initialization
- NOTE: Even unused SRC registers have to be valid (usually referencing in.POS)
- Readback does not work at all with float textures on AMD openGL
Driver throws a bogus OUT_OF_MEMORY error regardless of amount of VRAM and system RAM available
- vk: Clear dirty textures before copying 'old contents' in case the old data does not fill the new region
- rsx: Properly decode border color - seems to be in BGRA format
- vk: better approximation of border color to better choose between the presets
- vk: Individually clear color images outside render pass and without scissor
- vk: Fix renderpass selection for clear overlay pass
- vk: Include scissor region when emulating clear mask
NOTES:
- vk: Completely avoid using vkClearXXXXimage - its 'broken' on nvidia drivers
Spec is vague about the function so its not an actual bug
ClearAttachment is clearly defined as bypassing bound state which works correctly
- TODO: Implement memory sampling to simulate loading precleared memory if cell used memset to preinitialize the framebuffer
Autoclear depth to 1|255 and color to 0 is hacky!
- gl/vk: Fix subresource copy/blit
- gl/vk: Fix default_component_map reading
- vk: Reimplement cell readback path and improve software channel decoder
- Properly name the subresource layout field - its in blocks not bytes!
- Implement d24s8 upload from memory correctly
- Do not ignore DEPTH_FLOAT textures - they are depth textures and abide by the depth compare rules
- NOTE: Redirection of 16-bit textures is not implemented yet
Primary:
- Fix SET_SURFACE_CLEAR channel mask - it has been wrong for all these
years! Layout is RGBA not ARGB/BGRA like other registers
Other Fixes:
- vk: Implement subchannel clears using overla pass
- vk: Simplify and clean up state management
- gl: Fix nullptr deref in case of failed subresource copy
- vk/gl: Ignore float buffer clears as hardware seems to do
- Ignore unlocked blit sections [TODO]
- Do not attempt blit on hw if bytesize is unsupported
- gl: Implement typeless memory transfers
Uses pbo to handle type-agnostic memory transfer
- Mainly affected are colormasks and read swizzles
NOTES:
- Writes to G write to the second and fourth component (YW)
- Writes to B write to first and third component (XZ)
- This means the actual format layout is BGBG (RGBA) making RG mapping actually GR
- Clear does not seem to have any intended effect on this format (TLOU)
Fixes the following issues on Tales of Vesperia which requires SRM.
- Blacked out scene after the sleeping dog now renders correctly
- Ghosting effect. The ghosting was most noticeable as a delay between the character rendering and the cell shading around the character. This appears to be gone with this change.
- GL queries share the target binding (not asynchronous!)
- Discard active queries by closing them, leave closed queries alone (nothing to be done for discard op)
- Reimplements render target views used for sampling
- Optimizes access using an encoded control token
- Adds proper encoding for 24-bit textures (DRGB8 -> ORGB/OBGR)
- Adds proper encoding for ABGR textures (ABGR8 -> ARGB8)
- Silence some compiler warnings as well
- TODO: Real texture views for OGL current method is a hack
- Separate TXB from TXL: They are completely different!
- Properly perform TMU emulation in the fragment shader. Implemens SRGB conversion and alphakill at the moment
- Properly perform ROP emulation in the fragment shader. Implements FRAMEBUFFER_SRGB. While support on the chip looks to be incomplete (and wierd), it does work
- Document some more bits in SHADER_CONTROL register