gl/vk: Bump shader cache version
gl/vk: Disable anisotropic override when strict mode enabled as it is proven to alter some games negatively
gl: Clamp buffer view range to not exceed the backing buffer size. Also add assert for the same condition
- Track asynchronous operations in RSX core
- Add read barriers to force pending writes to finish.
Fixes zcull delay flicker in all UE3 titles without forcing hard stall
- Increase zcull latency as all writes should be synchronized now
- ZCULL unit emulation rewritten
- ZCULL reports are now deferred avoiding pipeline stalls
- Minor optimizations; replaced std::mutex with shared_mutex where contention is rare
- Silence unnecessary error message
- Small improvement to out of memory handling for vulkan and slightly bump vertex buffer heap
- Removes the duplicate local_transform_constants
- Resets the transform constants on every context reset
- Simplifies the code abit which should make it faster
- NOTE: Transform constants are persistent across context re-init events (VF5)
- TODO: Alot of work is still needed to execute draw commands out of order
Thats the only solution to games sending many draw calls with high frequency of state changes
- Adds support for compilation on MAC with moltenVK. Note that vulkan does
not work on MacOS yet. There are two main blockers:-
1) Texture component swizzles are not supported except for
RGBA8_UNORM->BGRA8_UNORM.
2) There is a bug in their SPIR-V -> MSL generator.
GLSL.std.450.xxxx functions are not implemented which breaks rpcs3
functionality. Trying to compile a vertex shader will throw because
unpackHalf2x16 is missing.
- According to NV_fragment_program spec, registers are zero initialized always
- A program even without writing to these registers will have black (0, 0, 0, 0) output
Confirmed behaviour with MotorStorm games. Their engine uses this quirk to clear color buffers when doing depth replace
Might be an unfixed game bug
According to the NV_fragment_program spec, its not feasible to have 16-bit depth wries
NOTE: NV_fragement_program precedes NV_fragment_program2 which is very
close to what RSX consumes. It is hardware from that era afterall
- Forces Bitcast of texture data if input format cannot possibly be the
same as the existing texture format
- rsx: Other minor improvements to texture cache :-
- remove obsolete blit engine incompatibility warning. The texture will be re-uploaded if it is indeed incompatible
- Implement warn_once and err_once to avoid spamming the log with systemic errors
- Track mispredicted flushes
- Reswizzle bitcasted texture data to native layout
TODO: Also needs reshuffle according to input remap vector
- A new step is added between decompilation and pipeline object creation allowing for properties to be updated based on shader contents
- Allos masking off attachment writes that are unmodified in the shader
- gl: Do not call makeCurrent every flip - it is already called in set_current()
- gl: Improve ring buffer behaviour; use sliding window to view buffers larger than maximum viewable hardware range
NV hardware can only view 128M at a time
- gl/vk: Bump transform constant heap size When lots of draw calls are issued, the heap is exhaused very fast (8k per draw)
- gl: Remove CLIENT_STORAGE_BIT from ring buffers. Performance is marginally better without this flag (at least on windows)
- Identify depth textures reaching the gpu via shader_read upload path
- Use correct timestamp counter for opengl
- inline draw_state::test_property because msvc doesnt do it for us
- Do not bother rechecking the dirty sampler pool for hits. Its faster to create new sampler than to search the pool
- Reserve some memory on vertex layout struct to reduce reallocation penalty
* Improve debugger
* Added 'Step Over' functionality
* Added special SPU pause functionality that pauses the SPU thread when the tag mask is at 0x80000000 by holding ctrl while pausing
* Go to address dialog now evaluates expressions, including defined variables such as pc, r1, r2, etc
* Requires QtScript to be linked with the project
* Made the option to center shown addresses (Go to addr/pc) optional by making it an entry in the GUI ini config
* Shown addresses now appear 'selected'
* New keyboard shortcuts!
- Ctrl+G -> Go to address
- F10 -> Step Over
- F11 -> Step (Into)
* [sysutil] Add Magnetometer system param
* [ui] Add UI for Move handler
Current options are "Null" and "Fake".
* cellGem: Improvements
* cellCamera: Improvements
Use atomic variable to sync TTY size
Implement console_putc (liblv2)
Write plaintext instead of HTML
Slightly improve performance
Fix random line breaks in TTY
* Made GDB debugger working with IDA
* Added async interrupts support
* Report proper thread after pausing
* Support attaching debugger before running app
* travis hotfix
* expose env vars for tag, hash and commit number
* bump version
* also update av version string
* remove hash from av version for master builds
* change hash encoding back to ascii
- use normal selection instead of doubleclick
- move SaveSettings out of the tabs to reduce file access
- translate EmptyPath as well
- some other minor refactors to reduce lines of code
- Adds support for abstract implementations
- Adds native windowing implementations for WIN32 and X11 as fallbacks
when present support is lacking (headless configs)
- For some reason the hardware forgets that primitive restart is enabled and tries to actually read vertex index 65535
- Works correctly if uint32 vertex indices are used instead of uint16 for cases where primitive restart is active
- Fix for texture barriers
- vulkan: Rework texture cache handling of depth surfaces
- Support for scaled depth blit using overlay pass
- Support proper readback of D24S8 in both D32F_S8 and D24U_S8 variants
- Optimize the depth conversion routines with SSE
- vulkan: Replace slow single element copy with std::memcpy
- Check heap status before attempting blit operations
- Bump guard size on upload buffer as well
- Implement flush-always behaviour to partially fix readback from a currently bound fbo
- Without this, only the first read is correct, as more draws are added the results become 'wrong'
- Fixes WCB and cpublit behviour
- Synchronize blit_dst surfaces to avoid data loss when gpu texture scaling is used
- Its still faster in such cases to disable gpu texture scaling but some types cannot be disabled without force cpu blit (e.g framebuffer transfers)
- Memory management tuning
- rsx: on-demand texture cache rescanning for unprotected sections
- rsx: Only framebuffer resources are upscaled
- Do not resize regular blit engine resources
- Lazy initialize readback buffer when using opengl
-- These measures should help minimize vram usage
Six instructions changed to use xmm registers instead of gpr.
ROTQBII, ROTQMBII, SHLQBII look better (shifts by imm)
ROTQBI, ROTQMBI, SHLQBI changed for consistency (shifts by variable)
Only AVX-512 path is changed (third version).
This instruction is extremely rare.
And the code is probably not optimal.
So this commit is pretty useless.
* Fix gs_frame spawning on a screen other than the one the RPCS3 window is on for multi-monitor setups
* Cleaned up code & refactored it into a utility function for reuse
* Qt: take gs_frame's framemargins into account by using showEvent
- For some surfaces, dimensions are passed via the log2 bits rather than surface pitch
-- This is similar to the setup for nv406e and probably means the surfaces are padded and swizzled
- Do not assume texture2D when creating new textures
- Flag invalid texture cache if readonly texture is trampled by fbo memory.
Avoids binding a stale handle to the pipeline and is rare enough that it should not hurt performance
- Do not add unused subroutines in shaders unless necessary
-- makes shaders easier to read and disassembled spir-v has less clutter
- glsl: Replace switch block with lookup table
- vulkan: Do not assume an aux frame context must exist in a well defined state as set in init_buffers() since the request might be external (via overlays path)
- gl: Do not bother waiting for idle before servicing external flip requests
- gl: Queue overlay cleanup requests to ensure only glthread attempts touching the context
- overlays: Do not compute size metrics for invalid/unsupported glyphs
- opengl driver optimization for nvidia. On nvidia glTextureBufferRange performance is horrendous
-- Initialize texture buffer to whole buffer at startup and use absolute offsets to read data instead
-- Over 2x performance in some cases (Resogun, TNT racers)
- gl/vk: Do not flip non-existent display buffers. Fixes spec violation at boot in TNT racers demo
- whitespace fixes for sys_rsx
- Implement low bit decode override flags for 2-component textures
- Properly implement alot of texture remaps according to the autotest results
rsx: Do not unnecessarily shuffle WZYX->RGBA unless we have proof
- From looking at format swizzles, this is incorrect
- Use edges of depth range to map clamped stuff
Disable range compression on regular draws vs extended range draws
- Some applications require full 0-1 usage without compromises.
-- TODO: This leaves the extended range z values to fight with regular draws in the .99 - 1.0 range
- The scale offset matrix is fine but on real hardware the z results seem to be independent of near/far clipping distances
-- If depth falls within near/far, clamp depth value to [0,1]
- Always flush the primary queue and wait if not involking readback from rsx thread
-- Should fix some instances of device_lost when using WCB
-- Marked remaining case as TODO
-- TODO: optimize amount of time rsx waits for external threads trying to read
fix input regression and fix input for FIFA games
fix input in NASCAR [BLUS30932]
fix port status query -> disconnected devices don't cripple following devices by decreased now_connect
- Track last working state and reset to it if RSX starts to desync
-- This is especially useful when running vulkan since the renderer will easily outpace the rest of the system when merely recording draw commands
- Ignore empty sets
-- Mark empty/invalid IB sets as having 0 element counts.
- Doesn't account for dynamic libraries loaded after the fact,
but usually good enough since
1) Those aren't even present in some games
2) They usually only have about 1 or 2 fragments (dialogs) each.
- Make all texture access on non-existent textures return 0
- If border color is closer to 0, then set alpha to 0 as well (might break some corner cases with alpha test)
- Zero initialize null sampler
- Force mipmap count to 1 if sampling from an RTV/DSV
- TODO: Better wcb flush detection, it should be better to re-upload the texture after it has been dwnloaded if expected mipmaps are > 1
- Sometimes square renders are done to surfaces with pitch=64 and re-uploaded with swizzle scanning
-- This setup avoids discarding targets if they are square and pitch == 64
- Abort nv406e semaphore acquire if the rsx thread stalls/crashes
- Fix texture size approximation to take mipmaps into account. Fixes some games hanging with WCB