- Forcefully downloads and reuploads data from the CPU in case of unexpected overlaps
- Properly detect correct size of newly created blit targets
- Remember to clear any existing views when changing the default component map!
- These dependencies have #defines which enable code related to them
in rpcs3 and rpcs3_ui targets. They are only used in rpcs3_emu but
HAVE_* defines have to be defined in rpcs3 and rpcs3_ui targets also,
so they have to have PUBLIC visibility so defines carried over
CMake: Fix Alsa and Pulse audios being disabled
- HAVE_PULSE and HAVE_ALSA were not defined in rpcs3 target
Fixup libevdev
* CMake: Refactor build to multiple libraries
- Refactor CMake build system by creating separate libraries for
different components
- Create interface libraries for most dependencies and add 3rdparty::*
ALIAS targets for ease of use and use them to try specifying correct
dependencies for each target
- Prefer 3rdparty:: ALIAS when linking dependencies
- Exclude xxHash subdirectory from ALL build target
- Add USE_SYSTEM_ZLIB option to select between using included ZLib and
the ZLib in CMake search path
* Add cstring include to Log.cpp
* CMake: Add 3rdparty::glew interface target
* Add Visual Studio CMakeSettings.json to gitignore
* CMake: Move building and finding LLVM to 3rdparty/llvm.cmake script
- LLVM is now built under 3rdparty/ directory in the binary directory
* CMake: Move finding Qt5 to 3rdparty/qt5.cmake script
- Script has to be included in rpcs3/CMakeLists.txt because it defines
Qt5::moc target which isn't available in that folder if it is
included in 3rdparty directory
- Set AUTOMOC and AUTOUIC properties for targets requiring them (rpcs3
and rpcs3_ui) instead of setting CMAKE_AUTOMOC and CMAKE_AUTOUIC so
those properties are not defined for all targets under rpcs3 dir
* CMake: Remove redundant code from rpcs3/CMakeLists.txt
* CMake: Add BUILD_LLVM_SUBMODULE option instead of hardcoded check
- Add BUILD_LLVM_SUBMODULE option (defaults to ON) to allow controlling
usage of the LLVM submodule.
- Move option definitions to root CMakeLists
* CMake: Remove separate Emu subtargets
- Based on discussion in pull request #5032, I decided to combine
subtargets under Emu folder back to a single rpcs3_emu target
* CMake: Remove utilities, loader and crypto targets: merge them to Emu
- Removed separate targets and merged them into rpcs3_emu target as
recommended in pull request (#5032) conversations. Separating targets
probably later in a separate pull request
* Fix relative includes in pad_thread.cpp
* Fix Travis-CI cloning all submodules needlessly
Preprocess . and .. correctly
Don't use recursive locking
Also use std::string_view
Fix format system for std::string and std::string_view
Fix fmt::merge for std::string_view
- NOTE: The address swizzle index is only for use as src. The address registers are only used one channel at a time.
- When the destination of ARL, the encoding is the same as the other temp registers
- Retag resources reprotected under flush_always rules
- Properly check for blit resource fitting taking into account format
mismatch, pitch mismatch and typeless transfers
Remove vm::ptr operator %
This was a bad idea but explicit_bool_t was created almost for it
Other removed types are unused and have little to no meaning
- The x value contains the VP output value interpolated across primitive surface
- The y coordinate contains the fog fraction according to the selected fog formula
Remove "atomic operator" classes
Remove test, test_and_set, test_and_reset, test_and_complement global functions
Simplify atomic_t<> with constexpr if, remove some garbage
Redesign bs_t<> to use class, mark its methods constexpr
Implement atomic_bs_t<> for optimizations
Remove unused __bitwise_ops concept (should be in other header anyway)
Bitsets can now be tested via safe bool conversion
cellMicEnd would get stuck waiting for the cellMic thread to finish, but
it never does because micInited is still true.
Fixes Resistance: Fall of Mankind getting stuck after the menu
- Newer nvidia drivers are not exposing IMMEDIATE present mode unless you change options in nvidia control panel
This can cause severe performance degradation unless the vsync option is set to "off" in control panel
- Some games overflow the program buffer e.g Resistance games
The observed overflow is one instruction longer, likely an engine bug
with counting instructions
- Tags framebuffer resources on first use (when on_write is called to verify memory)
- Texture cache now selects the best match and even sorts atlas writes with memory write order to avoid older data showing over newer one
- Some formats are proven to ignore swizzle flag
- DXT compressed textures
- COMPRESSED_BG_GB class textures
- Some applications are using swizzled wide integer formats so those are confirmed to swizzle
- Use proper time checking; depending on what is being done one 'tick' can
be almost a millisecond long or several nanoseconds
- Avoid spamming the system timer unless necessary
- A possible deadlock is still present if rsx is trying to get a super_ptr whilst the vm lock holder is in an access violation
This patch makes this scenario very unlikely since each block need only be touched once
- Implement forced reading when calling update method to sync partial lists
- Defer conditional render evaluation and use a read barrier to avoid extra work
- Fix HLE gcm library when binding tiles & zcull RAM
- Do not do a full sync on a texture read barrier
- Avoid calling zcull sync in FIFO spin wait
- Do not flush memory to cache from the renderer side; this method is now obsolete
invalidate_range_impl_base does not mark all textures that will only be
unprotected as dirty when doing a deferred flush, since that is done by
flush_all.
However, if there are no sections to flush, the deferred flush will
use the same code path as non-deferred flushes for unprotecting textures
and forget to mark them as dirty.
This commit fixes this bug.
The existing implementation restarts the loop immediately after
finding a range_data instance that updates the trampled_range.
This commit refactors this method to continue the loop with the updated
trampled_range, and then repeat only those range_data instances that
were iterated through before the trampled_range was last updated.
As a result, the number of total iterations required is reduced.
When the trampled range changes, get_intersecting_set restarts the
outer loop. However, due to an off-by-one error, it skips the first
cache entry when doing so. This can cause a texture not to be
correctly unlocked, which could lead to issues or even deadlocks.
This commit fixes this off-by-one error.
If "address" is not page-aligned, the calculated end address for the
corresponding page is incorrect which can lead to false positives,
causing some textures to be re-uploaded unnecessarily.
This commit fixes this typo.
- Defer compilation process to worker threads
- vulkan: Fixup for graphics_pipeline_state.
Never use struct assignment operator on vk** structs due to padding after sType member (4 bytes)
- Adds proper support for vertex textures, including dimensions other than 2D textures
- Minor analyser fixup, removes spurious 'analyser failed' errors
- Minor optimizations for program state tracking
- Adds dead code elimination
- Fix absolute branch target addresses to take base address into account
- Patch branch targets relative to base address to improve hash matching
- Bumps shader cache version
- Enables shader logging option to write out vertex program binary,
helpful when debugging problems.
- Fix program abort logic to never abort before resolving later label addresses
Fixes jumping over broken code and jumping over END markers
- TEXTURE_CONTROL2 has indexing range of [0..15] without stride skipping!
This register does not have interleaving with other texture registers
- Track shader address poke as it seems to invalidate programs as well
- Avoid re-locking memory if there is no reason to do so (no draws issued)
- Actively bound regions should always get written to the backing cache
- Forcefully read memory during download if writes to the target have occured since last sync event
- Unroll main compute queue loop
- Do NOT run GPU cores on mappable memory! This has a dreadful impact on performance for obvious reasons
- Enable dynamic SSBO indexing (affects AMD)
- Make loop unrolling and loop length variable depending on hardware and find optimum
Build SPU cache after PPU, fix mixing progress
SPU ASMJIT: add support for Giga mode
SPU ASMJIT: use the same spu.log location as SPU LLVM
SPU: improve spu.log disasm
SPU: improve trampolines, unify with SPU ASMJIT
SPU: decode interrupt handler address from BR/BRA at 0x0
SPU LLVM: support Mega/Giga modes
SPU LLVM: implement function chunks
SPU LLVM: use PHI nodes, value visibility across basic blocks
SPU LLVM: implement function chunk table
New simple memory manager for LLVM (bugfix)
- Region pitch of 64 (disabled) can be used to indicate packed contents - do not assume it is the actual pitch!
- Also fixes interaction of AA factors with lockable_region size
- Use names for overlay command config and vertex data instead of std::pair.
- Make a couple of compiled_resource constructors explicitly named functions.
- Used to transfer D32S8 data where it makes sense to use this variant
- On nvidia cards, it is very slow to move aspects from D24S8 probably due to the format being faked.
For this reason, the unsafe variant is used for both D16 and D24S8 to avoid the heavy performance loss
- RADV does not keep a mapping ptr around for subsequent remap and falls back to heavy amdgpu methods every time
Explicitly manage pointer in the ring buffer structure to fix this
- Compute is now used to assist in some parts of blit operations, since there are no format conversions with vulkan like OGL does
- TODO: Integrate this into all types of GPU memory conversion operations instead of downloading to CPU then converting
- Removes the old depth scaling using an overlay.
It was never going to work properly due to per-pixel stencil writes being unavailable
- TODO: Preserve stencil buffer during ARGB8->D32S8 shader conversion pass
- Some applications (e.g Backbreaker) use an evil hack to resolve MSAA.
The application respecifies a formerly AA region as a region with no AA then performs a framebuffer feedback lookup.
The old memory keeps AA during read, but writes back to itself with AA resolved.
This is evil on several levels but it just happens to work on PS3
Returning CELL_OK in sceNpManagerRequestTicket2 makes NPEB01268 loop indefinitely trying to check the downloaded content.
Telling that the system is offline escapes the loop and make the game go further.
Moves NPEB01268/BLES01794 from Intro to Ingame.
Disable extra modes for SPU LLVM for now.
In Mega mode, SPU Analyser tries to determine complete functions.
Recompiler tries to speed up returns via 'stack mirror'.
* Support for using arbitrary firmware fonts
* Support for specifying font extension in `font` constructor (useful for most firmware fonts that use .TTF)
- The benefits of FIFO optimizations are huge in some cases.
The optimizations also do not break any tested applications so no need to disable with strict mode
- A debug option is provided to disable this behaviour for testing
- Matches ps3 accuracy for all tested values with few exceptions
- Do not enter the host OS kernel if waiting for less than 500us to avoid scheduler issues
- Improves cleanup code to consist of 2 parts, remove then dispose. Remove
does not deallocate the item until dispose is called on it, allowing the
backends to first deallocate external references.
- Caller is responsible for managing list locking and tracking disposable list
of items when external references have been cleaned up before using
dispose method.
1. rsx: Rework section synchronization using the new memory mirrors
2. rsx: Tweaks
- Simplify peeking into the current rsx::thread instance.
Use a simple rsx::get_current_renderer instead of asking fxm for the same
- Fix global rsx super memory shm block management
3. rsx: Improve memory validation. test_framebuffer() and
tag_framebuffer() are simplified due to mirror support
4. rsx: Only write back confirmed memory range to avoid overapproximation errors in blit engine
5. rsx: Explicitly mark clobbered flushable sections as dirty to have them
removed
6. rsx: Cumulative fixes
- Reimplement rsx::buffered_section management routines
- blit engine subsections are not hit-tested against confirmed/committed memory range
Not all applications are 'honest' about region bounds, making the real cpu range useless for blit ops
Drop _xbegin family intrinsics due to bad codegen
Implemented `notifier` class, replacing vm::notify
Minor optimization: detach transactions from global mutex on TSX path
Minor optimization: don't acquire vm::passive_lock on PPU on TSX path
- Properly identify puller spin primitives
- Add a small wake delay after exiting a spin delay. Fixes desynchronization
It seems real hw has a small delay between cell edits to commandbuffer memory at the GET address and the changes becoming visible to the DMA puller
Simulated with a short busy_wait, large values will improve sync but degrade performance
- Reimplements the AMD workaround using an identity buffer to avoid the performance hit of doing multiple glDrawArrays for every single compiled set
- Reimplements first/count allocation using a scratch buffer to reduce allocation overhead when large number of draw calls is used
- Improve dirty state tracking affecting program state
- vk: Refactor out transform constants upload into a separate channel to avoid if possible
transform data uploads are quite expensive
- Introduces a gpu program analyser step to examine shader contents before attempting compilation or cache search
- Avoids detecting shader as being different because of unused textures having state changes
- Adds better program size detection for vertex programs
- Improved vertex program decompiler
- Properly support CAL type instructions
- Support jumping over instructions marked with a termination marker with BRA/CAL class opcodes
- Fix SRC checks and abort
- Fix CC register initialization
- NOTE: Even unused SRC registers have to be valid (usually referencing in.POS)
Allow indirect calls within current function using a jumptable
This restores some functionality removed in SPU ASMJIT 2.0
Change SPUThread::get_ch_value prototype
Use conditional memory access to invalid address.
This approach can allow continue (for debugging);
but at the same time it doesn't add function call to recompiled code.
- Readback does not work at all with float textures on AMD openGL
Driver throws a bogus OUT_OF_MEMORY error regardless of amount of VRAM and system RAM available
- Formats support is linked to the physical device and by extension the logical device derived from it
It therefore makes no sense to track this as a separate object.
Simplifies parameter passing and template specialization.
Also avoids corner cases with AMD hardware (where D24S8 is not supported)
- vk: Clear dirty textures before copying 'old contents' in case the old data does not fill the new region
- rsx: Properly decode border color - seems to be in BGRA format
- vk: better approximation of border color to better choose between the presets
- vk: Individually clear color images outside render pass and without scissor
- vk: Fix renderpass selection for clear overlay pass
- vk: Include scissor region when emulating clear mask
NOTES:
- vk: Completely avoid using vkClearXXXXimage - its 'broken' on nvidia drivers
Spec is vague about the function so its not an actual bug
ClearAttachment is clearly defined as bypassing bound state which works correctly
- TODO: Implement memory sampling to simulate loading precleared memory if cell used memset to preinitialize the framebuffer
Autoclear depth to 1|255 and color to 0 is hacky!
- gl/vk: Fix subresource copy/blit
- gl/vk: Fix default_component_map reading
- vk: Reimplement cell readback path and improve software channel decoder
- Properly name the subresource layout field - its in blocks not bytes!
- Implement d24s8 upload from memory correctly
- Do not ignore DEPTH_FLOAT textures - they are depth textures and abide by the depth compare rules
- NOTE: Redirection of 16-bit textures is not implemented yet
Primary:
- Fix SET_SURFACE_CLEAR channel mask - it has been wrong for all these
years! Layout is RGBA not ARGB/BGRA like other registers
Other Fixes:
- vk: Implement subchannel clears using overla pass
- vk: Simplify and clean up state management
- gl: Fix nullptr deref in case of failed subresource copy
- vk/gl: Ignore float buffer clears as hardware seems to do
- Ignore unlocked blit sections [TODO]
- Do not attempt blit on hw if bytesize is unsupported
- gl: Implement typeless memory transfers
Uses pbo to handle type-agnostic memory transfer
- Mainly affected are colormasks and read swizzles
NOTES:
- Writes to G write to the second and fourth component (YW)
- Writes to B write to first and third component (XZ)
- This means the actual format layout is BGBG (RGBA) making RG mapping actually GR
- Clear does not seem to have any intended effect on this format (TLOU)
Use opt-out shared spu_runtime to save memory (Option: SPU Shared Runtime)
Implement "übertrampolines" for dispatching compiled blocks
Patch fixed branch points to use trampolines after check failure
- Properly detect inline array registers vs constant value registers
- Silence needless spam, 306E is 2D surface engiine, the assumption that y is multiplied by 306E pitch is not crazy
Bug fix: don't display new data entry when not asked for
Use icon/title provided by the game for the new data entry
Display new data entry at the beginning of list when necessary
Minor cellSaveData cleanup
Fixes the following issues on Tales of Vesperia which requires SRM.
- Blacked out scene after the sleeping dog now renders correctly
- Ghosting effect. The ghosting was most noticeable as a delay between the character rendering and the cell shading around the character. This appears to be gone with this change.
- GL queries share the target binding (not asynchronous!)
- Discard active queries by closing them, leave closed queries alone (nothing to be done for discard op)
- Reimplements render target views used for sampling
- Optimizes access using an encoded control token
- Adds proper encoding for 24-bit textures (DRGB8 -> ORGB/OBGR)
- Adds proper encoding for ABGR textures (ABGR8 -> ARGB8)
- Silence some compiler warnings as well
- TODO: Real texture views for OGL current method is a hack
- Separate TXB from TXL: They are completely different!
- Properly perform TMU emulation in the fragment shader. Implemens SRGB conversion and alphakill at the moment
- Properly perform ROP emulation in the fragment shader. Implements FRAMEBUFFER_SRGB. While support on the chip looks to be incomplete (and wierd), it does work
- Document some more bits in SHADER_CONTROL register
- Export some debug information in the free texture register space components zw
Very useful when analysing renderdoc captures
- Enable shadow comparison on depth as long as compare function is active and texture is uploaded for depth read
Some engines (UE3) read all the components in the shader and use mul/mad with the result
it's currently unknown whats the exact relationship between the prio and the group type SYS_SPU_THREAD_GROUP_TYPE_COOPERATE_WITH_SYSTEM (0x20).
tho we do know prio'es whom less than 16 are reserved for the system.
gl/vk: Bump shader cache version
gl/vk: Disable anisotropic override when strict mode enabled as it is proven to alter some games negatively
gl: Clamp buffer view range to not exceed the backing buffer size. Also add assert for the same condition
- Track asynchronous operations in RSX core
- Add read barriers to force pending writes to finish.
Fixes zcull delay flicker in all UE3 titles without forcing hard stall
- Increase zcull latency as all writes should be synchronized now
- ZCULL unit emulation rewritten
- ZCULL reports are now deferred avoiding pipeline stalls
- Minor optimizations; replaced std::mutex with shared_mutex where contention is rare
- Silence unnecessary error message
- Small improvement to out of memory handling for vulkan and slightly bump vertex buffer heap
- Removes the duplicate local_transform_constants
- Resets the transform constants on every context reset
- Simplifies the code abit which should make it faster
- NOTE: Transform constants are persistent across context re-init events (VF5)