Debug build with clang-13 fails with "undefined references" to the
static const members in spu_channel class. This patch replaces the const
definitions with constexpr constants.
* Update asmjit dependency (aarch64 branch)
* Disable USE_DISCORD_RPC by default
* Dump some JIT objects in rpcs3 cache dir
* Add SIGILL handler for all platforms
* Fix resetting zeroing denormals in thread pool
* Refactor most v128:: utils into global gv_** functions
* Refactor PPU interpreter (incomplete), remove "precise"
* - Instruction specializations with multiple accuracy flags
* - Adjust calling convention for speed
* - Removed precise/fast setting, replaced with static
* - Started refactoring interpreters for building at runtime JIT
* (I got tired of poor compiler optimizations)
* - Expose some accuracy settings (SAT, NJ, VNAN, FPCC)
* - Add exec_bytes PPU thread variable (akin to cycle count)
* PPU LLVM: fix VCTUXS+VCTSXS instruction NaN results
* SPU interpreter: remove "precise" for now (extremely non-portable)
* - As with PPU, settings changed to static/dynamic for interpreters.
* - Precise options will be implemented later
* Fix termination after fatal error dialog
* Use atomic waitables instead instead of global thread wait as often as possible.
* Add ::is_stopped() and and ::is_paued() which can be used in atomic loops and with atomic wait. (constexpr cpu flags test functions)
* Fix notification bug of sys_spu_thread_group_exit/terminate. (old bug, enhanced by #9117)
* Function time statistics at Emu.Stop() restored. (instead of current "X syscall failed with 0x00000000 : 0")
* SPU Debugger: Fix try_get_insert_mask_info
* Debugger: Always update thread state on context's data change
No longer needing to press on thread's instructions for actions to work!
Allocate "personal" range lock variable for each spu_thread.
Switch from reservation_lock to range lock for all stores.
Detect actual memory mirrors in shareable cache setup logic.
* sys_spu: Fix race in sys_spu_thread_group_destroy and other minor fixes
* SPU: Wait for all threads to have error codes if exited by sys_spu_thread_exit
On last thread in group to run.
* sys_spu: Fix sys_spu_thread_group_start
* fixup ad fix sys_spu_thread_group_terminate
idk why "- !group->running" was put in the first place but its probably no longer relevant due to other changes and was causing other issues such as not always waiting for last SPU thread to set group state to INITIALIZED.
* Removed wrong code in sys_spu_thread_group_terminate.
* SPU Thread ID is accurate, including 5th thread id "rule".
* Fixed possible use-after-free access of spu_thread::group member.
* RawSPU ID management simplified.
vm::spu max address was overflowing resulting in issues, so cast to u64 where needed. Fixes#6145.
Use vm::get_addr instead of manually substructing vm::base(0) from pointer in texture cache code.
Prefer std::atomic_thread_fence over _mm_?fence(), adjust usage to be more correct.
Used sequantially consistent ordering in semaphore_release for TSX path as well.
Improved memory ordering for sys_rsx_context_iounmap/map.
Fixed sync bugs in HLE gcm because of not using atomic instructions.
Use release memory barrier in lwsync for PPU LLVM, according to this xbox360 programming guide lwsync is a hw release memory barrier.
Also use release barrier where lwsync was originally used in liblv2 sys_lwmutex and cellSync.
Use acquire barrier for isync instruction, see https://devblogs.microsoft.com/oldnewthing/20180814-00/?p=99485
- Multiple header files where missing #includes to other headers that
where used in the header. Correct header was included in correct
order in source files which caused everything to compile.
- Added missing #includes so header files correctly include all their
dependencies and fixes problems with IDEs being unable to parse
headers correctly due to missing symbols
Misc: fix EH frame registration (LLVM, non-Windows).
Misc: constant-folding bitcast (cpu_translator).
Misc: add syntax for LLVM arrays (cpu_translator).
Misc: use function names for proper linkage (SPU LLVM).
Changed function search and verification in Giga mode.
Basic stack frame layout analysis.
Function detection in Giga mode.
Basic use of new information in SPU LLVM.
Fixed jump table compilation in SPU LLVM.
Disable broken optimization in Accurate xfloat mode.
Make compiled SPU modules position-independent in SPU LLVM.
Optimizations include but not limited to:
* Compiling SPU functions as native functions when eligible
* Avoiding register context write-out
* Aligned stack assumption (CWD alike instruction)
* Fix SPU LR event setting in atomic commands according to hw test
* MFC: increment timestamp for PUT cmd in non-tsx path
* MFC: fix reservation lost test on non-tsx path in regard to the lock bit
* Reservation notification moved out of writer_lock scope to reduce its lifetime
* Use passive_lock/unlock in ppu atomic inctrustions to reduce redundancy
* Lock only once for dma transfers (non-TSX)
* Don't use RDTSC in reservation update logic
* Remove MFC cmd args passing to process_mfc_cmd
* Reorder check_state cpu_flag::memory check for faster unlocking
* Specialization for 128-byte data copy in SPU dma transfers
* Implement memory range locks and isolate PPU and SPU passive lock logic
Move lambda into a cpu_stop()
Use running thread counter to synchronize with sys_spu_thread_group_join()
Use SPU_STATUS_STOPPED_BY_STOP exclusively for sys_spu_thread_exit() as before
Remove unnecessary waiting in sys_spu_thread_group_exit()
Rollback some minor unnecessary changes
Use shared_mutex in SPU TG
Disable extra modes for SPU LLVM for now.
In Mega mode, SPU Analyser tries to determine complete functions.
Recompiler tries to speed up returns via 'stack mirror'.
Allow indirect calls within current function using a jumptable
This restores some functionality removed in SPU ASMJIT 2.0
Change SPUThread::get_ch_value prototype
Use opt-out shared spu_runtime to save memory (Option: SPU Shared Runtime)
Implement "übertrampolines" for dispatching compiled blocks
Patch fixed branch points to use trampolines after check failure