Commit graph

416 commits

Author SHA1 Message Date
Ivan c2190f71ca
SPU/PPU LLVM: fix triple setup (regression fix) (#12228) 2022-06-14 18:13:43 +03:00
Jeff Guo cefc37a553
PPU LLVM arm64+macOS port (#12115)
* BufferUtils: use naive function pointer on Apple arm64

Use naive function pointer on Apple arm64 because ASLR breaks asmjit.
See BufferUtils.cpp comment for explanation on why this happens and how
to fix if you want to use asmjit.

* build-macos: fix source maps for Mac

Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo
modes.

* LLVM PPU: fix aarch64 on macOS

Force MachO on macOS to fix LLVM being unable to patch relocations
during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by
PPUTranslator/Recompiler.

* virtual memory: use 16k pages on aarch64 macOS

Temporary hack to get things working by using 16k pages instead of 4k
pages in VM emulation.

* PPU/SPU: fix NEON intrinsics and compilation for arm64 macOS

Fixes some intrinsics usage and patches usages of asmjit to properly
emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also
patches the SPU recompiler to properly work on arm64 by telling LLVM to
target arm64.

* virtual memory: fix W^X toggles on macOS aarch64

Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default
to RW mode. For both SPU and PPU execution threads, when initialization
finishes we toggle to RX mode. This exploits Apple's per-thread setting
for RW/RX to let us be technically compliant with the OS's W^X
enforcement while not needing to actually separate the memory
allocated for code/data.

* PPU: implement aarch64 specific functions

Implements ppu_gateway for arm64 and patches LLVM initialization to use
the correct triple. Adds some fixes for macOS W^X JIT restrictions when
entering/exiting JITed code.

* PPU: Mark rpcs3 calls as non-tail

Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you
call a function inside e.g. an L2 syscall, it will clobber LR on arm64
and subtly break returns in emulated code. Only JIT -> JIT "calls"
should be tail.

* macOS/arm64: compatibility fixes

* vm: patch virtual memory for arm64 macOS

Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to
existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix
memory unmapping on 16K page machines with a hack to mark "unmapped"
pages as RW.

* PPU: remove wrong comment

* PPU: fix a merge regression

* vm: remove 16k page hacks

* PPU: formatting fixes

* PPU: fix arm64 null function assembly

* ppu: clean up arch-specific instructions
2022-06-14 15:28:38 +03:00
Nekotekina cb2c0733e2 SPU LLVM: fix vrangeps usage in clamp_smax 2022-06-12 16:40:04 +02:00
Malcolm Jestadt ebeeafc94f SPU LLVM: Use vrangeps in clamp_smax
- This instruction can clamp a value between a range of values, something which previously needed 2 instructions.
- With the immediate byte set to 0x2 it will compute the minimum between the absolute value of the first input and the second input, and then copy the sign from the first input to the result.
2022-06-11 18:25:31 +03:00
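The behaviour described in this commit message can be modelled per lane in scalar C++. This is a minimal sketch that follows only the wording above (smaller magnitude, sign taken from the first input). The function name `range_min_abs_sign_a` is hypothetical, and the NaN and denormal handling of the real instruction is not modelled.

```cpp
#include <cassert>
#include <cmath>

// Scalar model of one float lane for imm8 = 0x2 as described in the commit
// message: take the smaller of the two magnitudes, then copy the sign of the
// first input onto the result. NaN/denormal behaviour is NOT modelled.
inline float range_min_abs_sign_a(float a, float b)
{
    return std::copysign(std::fmin(std::fabs(a), std::fabs(b)), a);
}

// Small usage check: the second argument acts as a symmetric clamp limit.
inline void check_range_model()
{
    assert(range_min_abs_sign_a(5.0f, 2.0f) == 2.0f);   // clamped from above
    assert(range_min_abs_sign_a(-5.0f, 2.0f) == -2.0f); // clamped from below
    assert(range_min_abs_sign_a(1.0f, 2.0f) == 1.0f);   // already in range
}
```

Read this way, a single operation clamps a value to [-limit, +limit], which is presumably why it can replace a separate min/max pair.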
Elad Ashkenazi 17e28ae85d SPU LLVM: Improve expression matching detection for moved registers 2022-06-11 16:13:58 +03:00
Malcolm Jestadt 64616f1408 SPU LLVM: Microfixes
- Avoid vpermb path in shufb when op.ra == op.rb
- Reverse indices with (c ^ 0xf) rather than (~c) in vpermb path, vpternlogd is a 3 input operation and requires needless mov instructions to avoid destroying inputs
2022-06-08 22:50:30 +03:00
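The second point relies on XOR with 0xf and bitwise NOT agreeing on the low four bits of an index. The sketch below checks that equivalence at compile time; it assumes only the low nibble of each index matters to the shuffle, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Two ways to reverse a 4-bit shuffle index (0 <-> 15, 1 <-> 14, ...).
constexpr std::uint8_t reverse_xor(std::uint8_t c) { return static_cast<std::uint8_t>(c ^ 0xf); }
constexpr std::uint8_t reverse_not(std::uint8_t c) { return static_cast<std::uint8_t>(~c); }

// Exhaustively compare the low nibbles for every possible byte value.
constexpr bool low_nibbles_agree()
{
    for (int c = 0; c < 256; ++c)
    {
        const auto b = static_cast<std::uint8_t>(c);
        if ((reverse_xor(b) & 0xf) != (reverse_not(b) & 0xf)) return false;
        if ((reverse_xor(b) & 0xf) != 15 - (b & 0xf)) return false;
    }
    return true;
}

static_assert(low_nibbles_agree(), "c ^ 0xf and ~c agree on the low 4 bits");

// Runtime spot checks of the same property.
inline void check_reverse()
{
    assert((reverse_xor(0x03) & 0xf) == 0x0c);
    assert((reverse_not(0x03) & 0xf) == 0x0c);
}
```

The two forms differ only in the upper four bits, so the choice between them comes down to which instruction the compiler emits, as the commit message explains.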
Malcolm Jestadt 1227b0a633 SPU LLVM: Re-enable icelake shufb paths
- The previous code works just fine
2022-06-05 13:08:00 +03:00
Elad Ashkenazi 9bb7e8d614
rsx: Implement atomic FIFO fetching (stability improvement) (non-default setting) (#12107) 2022-06-04 15:35:06 +03:00
Malcolm Jestadt 0e5514003a SPU LLVM: Optimize LQR/STQR
- Avoid type mismatch between adds that prevented llvm from combining the operations
2022-06-03 16:16:28 +03:00
Malcolm Jestadt e9dfb3cb63 SPU LLVM: Fixup for inline MFC transfers
- Could previously segfault when src and dst were swapped. Just use unaligned instructions instead.
2022-05-29 19:08:36 +03:00
Malcolm Jestadt 6f4398889e SPU LLVM: Optimize inline MFC transfers
- Use wider instructions when possible
2022-05-29 15:32:25 +03:00
Eladash 2ba437b6dc SPU: Implement timer freezing ability 2022-05-14 22:03:47 +03:00
Malcolm Jestadt 91673f8fdc SPU LLVM: Add relaxed xfloat option
- This new setting is on by default
- It's active when approximate default is disabled
- Approximate xfloat is now exposed to the gui
2022-01-31 08:02:48 +03:00
Nekotekina dba2baba9c Implement utils::memory_map_fd (partial)
Improve JIT profiling dump format (data + name, mmap)
Improve objdump interception util (better speed, fix bugs)
Rename spu_ubertrampoline to __ub+number
2022-01-26 15:46:16 +03:00
Nekotekina 11ee1f3eb2 Improve JIT profiling on Linux
Add JIT object dumping functionality.
Add source for objdump interception utility.
2022-01-25 03:16:37 +03:00
Nekotekina 12c83b340d Remove built_function
With today's branch prediction techniques, it's hardly useful.
2022-01-24 22:21:41 +03:00
Nekotekina 4704367382 Remove unnecessary asmjit::imm_ptr 2022-01-18 00:10:32 +03:00
Nekotekina 580bd2b25e Initial Linux Aarch64 support
* Update asmjit dependency (aarch64 branch)
* Disable USE_DISCORD_RPC by default
* Dump some JIT objects in rpcs3 cache dir
* Add SIGILL handler for all platforms
* Fix resetting zeroing denormals in thread pool
* Refactor most v128:: utils into global gv_** functions
* Refactor PPU interpreter (incomplete), remove "precise"
* - Instruction specializations with multiple accuracy flags
* - Adjust calling convention for speed
* - Removed precise/fast setting, replaced with static
* - Started refactoring interpreters for building at runtime JIT
*   (I got tired of poor compiler optimizations)
* - Expose some accuracy settings (SAT, NJ, VNAN, FPCC)
* - Add exec_bytes PPU thread variable (akin to cycle count)
* PPU LLVM: fix VCTUXS+VCTSXS instruction NaN results
* SPU interpreter: remove "precise" for now (extremely non-portable)
* - As with PPU, settings changed to static/dynamic for interpreters.
* - Precise options will be implemented later
* Fix termination after fatal error dialog
2022-01-15 06:48:04 +03:00
Nekotekina cb2748ae08 Update ASMJIT (new upstream API) 2021-12-29 02:45:00 +03:00
Nekotekina d836033212 LLVM: enable some JIT events (Intel, Perf)
Made some related adjustments.
Currently incomplete.
2021-12-26 16:41:37 +03:00
Nekotekina dcd011048d Implement "built_function" utility (runtime-generated assembly)
Similar to build_function_asm, but links without indirection.
Achieved by emitting code directly into a byte array.
2021-12-22 19:27:20 +03:00
Malcolm Jestadt 2f93df480b SPU LLVM: Disable affineqb shufb paths temporarily 2021-12-10 19:32:10 +03:00
Malcolm Jestadt 0617e9e14b SPU LLVM: Fix vgf2p8affineqb usage
- Reverse the order of the bytes in the selection masks. Previously it was assumed that byte 0 would determine the output of bit 0, but byte 7 determines the output of bit 0.
2021-12-06 12:34:11 +03:00
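The byte ordering described here can be seen in a scalar model of the affine transform for a single byte. This is a sketch based on the documented per-bit rule (bit i of the result is the parity of matrix byte 7-i ANDed with the input, XORed with bit i of the immediate). The names `parity8` and `affine_byte` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Parity (XOR of all bits) of one byte.
constexpr bool parity8(std::uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (v & 1) != 0;
}

// Scalar model of the affine transform applied to one input byte x.
// `matrix` is the 8x8 bit matrix packed into a 64-bit value.
constexpr std::uint8_t affine_byte(std::uint64_t matrix, std::uint8_t x, std::uint8_t imm = 0)
{
    std::uint8_t out = 0;
    for (int i = 0; i < 8; ++i)
    {
        // Output bit i is selected by matrix byte (7 - i), not byte i.
        const auto row = static_cast<std::uint8_t>(matrix >> (8 * (7 - i)));
        const bool bit = parity8(row & x) != (((imm >> i) & 1) != 0);
        out |= static_cast<std::uint8_t>(bit) << i;
    }
    return out;
}

inline void check_affine()
{
    // Identity: byte 7 holds 0x01, so it selects input bit 0 for output bit 0.
    assert(affine_byte(0x0102040810204080ull, 0xA5) == 0xA5);
    // Byte-reversed matrix: reverses the bit order of the input instead.
    assert(affine_byte(0x8040201008040201ull, 0x01) == 0x80);
}
```

Writing the identity matrix with the bytes in the opposite order yields a bit reversal, which is the kind of mistake this commit corrects.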
Malcolm Jestadt 3fde455932 SPU LLVM: Optimize branch following ORX
- test the input of ORX directly for zeroes, instead of the result
2021-11-11 12:58:38 +03:00
Malcolm Jestadt 7573d7289b SPU LLVM: Hook up 128 bit spu verification
- Also fix FMA enablement for sapphirerapids
2021-11-06 21:12:12 +03:00
Nekotekina 69f321a471 LLVM 13 2021-11-02 20:11:08 +03:00
Malcolm Jestadt f06c8b22e8 PPU/SPU LLVM: Emulate VPERM2B with a 256 bit wide VPERMB
- Save 1 uop by using 256 wide VPERMB instead of VPERM2B. (Compiles down to a vinserti128 and vpermb)
2021-10-13 17:51:54 +03:00
Eladash ab50e5483e
GUI Utilities: Implement instruction search, PPU/SPU disasm improvements (#10968)
* GUI Utilities: Implement instruction search in PS3 memory
* String Searcher: Case insensitive search
* PPU DisAsm: Comment constants with ORI
* PPU DisAsm: Add 64-bit constant support
* SPU/PPU DisAsm: Print CELL errors in disasm
* PPU DisAsm: Constant comparison support
2021-10-12 23:12:30 +03:00
Malcolm Jestadt 86716dc37b SPU LLVM: Optimize branches following byteswaps
- The first element can be extracted via vmovd rather than vpextrd, which saves 1 uop.
2021-09-30 13:22:35 +03:00
Malcolm Jestadt f9ab077908 SPU LLVM: Use VDBPSADBW in SUMB
- This instruction can be used to sum bytes horizontally if the second input vector is all zeroes.
2021-09-30 13:22:35 +03:00
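The trick in this commit rests on a simple identity: a sum of absolute differences against zero is just the sum of the bytes. A scalar sketch follows. The group width of four bytes and the name `sad` are assumptions made for illustration, and the shuffle immediate of the real instruction is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One group of unsigned bytes (width chosen for illustration only).
using byte_group = std::array<std::uint8_t, 4>;

// Sum of absolute differences between two byte groups.
inline unsigned sad(const byte_group& a, const byte_group& b)
{
    unsigned sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    return sum;
}

inline void check_sad()
{
    const byte_group v{{1, 2, 3, 250}};
    const byte_group zero{{0, 0, 0, 0}};
    // |x - 0| == x for unsigned bytes, so the SAD is the horizontal sum.
    assert(sad(v, zero) == 256u);
}
```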
Nekotekina 9e62ca562b SPU LLVM: implement SQRT and DIV pattern detection (xf) 2021-09-17 10:23:43 +03:00
Nekotekina d28b0ba2fa SPU LLVM: implement spu_re, spu_rsqrte
Improve matching with peek_through_bitcasts() helper.
Implement erase_stores() helper.
2021-09-17 10:23:43 +03:00
Nekotekina aba332d4c4 SPU LLVM: make intrinsics for most xfloat instructions 2021-09-17 10:23:43 +03:00
Nekotekina 543fb7a9cb LLVM DSL / SPU LLVM: implement infinite precision shifts
Remove old make_*** helpers in favor of matchable expressions.
2021-09-17 10:23:43 +03:00
Nekotekina 67b3fc70f8 LLVM DSL: implement absd and match helpers
Matchable expression absd(a, b) (absolute difference).
2021-09-17 10:23:43 +03:00
Nekotekina 4b8ee85995 LLVM DSL: reimplement pshufb, add 'calli'
Implement postponed custom intrinsic replacement.
Make bitcast operator static like other ones.
2021-09-17 10:23:43 +03:00
Nekotekina 86ead1b93b SPU LLVM: implement FI instruction
Use approximate reciprocal in FRSQEST.
2021-09-17 10:23:43 +03:00
Nekotekina 1685769bd9 LLVM DSL: reimplement fmuladd, force hw fma if present 2021-09-17 10:23:43 +03:00
Nekotekina 2acb6ed60d SPU LLVM: optimize SHUFB for permutation-only shuffles
Drop constant generation when unused.
2021-09-17 10:23:43 +03:00
Nekotekina 144244e902 SPU LLVM: implement missing constant mask handling in SHUFB 2021-09-17 10:23:43 +03:00
Nekotekina 7cf9d1380b LLVM DSL: add line number in get_const_vector automatically 2021-09-17 10:23:43 +03:00
Nekotekina f188019244 LLVM DSL: reimplement fsqrt, fabs 2021-09-17 10:23:43 +03:00
Eladash bd66dfedc9 Do not allow to unpause after fatal error occurred in emulation
* Plus fix #10590
2021-09-09 19:30:54 +02:00
Malcolm Jestadt 43cc62d267 SPU LLVM: Add m_use_vnni
- Alderlake and Sapphirerapids will require an update to the llvm fork before they can be detected
2021-08-31 14:02:05 +03:00
Malcolm Jestadt d304b52391 SPU LLVM: Add VNNI optimized variant of sumb
- Uses vpdpbusd to horizontally add values; for some reason this is much faster than the normal horizontal add instructions.
2021-08-31 14:02:05 +03:00
Malcolm Jestadt a86b278115 SPU LLVM: Expand byteswap elimination to more instructions 2021-08-31 14:02:05 +03:00
Whatcookie c62deeefd4
SPU LLVM: Add approximate FCEQ/FCMEQ (#8729)
- It's 100% accurate, but will sit under approx xfloat anyways
- Attempts to use a single instruction when 1 value is constant
2021-08-22 10:13:34 +03:00
Nekotekina 05d1b3605e Fixup for SPU Debug mode (bad SHA1)
Should fix crashes due to read out of bounds.
2021-08-01 10:12:08 +03:00
Nekotekina fc5840cda6 SPU Cache: allow to dump cache upon startup
Print also some stats (if SPU Debug is enabled).
2021-07-30 09:21:11 +03:00
Eladash d81a5b1423 SPU LLVM: Add missing WRCH PC updates 2021-05-29 15:26:52 +03:00
Malcolm Jestadt 7c2b08b9b6 SPU LLVM: Expand branch optimizations for more instructions 2021-05-29 13:07:35 +03:00
Nekotekina 160b131de3 types.hpp: implement smin, smax, amin, amax
Rewritten the following global utility constants:
`umax` returns max number, restricted to unsigned.
`smax` returns max signed number, restricted to integrals.
`smin` returns min signed number, restricted to signed.
`amin` returns smin or zero, less restricted.
`amax` returns smax or umax, less restricted.

Fix operators == and <=> for synthesized rel-ops.
2021-05-22 12:10:57 +03:00
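One way such type-restricted constants can be built is with an empty struct whose conversion operator is constrained by a type trait. The sketch below is an illustration of that idea and not the code from types.hpp; the `_sketch` names are hypothetical and only `umax` and `smin` are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Converts to the maximum value of any UNSIGNED integral type, and to nothing else.
struct umax_sketch_t
{
    template <typename T, typename = std::enable_if_t<std::is_integral<T>::value && std::is_unsigned<T>::value>>
    constexpr operator T() const { return std::numeric_limits<T>::max(); }
};

// Converts to the minimum value of any SIGNED integral type, and to nothing else.
struct smin_sketch_t
{
    template <typename T, typename = std::enable_if_t<std::is_integral<T>::value && std::is_signed<T>::value>>
    constexpr operator T() const { return std::numeric_limits<T>::min(); }
};

constexpr umax_sketch_t umax_sketch{};
constexpr smin_sketch_t smin_sketch{};

// The restriction is enforced at compile time.
static_assert(!std::is_convertible<umax_sketch_t, int>::value, "umax is unsigned-only");
static_assert(!std::is_convertible<smin_sketch_t, unsigned>::value, "smin is signed-only");

inline void check_limits()
{
    const std::uint16_t u = umax_sketch;
    const std::int8_t s = smin_sketch;
    assert(u == 65535);
    assert(s == -128);
}
```

The benefit of this shape is that the same name adapts to the width of the destination type, while a mismatch in signedness fails to compile.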
Malcolm Jestadt 52780e65e7 SPU LLVM: Optimize branching
- Detect a pattern where vpmovmskb and a check against the sign bit can be used instead of checking against zero
2021-05-17 16:59:20 +03:00
Eladash cacb852a1e Emulation stopping bugfix 2021-05-14 15:35:07 +03:00
Nekotekina 6dca588370 SPU LLVM: improve MPYH instruction
Rewritten to use 16-bit multiplication, as in SPU ASMJIT.
2021-05-13 23:16:53 +03:00
Megamouse 1caf81811a Move unspecific Emulator code out of System.cpp 2021-04-24 11:21:22 +03:00
Nekotekina 67649d7976 SPU LLVM: restore lost comment 2021-04-21 13:33:44 +03:00
Malcolm Jestadt 6247969ede SPU LLVM: Absolute final fixes for icelake shufb paths
- The constant mask was accessing bits in reverse order of what was expected
2021-04-21 11:00:02 +03:00
Malcolm Jestadt efd38fa940 SPU LLVM: Improve byteswap elimination
- Use the data before it has been swapped rather than relying on a second byteswap to cancel out the first
2021-04-20 23:24:21 +03:00
Malcolm Jestadt 551472220e SPU LLVM: Remove icelake shufb paths for now 2021-04-20 23:24:21 +03:00
Malcolm Jestadt 53f13a9721 SPU LLVM: Final fixup for icelake shufb paths
- The problems were caused by the constant mask for gf2p8affineqb being used as the first argument instead of the second argument.
2021-04-20 13:07:24 +03:00
Nekotekina 9d4fcbf946 bs_t<>: fix/cleanup some operators 2021-04-17 15:54:33 +03:00
Malcolm Jestadt 0a7df9d02e SPU LLVM: add AVX-512 SPU verification
- This is hidden behind a new setting, as some CPUs may downclock aggressively when executing 512 wide instructions
2021-04-16 09:35:26 +03:00
Megamouse a16d8ba3ea More random changes 2021-04-11 14:01:51 +03:00
Megamouse 03b76b4606 Emu: some cleanup 2021-04-09 21:03:49 +02:00
Nekotekina 95725bf7fc Add -Werror=missing-noreturn (GCC, clang)
May be useful to diagnose functions which fail assertions unconditionally.
2021-04-08 10:29:47 +03:00
Megamouse 02febd3f65 Workaround: Skip progress_dialog during gameplay 2021-04-06 21:39:34 +03:00
Nekotekina 6f1f75bc8f Minor progress dialog refactoring
Add rsx::overlays::progress_dialog class (identical to message_dialog).
Don't use Emu.CallAfter() for native dialogs.
Make g_progr_ptotal waitable.
2021-04-03 22:38:04 +03:00
Nekotekina e9a45a2f45 Implement scoped_progress_dialog
Create Emu/system_progress.hpp
Remove atomic g_progr_show
2021-03-31 23:40:09 +02:00
Nekotekina 2212a131ef Fix some -Weffc++ warnings (part 1) 2021-03-31 11:27:09 +03:00
Megamouse 870224cde0 Emu/overlay: ingame native overlay PPU compilation 2021-03-31 09:38:30 +02:00
Nekotekina b3fb6d7d18 Add and fix -Wredundant-decls (GCC) 2021-03-23 22:48:57 +03:00
Nekotekina a4fdbf0a88 Enable -Wstrict-aliasing=1 (GCC)
Fixed partially.
2021-03-09 03:10:15 +03:00
Nekotekina 53af2dbb3f Add/fix warning -Wignored-qualifiers (GCC/clang)
Fix simple_array::const_iterator as a part of it.
2021-03-09 03:09:50 +03:00
Malcolm Jestadt e5d0e035d0 SPU LLVM: Rearrange FM instruction for better performance
- Doesn't eliminate any instructions, but allows for better out of order execution.
2021-03-08 15:48:36 +03:00
Nekotekina 87af905018 Enable -Wunused-parameter 2021-03-06 18:07:08 +03:00
Eladash 004ebfdaee SPU debugger: Implement MFC journal
* Allow to dump up to 1820 commands with up to 128 bytes of data each, using key D with the debugger.
2021-03-02 21:57:51 +03:00
Nekotekina ea5e837bd6 fixed_typemap.hpp: return reference 2021-03-02 16:08:14 +03:00
Nekotekina a90ad62fc0 Remove garbage SPUW perf report 2021-02-23 18:24:50 +03:00
Eladash 96400234a8 Remove cpu_thread destructor 2021-02-22 12:47:45 +03:00
Eladash f43260bd58
Atomic waiting refactoring (#9208)
* Use atomic waitables instead of global thread wait as often as possible.
* Add ::is_stopped() and ::is_paused() which can be used in atomic loops and with atomic wait. (constexpr cpu flags test functions)
* Fix notification bug of sys_spu_thread_group_exit/terminate. (old bug, enhanced by #9117)
* Function time statistics at Emu.Stop() restored. (instead of current "X syscall failed with 0x00000000 : 0")
2021-02-13 17:50:07 +03:00
Nekotekina d0126f0fa0 Fix freezes in HLE Vdec and SPU LLVM precompilation.
Freezes could accidentally occur on close or ingame.
Deprecate range-for loop on lf_queue.
This is a part of PR #9208

Co-authored-by: Eladash <elad3356p@gmail.com>
2021-02-01 19:14:01 +03:00
Megamouse 7bddb87306 Simplify compile threads 2021-01-31 12:18:32 +03:00
Eladash d3bc96a201 Fix minor issue with usage of STL thread::hardware_concurrency() 2021-01-29 18:23:29 +03:00
Eladash 0652870204 New RSX Debugger 2021-01-28 17:40:26 +03:00
Nekotekina ee288340b0 Implement thread_ctrl::scoped_priority
RAII priority control (+1, or -1)
2021-01-25 21:49:16 +03:00
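The RAII pattern named in this commit can be sketched as a guard that applies a priority delta on construction and undoes it on destruction. In this illustration a plain `int` stands in for the thread's OS priority, and the class name `scoped_priority_sketch` is hypothetical; the real utility would call into the platform's thread API instead.

```cpp
#include <cassert>

// RAII guard: adjusts a priority value by `delta` (+1 or -1) for the lifetime
// of the object, then restores it. The int reference is a stand-in for the
// actual OS thread priority.
class scoped_priority_sketch
{
    int& m_prio;
    int m_delta;

public:
    scoped_priority_sketch(int& prio, int delta)
        : m_prio(prio), m_delta(delta)
    {
        m_prio += m_delta;
    }

    // Non-copyable, so the adjustment is undone exactly once.
    scoped_priority_sketch(const scoped_priority_sketch&) = delete;
    scoped_priority_sketch& operator=(const scoped_priority_sketch&) = delete;

    ~scoped_priority_sketch()
    {
        m_prio -= m_delta;
    }
};

inline void check_scoped_priority()
{
    int prio = 0;
    {
        scoped_priority_sketch raised(prio, +1);
        assert(prio == 1); // raised inside the scope
    }
    assert(prio == 0); // restored on scope exit
}
```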
Malcolm Jestadt 486d48e4f8 SPU LLVM: Optimize ROTQBY family for VBMI
- Avoid masking pshufb index by 0xf by using vpermb instead.
- Also fix conversion of vperm2b index to ShuffleVector index.
2021-01-25 13:18:23 +03:00
Ani 7c62574e59 spu: Restore workers priority after initialization 2021-01-24 16:40:59 +03:00
Nekotekina f9bc682115 Refactor some 'offending' code a bit (no effect)
It appears linkage errors were rare even in debug mode (GCC/clang).
2021-01-18 21:58:28 +03:00
Malcolm Jestadt a2e8e3090c SPU LLVM: Optimize FSM following comparison
- FSM following a comparison instruction can be optimized to a single shuffle instruction
2021-01-17 16:52:44 +03:00
Nekotekina def364fe28 SPU LLVM: add splat_scalar helper
Unrolls into zshuffle from the preferred slot.
2021-01-17 15:13:28 +03:00
Nekotekina db8e6fe7a7 Enable -Wunused-variable 2021-01-12 14:34:14 +03:00
Malcolm Jestadt c952e99f3e SPU LLVM: Fix edgecase in icelake codegen 2020-12-29 22:01:11 +03:00
Nekotekina bd269bccaf types.hpp: remove intrinsic includes
Replace v128 with u128 in some places.
Removed some unused files.
2020-12-21 21:11:25 +03:00
Eladash ef884642e4 Cleanup disasm classes a bit 2020-12-21 13:46:26 +03:00
Nekotekina db9b7db531 Cleanup and move sysinfo.h -> util/sysinfo.hpp 2020-12-18 12:55:54 +03:00
Nekotekina fb29933d3d Add usz alias for std::size_t 2020-12-18 12:23:53 +03:00
Megamouse d21f87af5d Fix unresponsive UI during SPU compilation 2020-12-16 11:01:51 +03:00
Nekotekina e39348ad96 Make lf_queue<> compatible with atomic_wait 2020-12-15 19:19:36 +03:00
Nekotekina e321765c54 Split BEType.h to util/v128.hpp and util/to_endian.hpp 2020-12-13 16:34:45 +03:00