Commit graph

173 commits

Author SHA1 Message Date
Jeff Guo cefc37a553
PPU LLVM arm64+macOS port (#12115)
* BufferUtils: use naive function pointer on Apple arm64

Use naive function pointer on Apple arm64 because ASLR breaks asmjit.
See BufferUtils.cpp comment for explanation on why this happens and how
to fix if you want to use asmjit.

* build-macos: fix source maps for Mac

Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo
modes.

* LLVM PPU: fix aarch64 on macOS

Force MachO on macOS to fix LLVM being unable to patch relocations
during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by
PPUTranslator/Recompiler.

* virtual memory: use 16k pages on aarch64 macOS

Temporary hack to get things working by using 16k pages instead of 4k
pages in VM emulation.

* PPU/SPU: fix NEON intrinsics and compilation for arm64 macOS

Fixes some intrinsics usage and patches usages of asmjit to properly
emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also
patches the SPU recompiler to properly work on arm64 by telling LLVM to
target arm64.

* virtual memory: fix W^X toggles on macOS aarch64

Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default
to RW mode. For both SPU and PPU execution threads, when initialization
finishes we toggle to RX mode. This exploits Apple's per-thread setting
for RW/RX to let us be technically compliant with the OS's W^X
enforcement while not needing to actually separate the memory
allocated for code/data.

* PPU: implement aarch64 specific functions

Implements ppu_gateway for arm64 and patches LLVM initialization to use
the correct triple. Adds some fixes for macOS W^X JIT restrictions when
entering/exiting JITed code.

* PPU: Mark rpcs3 calls as non-tail

Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you
call a function inside e.g. an L2 syscall, it will clobber LR on arm64
and subtly break returns in emulated code. Only JIT -> JIT "calls"
should be tail.

* macOS/arm64: compatibility fixes

* vm: patch virtual memory for arm64 macOS

Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to
existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix
memory unmapping on 16K page machines with a hack to mark "unmapped"
pages as RW.

* PPU: remove wrong comment

* PPU: fix a merge regression

* vm: remove 16k page hacks

* PPU: formatting fixes

* PPU: fix arm64 null function assembly

* ppu: clean up arch-specific instructions
2022-06-14 15:28:38 +03:00
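
The W^X items in the commit above describe mapping JIT memory with MAP_JIT and flipping a per-thread RW/RX switch. The following is a minimal sketch of that pattern, not the code from the commit. The helper names `alloc_jit_region` and `jit_write_mode` are hypothetical. `MAP_JIT` and `pthread_jit_write_protect_np` are Apple APIs, so on other POSIX systems the sketch falls back to an ordinary RW mapping and a no-op toggle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#if defined(__APPLE__) && defined(__aarch64__)
#include <pthread.h>
#endif

// Hypothetical helper: switch the calling thread between "JIT pages are
// writable" and "JIT pages are executable". Only Apple arm64 has this
// per-thread switch; elsewhere this is a no-op.
inline void jit_write_mode(bool writable)
{
#if defined(__APPLE__) && defined(__aarch64__)
    // 0 = protection off (RW for this thread), 1 = protection on (RX).
    pthread_jit_write_protect_np(writable ? 0 : 1);
#else
    (void)writable;
#endif
}

// Hypothetical helper: reserve a region intended to hold JIT code.
// Returns nullptr on failure.
inline void* alloc_jit_region(std::size_t size)
{
    int prot = PROT_READ | PROT_WRITE;
    int flags = MAP_PRIVATE | MAP_ANON;
#if defined(__APPLE__) && defined(__aarch64__)
    prot |= PROT_EXEC; // code and data share one mapping
    flags |= MAP_JIT;  // required for the per-thread toggle to apply
#endif
    void* p = ::mmap(nullptr, size, prot, flags, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Usage: write while in RW mode, then flip to RX before running emitted code.
inline bool jit_region_self_check()
{
    constexpr std::size_t size = 16384; // one 16k page on Apple arm64
    void* p = alloc_jit_region(size);
    if (!p) return false;
    jit_write_mode(true);          // initialization phase: RW
    std::memset(p, 0, size);
    jit_write_mode(false);         // execution phase: RX
    return ::munmap(p, size) == 0;
}
```

On macOS a hardened-runtime binary may also need the JIT entitlement for the `MAP_JIT` mapping to succeed; that detail is outside this sketch.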
Nekotekina e243ef5907 PPU: implement accurate FRES
Implemented with an accurate lookup table.
2022-05-11 10:46:08 +03:00
doesthisusername 7b162c7513 PPU: implement quasi-accurate FRSQRTE
Denormals are handled like zeros.
NaN handling is inaccurate in some cases.

Co-authored-by: Nekotekina <nekotekina@gmail.com>
2022-05-11 10:46:08 +03:00
Nekotekina 0786a0a088 PPU LLVM: match interpreter for VEXPTEFP/VLOGEFP 2022-05-03 08:27:44 +03:00
Nekotekina 0de9960772 PPU: rewrite MFOCRF+MFCR instructions 2022-01-21 12:49:52 +03:00
Nekotekina 349f251d14 PPU LLVM: use masked stores for STVLX/STVRX
Drop maskmove intrinsic, not portable.
Its implicit NT hint may also hurt performance.
2022-01-20 21:16:00 +03:00
Nekotekina 14cca55b50 PPU: refactor vector rounding instructions
Fix: nearbyint -> roundeven
2022-01-18 00:10:32 +03:00
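
The "nearbyint -> roundeven" fix above swaps one rounding operation for another. `nearbyint` rounds according to the current floating-point rounding mode, while `roundeven` always rounds to nearest with ties to even. The sketch below is a hand-written stand-in for `roundeven` (the name `round_even_sketch` is hypothetical, and this is not the translator's code) that shows the mode-independent behaviour.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Round to the nearest integer, ties to even, regardless of the current
// floating-point rounding mode.
inline double round_even_sketch(double x)
{
    if (!std::isfinite(x)) return x;
    const double a = std::fabs(x);
    const double fl = std::floor(a);
    // Exact subtraction: either fl == 0, or fl <= a < 2 * fl.
    const double diff = a - fl;
    double r = fl;
    if (diff > 0.5)
        r = fl + 1.0;
    else if (diff == 0.5 && std::fmod(fl, 2.0) != 0.0)
        r = fl + 1.0; // tie with an odd lower neighbour: go up to the even one
    return std::copysign(r, x);
}

// Evaluate the sketch with a different rounding mode temporarily installed.
inline double round_even_under_mode(double x, int mode)
{
    const int saved = std::fegetround();
    std::fesetround(mode);
    const double r = round_even_sketch(x);
    std::fesetround(saved);
    return r;
}

inline void round_even_self_check()
{
    assert(round_even_sketch(2.5) == 2.0);
    assert(round_even_sketch(3.5) == 4.0);
    assert(round_even_sketch(-2.5) == -2.0);
    // The result does not change when the rounding mode does.
    assert(round_even_under_mode(2.5, FE_UPWARD) == 2.0);
}
```

By contrast, `std::nearbyint(2.5)` evaluated under `FE_UPWARD` follows that mode rather than ties-to-even, which is the kind of divergence the fix appears to address.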
Nekotekina f95395b351 PPU LLVM: improve accuracy of VSL/VSR
Passes tests, should now be equal to interpreter.
2022-01-15 21:13:31 +03:00
Nekotekina df24cff0b1 PPU LLVM: fix VMINFP and VMAXFP accuracy
PPU cache needs to be cleared.
2022-01-15 17:36:57 +03:00
Nekotekina 6dda047128 PPU LLVM: fix VNMSUBFP sign handling
PPU cache needs to be cleared.
2022-01-15 17:36:57 +03:00
Nekotekina e9efa73eed PPU: restore previous NJ mode handling option
Fix the divergence between PPU Interpreter and LLVM.
2022-01-15 17:36:57 +03:00
Nekotekina 580bd2b25e Initial Linux Aarch64 support
* Update asmjit dependency (aarch64 branch)
* Disable USE_DISCORD_RPC by default
* Dump some JIT objects in rpcs3 cache dir
* Add SIGILL handler for all platforms
* Fix resetting zeroing denormals in thread pool
* Refactor most v128:: utils into global gv_** functions
* Refactor PPU interpreter (incomplete), remove "precise"
  - Instruction specializations with multiple accuracy flags
  - Adjust calling convention for speed
  - Removed precise/fast setting, replaced with static
  - Started refactoring interpreters for building at runtime JIT
    (I got tired of poor compiler optimizations)
  - Expose some accuracy settings (SAT, NJ, VNAN, FPCC)
  - Add exec_bytes PPU thread variable (akin to cycle count)
* PPU LLVM: fix VCTUXS+VCTSXS instruction NaN results
* SPU interpreter: remove "precise" for now (extremely non-portable)
  - As with PPU, settings changed to static/dynamic for interpreters.
  - Precise options will be implemented later
* Fix termination after fatal error dialog
2022-01-15 06:48:04 +03:00
Eladash a60cee6536 Update PPUTranslator::MTFSFI for its intention to be clearer 2022-01-12 03:37:39 +03:00
Nekotekina e3e39e8de3 PPU LLVM: rewrite and optimize saturation bit
Use vector accumulator
2021-12-03 00:14:06 +03:00
Nekotekina 209b14fbac PPU LLVM: inline remaining vector instructions 2021-12-03 00:14:06 +03:00
Nekotekina 04c9d01390 PPU LLVM: modernize most vector instructions
Rewritten VSUM instructions:
VSUMSWS, VSUM2SWS, VSUM4SBS, VSUM4SHS, VSUM4UBS
2021-12-03 00:14:06 +03:00
Nekotekina c9d8e59dbf PPU LLVM: allow to drop setting SAT flag (optimization, module-wide)
Implement ppu_attr::has_mfvscr (partially, module-wide search).
If this instruction isn't found, setting the SAT flag can be dropped.
This is based on the presumption that only MFVSCR can retrieve the SAT flag.
2021-12-03 00:14:06 +03:00
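
The module-wide MFVSCR search in the commit above can be pictured as a scan over decoded instruction words. The sketch below is an illustration under assumptions, not the `ppu_attr::has_mfvscr` implementation: the function names are hypothetical, and the input is assumed to be instruction words already converted to host byte order. The encoding used is the VX-form MFVSCR (primary opcode 4, extended opcode 1540).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True if `insn` is MFVSCR: primary opcode 4 in the top 6 bits and
// extended opcode 1540 (0x604) in the low 11 bits. Register fields are ignored.
constexpr bool is_mfvscr(std::uint32_t insn)
{
    return (insn & 0xFC0007FFu) == 0x10000604u;
}

// Hypothetical module-wide search. If this returns false, nothing in the
// module can read the SAT bit back, so updates to it could be skipped.
inline bool module_has_mfvscr(const std::vector<std::uint32_t>& code)
{
    for (const std::uint32_t insn : code)
    {
        if (is_mfvscr(insn)) return true;
    }
    return false;
}

inline void mfvscr_self_check()
{
    const std::vector<std::uint32_t> without = {0x60000000u};              // nop
    const std::vector<std::uint32_t> with = {0x60000000u, 0x10000604u};    // nop, mfvscr v0
    assert(!module_has_mfvscr(without));
    assert(module_has_mfvscr(with));
}
```

The search is conservative in one direction only: finding MFVSCR keeps the flag updates, and not finding it is what permits dropping them.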
Nekotekina 86b194014b PPU LLVM: rewrite more packing instructions
Rewritten VPKUHUM, VPKUHUS, VPKUWUM, VPKUWUS.
Decoupled saturation test from sat pack pattern.
2021-12-03 00:14:06 +03:00
Nekotekina e7c827f73b PPU LLVM: rewrite some packing instructions
Rewritten VPKSHSS, VPKSHUS, VPKSWSS, VPKSWUS.
Decoupled saturation test from sat pack pattern.
2021-12-03 00:14:06 +03:00
Nekotekina abe498f35c PPU LLVM: modernize some code with new DSL
PPU: rewritten instructions VMHADDSHS, VMHRADDSHS
PPU: added optimized path for VPERM (ra=rb)
2021-12-03 00:14:06 +03:00
Nekotekina 69f321a471 LLVM 13 2021-11-02 20:11:08 +03:00
Malcolm Jestadt f06c8b22e8 PPU/SPU LLVM: Emulate VPERM2B with a 256 bit wide VPERMB
- Save 1 uop by using 256 wide VPERMB instead of VPERM2B. (Compiles down to a vinserti128 and vpermb)
2021-10-13 17:51:54 +03:00
Nekotekina 4b8ee85995 LLVM DSL: reimplement pshufb, add 'calli'
Implement postponed custom intrinsic replacement.
Make bitcast operator static like other ones.
2021-09-17 10:23:43 +03:00
Nekotekina 7cf9d1380b LLVM DSL: add line number in get_const_vector automatically 2021-09-17 10:23:43 +03:00
Eladash f98595bee5 Patches/PPU: Add jump_link patch type 2021-09-10 11:46:39 +03:00
Nekotekina 06f733a7f2 Fixup No.2 for #10779 2021-09-01 16:56:38 +03:00
Eladash b40ed5bdb7
Patches/PPU: Extend and improve patching capabilities (code allocations, jumps to any address) (#10779)
* Patches/PPU: Implement dynamic code allocation + Any-Address jump patches

Also fix deallocation path of fixed allocation patches.
2021-09-01 13:38:17 +03:00
Eladash ddb042148d Patches/LLVM: Implement Complex Patches Support 2021-08-26 23:04:32 +03:00
Nekotekina 160b131de3 types.hpp: implement smin, smax, amin, amax
Rewritten the following global utility constants:
`umax` returns max number, restricted to unsigned.
`smax` returns max signed number, restricted to integrals.
`smin` returns min signed number, restricted to signed.
`amin` returns smin or zero, less restricted.
`amax` returns smax or umax, less restricted.

Fix operators == and <=> for synthesized rel-ops.
2021-05-22 12:10:57 +03:00
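
The constants listed in the commit above behave like typeless values that convert to whatever integer type they are assigned to. A minimal sketch of that idea, assuming constrained conversion operators, is shown below. The `_sketch` names are hypothetical, and the real definitions in types.hpp may differ (for example, they also provide comparison operators, which this sketch omits).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Integral but not bool (std::make_signed is not defined for bool).
template <typename T>
inline constexpr bool is_int_sketch_v = std::is_integral_v<T> && !std::is_same_v<T, bool>;

struct umax_sketch_t // max value, unsigned targets only
{
    template <typename T, typename = std::enable_if_t<is_int_sketch_v<T> && std::is_unsigned_v<T>>>
    constexpr operator T() const { return std::numeric_limits<T>::max(); }
};

struct smax_sketch_t // max of the signed counterpart, any integral target
{
    template <typename T, typename = std::enable_if_t<is_int_sketch_v<T>>>
    constexpr operator T() const
    {
        return static_cast<T>(std::numeric_limits<std::make_signed_t<T>>::max());
    }
};

struct smin_sketch_t // min value, signed targets only
{
    template <typename T, typename = std::enable_if_t<is_int_sketch_v<T> && std::is_signed_v<T>>>
    constexpr operator T() const { return std::numeric_limits<T>::min(); }
};

struct amin_sketch_t // smin for signed targets, zero for unsigned ones
{
    template <typename T, typename = std::enable_if_t<is_int_sketch_v<T>>>
    constexpr operator T() const { return std::numeric_limits<T>::min(); }
};

struct amax_sketch_t // smax for signed targets, umax for unsigned ones
{
    template <typename T, typename = std::enable_if_t<is_int_sketch_v<T>>>
    constexpr operator T() const { return std::numeric_limits<T>::max(); }
};

inline constexpr umax_sketch_t umax_sketch{};
inline constexpr smax_sketch_t smax_sketch{};
inline constexpr smin_sketch_t smin_sketch{};
inline constexpr amin_sketch_t amin_sketch{};
inline constexpr amax_sketch_t amax_sketch{};

inline void limits_self_check()
{
    const std::uint8_t a = umax_sketch;  // 255
    const std::uint32_t b = smax_sketch; // 0x7fffffff
    const std::int16_t c = smin_sketch;  // -32768
    assert(a == 255 && b == 0x7fffffffu && c == -32768);
}
```

Because the restrictions are expressed through the conversion operators, an assignment such as `std::int32_t x = umax_sketch;` fails to compile instead of silently producing -1.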
Megamouse a16d8ba3ea More random changes 2021-04-11 14:01:51 +03:00
Nekotekina 87af905018 Enable -Wunused-parameter 2021-03-06 18:07:08 +03:00
Nekotekina 0c034ad7de PPU LLVM: upgrade to GHC call conv
Get rid of some global variables.
Implement ppu_escape (unused yet).
Bump PPU cache version to v4.
2021-02-01 11:30:50 +03:00
Nekotekina c89362f6a2 PPU LLVM: don't use module name as PRX indicator 2021-02-01 11:30:50 +03:00
Nekotekina 8a029159cd PPU Analyser: compile certain functions on per-instruction basis
PPU LLVM: optimize small blocks
2021-02-01 11:30:50 +03:00
Nekotekina 382509d778 PPU LLVM: Implement inline __add_get_ov 2021-02-01 11:30:50 +03:00
Nekotekina f9ee8978ff PPU LLVM: improve analyser
Compile possibly executable holes between detected functions.
Add unused "PPU LLVM Greedy Mode" option (for future updates).
Add "nounwind" attribute to compiled functions (reduces size).
2021-02-01 11:30:50 +03:00
Nekotekina db8e6fe7a7 Enable -Wunused-variable 2021-01-12 14:34:14 +03:00
Nekotekina bd269bccaf types.hpp: remove intrinsic includes
Replace v128 with u128 in some places.
Removed some unused files.
2020-12-21 21:11:25 +03:00
Nekotekina fb29933d3d Add usz alias for std::size_t 2020-12-18 12:23:53 +03:00
Eladash 7eb16e13bb PRX loader: Fix libfs_155.sprx loading
Fix relocations' segments referencing when there are "empty" (memsize=0) LOAD segments.
2020-12-15 11:16:45 +03:00
Nekotekina e321765c54 Split BEType.h to util/v128.hpp and util/to_endian.hpp 2020-12-13 16:34:45 +03:00
Nekotekina 65c04e4ddd Remove constexpr from ppu/spu decoders.
We don't need them at compile time (yet).
Removing it can reduce compile time and complexity.
2020-12-10 15:06:01 +03:00
Nekotekina 36c8654fb8 Remove HERE macro
Some cleanup.
Add location to some functions.
2020-12-10 12:30:22 +03:00
Nekotekina 5d934c8759 Improve narrow() and size32() with src_loc detection 2020-12-09 16:26:20 +03:00
Nekotekina e055d16b2c Replace verify() with ensure() with auto src location.
Expression ensure(x) returns x.
Uses of the comma operator removed.
2020-12-09 15:43:38 +03:00
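
The commit above describes an `ensure(x)` that returns `x` and records the caller's source location automatically. The sketch below shows one way such a helper can work; it is not the rpcs3 implementation. The name `ensure_sketch` is hypothetical, and it relies on the `__builtin_FILE()` and `__builtin_LINE()` compiler builtins (available in GCC and Clang) as default arguments, which are evaluated at the call site.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <utility>

// Returns its argument unchanged when it is truthy. Otherwise prints the
// location of the call and aborts. The defaulted parameters capture the
// caller's file and line, so no macro is needed.
template <typename T>
T&& ensure_sketch(T&& value, const char* file = __builtin_FILE(), int line = __builtin_LINE())
{
    if (!value)
    {
        std::fprintf(stderr, "Verification failed at %s:%d\n", file, line);
        std::abort();
    }
    return std::forward<T>(value);
}

inline void ensure_self_check()
{
    int storage = 7;
    int* p = &storage;
    // The checked pointer is the result of the expression, so it can be
    // dereferenced directly instead of using "verify(p), *p".
    assert(*ensure_sketch(p) == 7);
    assert(ensure_sketch(42) == 42);
}
```

Returning the argument is what makes the comma-operator pattern unnecessary: the check and the use happen in a single expression. When the argument is a temporary, the result should be consumed within the same full expression.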
RipleyTom af8c661a64 Remove BOM markers 2020-12-06 15:30:12 +03:00
Nekotekina 1b8bf081b5 Upgrade to LLVM 11 Stable 2020-11-02 21:23:25 +03:00
Eladash 443c2b920d PPU: Handle cache line inconsistencies (PPU 128 reservations) 2020-10-16 22:51:30 +03:00
Nekotekina f2d2a6b605 JIT cleanup for PPU LLVM
Remove MemoryManager3 as unnecessary.
Rewrite MemoryManager1 to use its own 512M reservations.
Disabled unwind info registration on all platforms.
Use 64-bit executable pointers under vm::g_exec_addr area.
Stop relying on deploying PPU LLVM objects in first 2G of address space.
Implement jit_module_manager, protect its data with mutex.
2020-10-11 17:22:28 +03:00
Eladash f4ca6f02a1 PPU: Implement support for 128-byte reservations coherency 2020-09-28 22:34:42 +03:00