Commit graph

410 commits

Author SHA1 Message Date
Whatcookie fd6829f757
SPU LLVM: AVX-512 optimization for CFLTU (#14384)
- Takes advantage of vrangeps and the new float to uint instructions from AVX-512
- Down from 6 to 3 instructions

TODO: Somehow ensure that this is what llvm outputs using CreateFPToUI?
2023-07-29 09:01:01 +03:00
Whatcookie 4ecb06c901
SPU LLVM: Optimize common SFI+ROTQMBY pattern 2023-07-28 10:26:40 +03:00
Elad Ashkenazi 9265ff53d0
Include spu.log inside RPCS3.log when SPU Debug is true 2023-07-27 19:15:32 +03:00
Eladash 50dad6801b SPU LLVM: Use get_known_bits() in SHUFB 2023-07-18 22:27:45 +03:00
Malcolm Jestadt ee7475a9d4 SPU LLVM: Handle SHUFB special cases with a lookup table
- Needs 3 instructions to handle the special cases, since x86 lacks an 8 bit simd shift instruction
2023-07-18 22:27:45 +03:00
oltolm 0c94606fcf
Make compile with msvc, clang and gcc on Windows 2023-07-11 21:40:30 +03:00
Eladash 482dd0e8f8 SPU: Remove wrong clamp in MFC_Size
Just crashes real MFC.
2023-07-09 13:33:03 +03:00
RipleyTom cbb1b1f28e Fix spu_fm 2023-05-19 18:26:42 +03:00
RipleyTom f11770b88b
Better accuracy for FREST/FRSQEST (#13863) 2023-05-15 17:20:47 +01:00
RipleyTom 5c0113ce59 Deterministic FREST and FRSQEST 2023-05-06 12:59:34 +03:00
Ivan Chikish bb8e43f16c SPU LLVM: fixup custom LICM pass 2023-04-22 03:07:06 +03:00
Ivan Chikish 1041284384 SPU LLVM: sink stores deeper in custom LICM pass 2023-04-21 18:11:59 +03:00
Ivan Chikish 183bea3b98 SPU LLVM: upgrade custom DSE pass 2023-04-20 11:12:31 +03:00
Ivan Chikish 39d17a94c6 SPU LLVM: make savestates unsavable inside the code 2023-04-18 12:19:15 +03:00
Ivan Chikish 8153e5338f SPU LLVM: optimize register stores
Custom implementation of DSE+LICM
2023-04-18 12:19:15 +03:00
Ivan Chikish 44b3709d1d SPU LLVM: use volatile stores for PC update 2023-04-15 12:40:59 +03:00
Ivan Chikish ba29f0ccd1 SPU LLVM: use atomic loads in read channel count 2023-04-14 13:36:04 +03:00
Ivan Chikish 3473e19508 SPU LLVM: fix savestate safety guards
Volatile was removed since it prevented optimizations.
2023-04-14 07:26:30 +03:00
RipleyTom d35fecbeea Forces deterministic FP operations when online 2023-04-12 15:31:36 +03:00
Ivan Chikish 06b0e35fb9 Update to LLVM 16.0.1
Fix Zen4+ AVX-512 detection
2023-04-11 12:13:09 +03:00
oltolm 6fbca1acfd remove unnecessary pointer bitcasts 2023-04-09 12:45:18 +03:00
Ivan Chikish fb88e1c1c9 Update to LLVM 16.0.0, switch to upstream LLVM 2023-04-06 10:19:31 +03:00
oltolm cf5346c263 use new LLVM API in SPURecompiler 2023-03-12 10:11:06 +03:00
Ivan Chikish 776b3b5efa SPU LLVM: fix regression from #13500
Fixes #13526
2023-03-11 19:48:55 +03:00
oltolm 520524285a
llvm: update code to new API (#13500)
* llvm: update code to new API

* llvm: remove OLDLLVM define
2023-03-11 01:57:21 +03:00
Malcolm Jestadt 813f7b50c1 SPU LLVM: Minor SUMB AVX-512 path optimization
- Tweak shuffle to allow LLVM to emit a cheap blend instruction instead of the expensive VPERMI2W instruction
2023-01-27 13:06:48 +03:00
Eladash 2a00a88e2a SPU LLVM: don't force-enter process_mfc_cmd() because it's slower 2022-10-04 16:28:34 +03:00
Malcolm Jestadt d8897c585d PPU/SPU LLVM: Allow Zen4 cpus to use VPERMI2B/VPERMT2B instead of the vperm2b256to128 path
- Zen4 based cpus can process VPERM2B in a single uop, unlike intel where it is 3 uops.
2022-10-01 15:38:29 +03:00
Nekotekina 6ff6a4989a Implement at32() util
Works like .at() but uses source location for "exception".
2022-09-26 18:04:15 +03:00
Nekotekina b49a1f27eb Warning fixes 2022-09-17 16:35:02 +03:00
Nekotekina 5985f0eefa BufferUtils: cleanup regarding ARM64 2022-09-07 17:59:07 +03:00
sguo35 a0d48c588a spu/arm64: clean up assembly code generation
Clean up asmjit usage so we don't unnecessarily allocate memory
anymore for SPURecompiler functions.
2022-09-07 17:33:01 +03:00
Eladash ee1384341e rsx: Implement atomic vertex upload (with Strict Rendering Mode) 2022-09-01 20:09:28 +03:00
Eladash 506b9deec5 Savestates/SPU LLVM: Improve saving performance 2022-08-25 23:54:56 +03:00
Malcolm Jestadt 51e6d0a336 SPU LLVM: Add integer compare optimization for FCMGT 2022-07-29 11:59:59 +03:00
sguo35 73ed657e00 spu/arm64: fix 16 byte branch patch alignment 2022-07-15 12:37:33 +03:00
sguo35 c52abed4d3 spu: implement ubertrampoline generator for arm64
Implement the ubertrampoline generator for arm64. It generally follows
the x86 version, but uses asmjit to generate code instead of writing raw
opcodes to memory, trading memory usage for readability. Currently the
trampoline implementation is fairly inefficient in terms of instruction
size and is substantially larger than the x86 version.
2022-07-15 12:37:33 +03:00
sguo35 9e57efe82c spu: implement assembly functions for arm64 2022-07-15 12:37:33 +03:00
sguo35 77ab872bec spu: remove rotqby C++ impl
rotqby C++ implementation is broken, since replacing it with the
intrinsic version reliably fixes spurs test. A conditional branch
immediately after a rotqby instruction will fail using the C++ version
but succeed using the intrinsic.
2022-07-15 12:37:33 +03:00
Eladash 3e51426379 Savestates/SPU: Kill emulation when its safe to save SPU state 2022-07-15 09:30:53 +03:00
Nekotekina 4b787b22c8 Implement FN (lambda shortener)
Useful for some higher order functions.
Allows to make short lambdas even shorter.
2022-07-08 14:47:41 +03:00
Eladash f0c71ae2ae Savestates: Fix saving sys_event_queue_destroy 2022-07-08 12:57:43 +03:00
Eladash 2ccb0c8f42 SPU LLVM/Savestates: Remove unneeded store insurance and add related fix 2022-07-06 19:43:25 +03:00
Elad Ashkenazi fcd297ffb2
Savestates Support For PS3 Emulation (#10478) 2022-07-04 16:02:17 +03:00
Ivan c2190f71ca
SPU/PPU LLVM: fix triple setup (regression fix) (#12228) 2022-06-14 18:13:43 +03:00
Jeff Guo cefc37a553
PPU LLVM arm64+macOS port (#12115)
* BufferUtils: use naive function pointer on Apple arm64

Use naive function pointer on Apple arm64 because ASLR breaks asmjit.
See BufferUtils.cpp comment for explanation on why this happens and how
to fix if you want to use asmjit.

* build-macos: fix source maps for Mac

Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo
modes.

* LLVM PPU: fix aarch64 on macOS

Force MachO on macOS to fix LLVM being unable to patch relocations
during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by
PPUTranslator/Recompiler.

* virtual memory: use 16k pages on aarch64 macOS

Temporary hack to get things working by using 16k pages instead of 4k
pages in VM emulation.

* PPU/SPU: fix NEON intrinsics and compilation for arm64 macOS

Fixes some intrinsics usage and patches usages of asmjit to properly
emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also
patches the SPU recompiler to properly work on arm64 by telling LLVM to
target arm64.

* virtual memory: fix W^X toggles on macOS aarch64

Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default
to RW mode. For both SPU and PPU execution threads, when initialization
finishes we toggle to RX mode. This exploits Apple's per-thread setting
for RW/RX to let us be technically compliant with the OS's W^X
    enforcement while not needing to actually separate the memory
    allocated for code/data.

* PPU: implement aarch64 specific functions

Implements ppu_gateway for arm64 and patches LLVM initialization to use
the correct triple. Adds some fixes for macOS W^X JIT restrictions when
entering/exiting JITed code.

* PPU: Mark rpcs3 calls as non-tail

Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you
call a function inside e.g. an L2 syscall, it will clobber LR on arm64
and subtly break returns in emulated code. Only JIT -> JIT "calls"
should be tail.

* macOS/arm64: compatibility fixes

* vm: patch virtual memory for arm64 macOS

Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to
existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix
memory unmapping on 16K page machines with a hack to mark "unmapped"
pages as RW.

* PPU: remove wrong comment

* PPU: fix a merge regression

* vm: remove 16k page hacks

* PPU: formatting fixes

* PPU: fix arm64 null function assembly

* ppu: clean up arch-specific instructions
2022-06-14 15:28:38 +03:00
Nekotekina cb2c0733e2 SPU LLVM: fix vrangeps usage in clamp_smax 2022-06-12 16:40:04 +02:00
Malcolm Jestadt ebeeafc94f SPU LLVM: Use vrangeps in clamp_smax
- This instruction can clamp a value between a range of values, something which previously needed 2 instructions.
- With the immediate byte set to 0x2 it will compute the minimum between the absolute value of the first input and the second input, and then copy the sign from the first input to the result.
2022-06-11 18:25:31 +03:00
Elad Ashkenazi 17e28ae85d SPU LLVM: Improve expression matching detection for moved registers 2022-06-11 16:13:58 +03:00
Malcolm Jestadt 64616f1408 SPU LLVM: Microfixes
- Avoid vpermb path in shufb when op.ra == op.rb
- Reverse indices with (c ^ 0xf) rather than (~c) in vpermb path, vpternlogd is a 3 input operation and requires needless mov instructions to avoid destroying inputs
2022-06-08 22:50:30 +03:00