Malcolm Jestadt
ebeeafc94f
SPU LLVM: Use vrangeps in clamp_smax
...
- This instruction can clamp a value between a range of values, something which previously needed 2 instructions.
- With the immediate byte set to 0x2 it will compute the minimum between the absolute value of the first input and the second input, and then copy the sign from the first input to the result.
2022-06-11 18:25:31 +03:00
Elad Ashkenazi
17e28ae85d
SPU LLVM: Improve expression matching detection for moved registers
2022-06-11 16:13:58 +03:00
Malcolm Jestadt
64616f1408
SPU LLVM: Microfixes
...
- Avoid vpermb path in shufb when op.ra == op.rb
- Reverse indices with (c ^ 0xf) rather than (~c) in vpermb path, vpternlogd is a 3 input operation and requires needless mov instructions to avoid destroying inputs
2022-06-08 22:50:30 +03:00
Malcolm Jestadt
1227b0a633
SPU LLVM: Reneable icelake shufb paths
...
- The previous code works just fine
2022-06-05 13:08:00 +03:00
Elad Ashkenazi
9bb7e8d614
rsx: Implement atomic FIFO fetching (stability improvement) (non-default setting) ( #12107 )
2022-06-04 15:35:06 +03:00
Malcolm Jestadt
0e5514003a
SPU LLVM: Optimize LQR/STQR
...
- Avoid type mismatch between adds that prevented llvm from combining the operations
2022-06-03 16:16:28 +03:00
Malcolm Jestadt
e9dfb3cb63
SPU LLVM: Fixup for inline MFC transfers
...
- Could previsouly segfault when src and dst were swapped. Just use unaligned instructions instead.
2022-05-29 19:08:36 +03:00
Malcolm Jestadt
6f4398889e
SPU LLVM: Optimize inline MFC transfers
...
- Use wider instructions when possible
2022-05-29 15:32:25 +03:00
Eladash
2ba437b6dc
SPU: Implement timer freezing ability
2022-05-14 22:03:47 +03:00
Malcolm Jestadt
91673f8fdc
SPU LLVM: Add relaxed xfloat option
...
- This new setting is on by default
- It's active when approximate default is disabled
- Approximate xfloat is now exposed to the gui
2022-01-31 08:02:48 +03:00
Nekotekina
dba2baba9c
Implement utils::memory_map_fd (partial)
...
Improve JIT profiling dump format (data + name, mmap)
Improve objdump interception util (better speed, fix bugs)
Rename spu_ubertrampoline to __ub+number
2022-01-26 15:46:16 +03:00
Nekotekina
11ee1f3eb2
Improve JIT profiling on Linux
...
Add JIT object dumping functionality.
Add source for objdump interception utility.
2022-01-25 03:16:37 +03:00
Nekotekina
12c83b340d
Remove built_function
...
With today's branch prediction techniques, it's hardly useful.
2022-01-24 22:21:41 +03:00
Nekotekina
4704367382
Remove unnecessary asmjit::imm_ptr
2022-01-18 00:10:32 +03:00
Nekotekina
580bd2b25e
Initial Linux Aarch64 support
...
* Update asmjit dependency (aarch64 branch)
* Disable USE_DISCORD_RPC by default
* Dump some JIT objects in rpcs3 cache dir
* Add SIGILL handler for all platforms
* Fix resetting zeroing denormals in thread pool
* Refactor most v128:: utils into global gv_** functions
* Refactor PPU interpreter (incomplete), remove "precise"
* - Instruction specializations with multiple accuracy flags
* - Adjust calling convention for speed
* - Removed precise/fast setting, replaced with static
* - Started refactoring interpreters for building at runtime JIT
* (I got tired of poor compiler optimizations)
* - Expose some accuracy settings (SAT, NJ, VNAN, FPCC)
* - Add exec_bytes PPU thread variable (akin to cycle count)
* PPU LLVM: fix VCTUXS+VCTSXS instruction NaN results
* SPU interpreter: remove "precise" for now (extremely non-portable)
* - As with PPU, settings changed to static/dynamic for interpreters.
* - Precise options will be implemented later
* Fix termination after fatal error dialog
2022-01-15 06:48:04 +03:00
Nekotekina
cb2748ae08
Update ASMJIT (new upstream API)
2021-12-29 02:45:00 +03:00
Nekotekina
d836033212
LLVM: enable some JIT events (Intel, Perf)
...
Made some related adjustments.
Currently incomplete.
2021-12-26 16:41:37 +03:00
Nekotekina
dcd011048d
Implement "built_function" utility (runtime-generated assembly)
...
Similar to build_function_asm, but links without indirection.
Achieved by emitting code directly into a byte array.
2021-12-22 19:27:20 +03:00
Malcolm Jestadt
2f93df480b
SPU LLVM: Disable affineqb shufb paths temporarilly
2021-12-10 19:32:10 +03:00
Malcolm Jestadt
0617e9e14b
SPU LLVM: Fix vgf2p8affineqb usage
...
- Reverse the order of the bytes in the selection masks. Previously it was assumed that byte 0 would determine the output of bit 0, but byte 7 determines the output of bit 0.
2021-12-06 12:34:11 +03:00
Malcolm Jestadt
3fde455932
SPU LLVM: Optimize branch following ORX
...
- test the input of ORX directly for zeroes, instead of the result
2021-11-11 12:58:38 +03:00
Malcolm Jestadt
7573d7289b
SPU LLVM: Hook up 128 bit spu verification
...
- Also fix FMA enablement for sapphirerapids
2021-11-06 21:12:12 +03:00
Nekotekina
69f321a471
LLVM 13
2021-11-02 20:11:08 +03:00
Malcolm Jestadt
f06c8b22e8
PPU/SPU LLVM: Emulate VPERM2B with a 256 bit wide VPERMB
...
- Save 1 uop by using 256 wide VPERMB instead of VPERM2B. (Compiles down to a vinserti128 and vpermb)
2021-10-13 17:51:54 +03:00
Eladash
ab50e5483e
GUI Utilities: Implement instruction search, PPU/SPU disasm improvements ( #10968 )
...
* GUI Utilities: Implement instruction search in PS3 memory
* String Searcher: Case insensitive search
* PPU DisAsm: Comment constants with ORI
* PPU DisAsm: Add 64-bit constant support
* SPU/PPU DisAsm: Print CELL errors in disasm
* PPU DisAsm: Constant comparison support
2021-10-12 23:12:30 +03:00
Malcolm Jestadt
86716dc37b
SPU LLVM: Optimize branches following byteswaps
...
- The first element can be extracted via vmovd rather than vpextrd, which saves 1 uop.
2021-09-30 13:22:35 +03:00
Malcolm Jestadt
f9ab077908
SPU LLVM: Use VDBPSADBW in SUMB
...
- This instruction can be used to sum bytes horrizontally if the second input vector is all zeroes.
2021-09-30 13:22:35 +03:00
Nekotekina
9e62ca562b
SPU LLVM: implement SQRT and DIV pattern detection (xf)
2021-09-17 10:23:43 +03:00
Nekotekina
d28b0ba2fa
SPU LLVM: implement spu_re, spu_rsqrte
...
Improve matching with peek_through_bitcasts() helper.
Implement erase_stores() helper.
2021-09-17 10:23:43 +03:00
Nekotekina
aba332d4c4
SPU LLVM: make intrinsics for most xfloat instructions
2021-09-17 10:23:43 +03:00
Nekotekina
543fb7a9cb
LLVM DSL / SPU LLVM: implement infinite precision shifts
...
Remove old make_*** helpers in favor of matcheable expressions.
2021-09-17 10:23:43 +03:00
Nekotekina
67b3fc70f8
LLVM DSL: implement absd and match helpers
...
Matcheable expression absd(a, b) (absolute difference).
2021-09-17 10:23:43 +03:00
Nekotekina
4b8ee85995
LLVM DSL: reimplement pshufb, add 'calli'
...
Implement postponed custom intrinsic replacement.
Make bitcast operator static like other ones.
2021-09-17 10:23:43 +03:00
Nekotekina
86ead1b93b
SPU LLVM: implement FI instruction
...
Use approximate reciprocal in FRSQEST.
2021-09-17 10:23:43 +03:00
Nekotekina
1685769bd9
LLVM DSL: reimplement fmuladd, force hw fma if present
2021-09-17 10:23:43 +03:00
Nekotekina
2acb6ed60d
SPU LLVM: optimize SHUFB for permutation-only shuffles
...
Drop constant generation when unused.
2021-09-17 10:23:43 +03:00
Nekotekina
144244e902
SPU LLVM: implement missing constant mask handling in SHUFB
2021-09-17 10:23:43 +03:00
Nekotekina
7cf9d1380b
LLVM DSL: add line number in get_const_vector automatically
2021-09-17 10:23:43 +03:00
Nekotekina
f188019244
LLVM DSL: reimpelement fsqrt, fabs
2021-09-17 10:23:43 +03:00
Eladash
bd66dfedc9
Do not allow to unpause after fatal error occured in emulation
...
* Plus fix #10590
2021-09-09 19:30:54 +02:00
Malcolm Jestadt
43cc62d267
SPU LLVM: Add m_use_vnni
...
- Alderlake and Sapphirerapids will require an update to the llvm fork before they can be detected
2021-08-31 14:02:05 +03:00
Malcolm Jestadt
d304b52391
SPU LLVM: Add VNNI optimized variant of sumb
...
- Uses vpdpbusd to horrizontally add values, for some reason this is much faster than the normal horizontal add instructions.
2021-08-31 14:02:05 +03:00
Malcolm Jestadt
a86b278115
SPU LLVM: Expand byteswap elimination to more instructions
2021-08-31 14:02:05 +03:00
Whatcookie
c62deeefd4
SPU LLVM: Add approximate FCEQ/FCMEQ ( #8729 )
...
- It's 100% accurate, but will sit under approx xfloat anyways
- Attempts to use a single instruction when 1 value is constant
2021-08-22 10:13:34 +03:00
Nekotekina
05d1b3605e
Fixup for SPU Debug mode (bad SHA1)
...
Should fix crashes due to read out of bounds.
2021-08-01 10:12:08 +03:00
Nekotekina
fc5840cda6
SPU Cache: allow to dump cache upon startup
...
Print also some stats (if SPU Debug is enabled).
2021-07-30 09:21:11 +03:00
Eladash
d81a5b1423
SPU LLVM: Add missing WRCH PC updates
2021-05-29 15:26:52 +03:00
Malcolm Jestadt
7c2b08b9b6
SPU LLVM: Expand branch optimizations for more instructions
2021-05-29 13:07:35 +03:00
Nekotekina
160b131de3
types.hpp: implement smin, smax, amin, amax
...
Rewritten the following global utility constants:
`umax` returns max number, restricted to unsigned.
`smax` returns max signed number, restricted to integrals.
`smin` returns min signed number, restricted to signed.
`amin` returns smin or zero, less restricted.
`amax` returns smax or umax, less restricted.
Fix operators == and <=> for synthesized rel-ops.
2021-05-22 12:10:57 +03:00
Malcolm Jestadt
52780e65e7
SPU LLVM: Optimize branching
...
- Detect a pattern where vpmovmskb and a check against the sign bit can be used instead of checking against zero
2021-05-17 16:59:20 +03:00