Commit graph

957 commits

Author SHA1 Message Date
kd-11 73784b9e12 Fix GCC build 2022-10-03 12:57:16 +03:00
kd-11 533f960854 rsx: Handle some more corner cases 2022-10-03 12:57:16 +03:00
kd-11 765208a181 rsx: Avoid clobbering CELL memory when splitting fbos 2022-10-03 12:57:16 +03:00
kd-11 4417701ea7 rsx: Track orphaned surfaces' parent addresses 2022-10-03 12:57:16 +03:00
kd-11 f66eaf8f44 rsx: Add some handy util functions to simple_array 2022-10-03 12:57:16 +03:00
kd-11 5281a85b67 rsx: Fix compiler warnings 2022-09-28 12:55:31 +03:00
Nekotekina 6ff6a4989a Implement at32() util
Works like .at() but uses source location for "exception".
2022-09-26 18:04:15 +03:00
kd-11 dd8a337b14 rsx: Fix some more warnings 2022-09-22 23:46:48 +03:00
kd-11 61666bae69 rsx: Fix hardware deswizzle not getting used when hardware deswizzle flag is not set 2022-09-22 23:46:48 +03:00
Nekotekina c4db65cc08 Fix one more warning 2022-09-18 18:35:17 +03:00
Nekotekina b49a1f27eb Warning fixes 2022-09-17 16:35:02 +03:00
Nekotekina 5985f0eefa BufferUtils: cleanup regarding ARM64 2022-09-07 17:59:07 +03:00
Nekotekina 82258915da BufferUtils: rewrite remaining intrinsic code with simd_builder 2022-09-07 17:59:07 +03:00
Nekotekina 11a1f090d3 BufferUtils: simd_builder refactoring
Some simplifications implemented.
2022-09-07 17:59:07 +03:00
Elad Ashkenazi 290226539f
Fix ARM build (#12606) 2022-09-04 21:11:04 +03:00
Nekotekina 58e3232710 BufferUtils: Fix regression in upload_untouched 2022-09-01 17:39:04 +03:00
Nekotekina e28707055b Implement simd_builder for x86
ASMJIT-based tool for building vectorized loops (such as ones in BufferUtils.cpp)
2022-08-28 18:38:52 +03:00
kd-11 1fc0191311 Fix build 2022-08-23 23:49:46 +03:00
kd-11 1f9e04f72d rsx/vk: Implement flushing surface cache blocks to linear mem 2022-08-23 23:49:46 +03:00
kd-11 bca833dad7 Fix surface reuse 2022-08-20 01:23:15 +03:00
kd-11 2e504b2dac rsx: Silence some warnings 2022-08-19 14:29:20 +03:00
kd-11 bacf518189 rsx: Fix 2D intersection tests 2022-08-14 23:53:50 +03:00
kd-11 c51d3b5465 Workaround for msvc weirdness 2022-08-09 18:32:54 +03:00
kd-11 e179adc4a0 rsx: Refactor surface cache storage 2022-08-09 18:32:54 +03:00
kd-11 61a055a1c6 Tuning 2022-08-07 22:14:49 +03:00
kd-11 64b4cfa59f rsx: Erase surface background when reloading after a pitch mismatch 2022-08-07 22:14:49 +03:00
kd-11 c799ffd223 rsx: Stubs for pitch conversion 2022-08-07 22:14:49 +03:00
Nekotekina 4b787b22c8 Implement FN (lambda shortener)
Useful for some higher order functions.
Allows to make short lambdas even shorter.
2022-07-08 14:47:41 +03:00
kd-11 5cafaef0a9 Aarch64 fixes for RSX 2022-07-04 22:35:05 +03:00
Jeff Guo cefc37a553
PPU LLVM arm64+macOS port (#12115)
* BufferUtils: use naive function pointer on Apple arm64

Use naive function pointer on Apple arm64 because ASLR breaks asmjit.
See BufferUtils.cpp comment for explanation on why this happens and how
to fix if you want to use asmjit.

* build-macos: fix source maps for Mac

Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo
modes.

* LLVM PPU: fix aarch64 on macOS

Force MachO on macOS to fix LLVM being unable to patch relocations
during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by
PPUTranslator/Recompiler.

* virtual memory: use 16k pages on aarch64 macOS

Temporary hack to get things working by using 16k pages instead of 4k
pages in VM emulation.

* PPU/SPU: fix NEON intrinsics and compilation for arm64 macOS

Fixes some intrinsics usage and patches usages of asmjit to properly
emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also
patches the SPU recompiler to properly work on arm64 by telling LLVM to
target arm64.

* virtual memory: fix W^X toggles on macOS aarch64

Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default
to RW mode. For both SPU and PPU execution threads, when initialization
finishes we toggle to RX mode. This exploits Apple's per-thread setting
for RW/RX to let us be technically compliant with the OS's W^X
    enforcement while not needing to actually separate the memory
    allocated for code/data.

* PPU: implement aarch64 specific functions

Implements ppu_gateway for arm64 and patches LLVM initialization to use
the correct triple. Adds some fixes for macOS W^X JIT restrictions when
entering/exiting JITed code.

* PPU: Mark rpcs3 calls as non-tail

Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you
call a function inside e.g. an L2 syscall, it will clobber LR on arm64
and subtly break returns in emulated code. Only JIT -> JIT "calls"
should be tail.

* macOS/arm64: compatibility fixes

* vm: patch virtual memory for arm64 macOS

Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to
existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix
memory unmapping on 16K page machines with a hack to mark "unmapped"
pages as RW.

* PPU: remove wrong comment

* PPU: fix a merge regression

* vm: remove 16k page hacks

* PPU: formatting fixes

* PPU: fix arm64 null function assembly

* ppu: clean up arch-specific instructions
2022-06-14 15:28:38 +03:00
Malcolm Jestadt 0d022d420b RSX: Add more wide paths for upload_untouched
- Adds AVX512 path for upload_untouched u16 with primitive restart, and
  AVX2 and AVX512 paths for upload_untouched without restart
- The AVX512 paths handle the remainder in simd code with masking, which
  provided a large speedup
- On my i5-1135G7 in demons souls benchmarking a scene in boletaria with
  a lot of geometry on screen via perf:
SSE4_1                      0.64%
AVX2                        0.59%
AVX512                      0.56%
AVX512 w/ remainder masking 0.51%
2022-06-12 06:23:55 +03:00
kd-11 60a2a39e88 gl: Deswizzle textures on the GPU 2022-05-31 23:34:14 +03:00
kd-11 7ec481d99b rsx: Allocate scratch memory using simple array with no default initialize
- This cuts down processing time significantly by eliminating calls to memset_stosb
2022-05-31 23:34:14 +03:00
kd-11 ec2d529832 rsx: Separate loop interrupts from graphics state
- The interrupts are for multithreaded signals andmake the main loop run more aggressively for the next cycle
2022-05-20 16:29:27 +03:00
kd-11 0244c4046e rsx: Lower performance hit due to frequency fetch 2022-05-20 16:29:27 +03:00
Nekotekina a2bfd5fcfc Minor AArch64 support changes 2022-05-04 16:12:32 +03:00
RipleyTom 8316469cfc Update libusb to v1.0.26 2022-04-29 02:04:52 +02:00
kd-11 e53bbd668b rsx: Fix surface cache scanning and removal 2022-04-05 14:07:05 +03:00
kd-11 4a86638ce8 rsx: Avoid unnecessary memprotect syscalls 2022-03-29 12:35:32 +03:00
kd-11 94a7e52c1f rsx: Disable ref count on exit 2022-03-28 19:55:34 +03:00
kd-11 2b42895bc7 rsx: Reduce log spam a bit 2022-03-28 19:55:34 +03:00
kd-11 d98d152d23 rsx: Fix leaking surface cache refs from texture cache
- Lock surfaces in use by texture cache to prevent complete deletion
- Remove discarded surfaces from the reprotect cache to avoid uaf
2022-03-28 19:55:34 +03:00
kd-11 9a2d4fe46b rsx: Relocatable transform constants 2022-03-26 16:10:18 +03:00
RipleyTom a4d715e25d Warning Fixes 2022-03-23 19:35:10 +01:00
kd-11 1ab5b481ff Fix ambiguous comparison operator warning 2022-03-23 11:26:06 +03:00
kd-11 26ee1246ae rsx: Block size back down to 4MB
- 4M is a good compromise, a 720p surface occupies just under 4MB
2022-03-23 11:26:06 +03:00
kd-11 d0402332f7 rsx: Bump surface cache block size to 16M 2022-03-23 11:26:06 +03:00
kd-11 43c7417906 rsx: Rework ranged map
- Adds metadata lookup for intersecting range calculations
- Make fetch/put methods more explicit
2022-03-23 11:26:06 +03:00
kd-11 56540a55ec Fix linux 2022-03-23 11:26:06 +03:00
kd-11 35ec4de776 rsx: Optimize surface store for faster scanning 2022-03-23 11:26:06 +03:00