Commit graph

930 commits

Author SHA1 Message Date
Nekotekina 4b787b22c8 Implement FN (lambda shortener)
Useful for some higher order functions.
Allows to make short lambdas even shorter.
2022-07-08 14:47:41 +03:00
kd-11 5cafaef0a9 Aarch64 fixes for RSX 2022-07-04 22:35:05 +03:00
Jeff Guo cefc37a553
PPU LLVM arm64+macOS port (#12115)
* BufferUtils: use naive function pointer on Apple arm64

Use naive function pointer on Apple arm64 because ASLR breaks asmjit.
See BufferUtils.cpp comment for explanation on why this happens and how
to fix if you want to use asmjit.

* build-macos: fix source maps for Mac

Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo
modes.

* LLVM PPU: fix aarch64 on macOS

Force MachO on macOS to fix LLVM being unable to patch relocations
during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by
PPUTranslator/Recompiler.

* virtual memory: use 16k pages on aarch64 macOS

Temporary hack to get things working by using 16k pages instead of 4k
pages in VM emulation.

* PPU/SPU: fix NEON intrinsics and compilation for arm64 macOS

Fixes some intrinsics usage and patches usages of asmjit to properly
emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also
patches the SPU recompiler to properly work on arm64 by telling LLVM to
target arm64.

* virtual memory: fix W^X toggles on macOS aarch64

Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default
to RW mode. For both SPU and PPU execution threads, when initialization
finishes we toggle to RX mode. This exploits Apple's per-thread setting
for RW/RX to let us be technically compliant with the OS's W^X
    enforcement while not needing to actually separate the memory
    allocated for code/data.

* PPU: implement aarch64 specific functions

Implements ppu_gateway for arm64 and patches LLVM initialization to use
the correct triple. Adds some fixes for macOS W^X JIT restrictions when
entering/exiting JITed code.

* PPU: Mark rpcs3 calls as non-tail

Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you
call a function inside e.g. an L2 syscall, it will clobber LR on arm64
and subtly break returns in emulated code. Only JIT -> JIT "calls"
should be tail.

* macOS/arm64: compatibility fixes

* vm: patch virtual memory for arm64 macOS

Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to
existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix
memory unmapping on 16K page machines with a hack to mark "unmapped"
pages as RW.

* PPU: remove wrong comment

* PPU: fix a merge regression

* vm: remove 16k page hacks

* PPU: formatting fixes

* PPU: fix arm64 null function assembly

* ppu: clean up arch-specific instructions
2022-06-14 15:28:38 +03:00
Malcolm Jestadt 0d022d420b RSX: Add more wide paths for upload_untouched
- Adds AVX512 path for upload_untouched u16 with primitive restart, and
  AVX2 and AVX512 paths for upload_untouched without restart
- The AVX512 paths handle the remainder in simd code with masking, which
  provided a large speedup
- On my i5-1135G7 in demons souls benchmarking a scene in boletaria with
  a lot of geometry on screen via perf:
SSE4_1                      0.64%
AVX2                        0.59%
AVX512                      0.56%
AVX512 w/ remainder masking 0.51%
2022-06-12 06:23:55 +03:00
kd-11 60a2a39e88 gl: Deswizzle textures on the GPU 2022-05-31 23:34:14 +03:00
kd-11 7ec481d99b rsx: Allocate scratch memory using simple array with no default initialize
- This cuts down processing time significantly by eliminating calls to memset_stosb
2022-05-31 23:34:14 +03:00
kd-11 ec2d529832 rsx: Separate loop interrupts from graphics state
- The interrupts are for multithreaded signals andmake the main loop run more aggressively for the next cycle
2022-05-20 16:29:27 +03:00
kd-11 0244c4046e rsx: Lower performance hit due to frequency fetch 2022-05-20 16:29:27 +03:00
Nekotekina a2bfd5fcfc Minor AArch64 support changes 2022-05-04 16:12:32 +03:00
RipleyTom 8316469cfc Update libusb to v1.0.26 2022-04-29 02:04:52 +02:00
kd-11 e53bbd668b rsx: Fix surface cache scanning and removal 2022-04-05 14:07:05 +03:00
kd-11 4a86638ce8 rsx: Avoid unnecessary memprotect syscalls 2022-03-29 12:35:32 +03:00
kd-11 94a7e52c1f rsx: Disable ref count on exit 2022-03-28 19:55:34 +03:00
kd-11 2b42895bc7 rsx: Reduce log spam a bit 2022-03-28 19:55:34 +03:00
kd-11 d98d152d23 rsx: Fix leaking surface cache refs from texture cache
- Lock surfaces in use by texture cache to prevent complete deletion
- Remove discarded surfaces from the reprotect cache to avoid uaf
2022-03-28 19:55:34 +03:00
kd-11 9a2d4fe46b rsx: Relocatable transform constants 2022-03-26 16:10:18 +03:00
RipleyTom a4d715e25d Warning Fixes 2022-03-23 19:35:10 +01:00
kd-11 1ab5b481ff Fix ambiguous comparison operator warning 2022-03-23 11:26:06 +03:00
kd-11 26ee1246ae rsx: Block size back down to 4MB
- 4M is a good compromise, a 720p surface occupies just under 4MB
2022-03-23 11:26:06 +03:00
kd-11 d0402332f7 rsx: Bump surface cache block size to 16M 2022-03-23 11:26:06 +03:00
kd-11 43c7417906 rsx: Rework ranged map
- Adds metadata lookup for intersecting range calculations
- Make fetch/put methods more explicit
2022-03-23 11:26:06 +03:00
kd-11 56540a55ec Fix linux 2022-03-23 11:26:06 +03:00
kd-11 35ec4de776 rsx: Optimize surface store for faster scanning 2022-03-23 11:26:06 +03:00
kd-11 bc7ed8eaab rsx/vk: Rework MSAA implementation 2022-03-17 22:02:20 +03:00
kd-11 78b8bd80e4 rsx: Unconditionally set MSAA flags if MSAA is active 2022-03-11 01:15:13 +03:00
kd-11 1943d9819f rsx: Clean up surface cache routines around RTT invalidate 2022-03-10 20:43:58 +03:00
kd-11 59a0cf94ab rsx: Fix msvc build 2022-03-08 22:06:26 +03:00
kd-11 3e4faf602a rsx: Fix clang build 2022-03-08 22:06:26 +03:00
kd-11 454a724f4e rsx: Reduce the performance impact of enabling the profiling timer
- Just use TSC if available
2022-03-08 22:06:26 +03:00
kd-11 cfecbb24ca rsx: Avoid calling slow functions every draw call
- Use TSC for timing where interval duration matters.
- Use atomic counter for ordering timestamps otherwise.
2022-03-08 22:06:26 +03:00
kd-11 762b594927 rsx: Fully process texture if surface cache configuration changed 2022-03-08 22:06:26 +03:00
kd-11 8d3d290e33 rsx: Fix build 2022-03-08 22:06:26 +03:00
kd-11 0df903090d rsx: Optimize metrics a bit
- For some reason this has a massive impact on performance above some arbitrary threshold of calls
  Shows up under surface_cache::get_merged_memory_region when doing gathers.
2022-03-08 22:06:26 +03:00
kd-11 6812fa4764 rsx: Fix surface write coherency when MSAA is active 2022-03-08 22:06:26 +03:00
kd-11 00a1864a95 Revert "rsx: Downgrade depth-1 3D images to 2D (#11593)"
This reverts commit 6c096b72b5.
2022-03-01 21:51:55 +03:00
kd-11 6c096b72b5
rsx: Downgrade depth-1 3D images to 2D (#11593)
- Fixes problems with implicit view types derived from dimensions.
2022-03-01 10:45:50 +03:00
kd-11 ec3e8de780 rsx: End the current frame before performing cache cleanup to release in-flight data 2022-02-10 22:20:56 +03:00
kd-11 2d9f21a2ea rsx: Lower performance warnings to 'warn' level instead of 'error' level to avoid causing panic for users 2022-02-07 09:25:01 +03:00
kd-11 247759b75b rsx: Fix memory tagging and add some security checks 2022-02-07 09:25:01 +03:00
kd-11 dca3d477c9 vk: Use image hot-cache for faster allocation times
- Creating new images is expensive.
- We can keep around a set of images that have been recently discarded and use them instead of creating new ones from scratch each time.
2022-02-06 15:49:50 +03:00
kd-11 86919ec0e1 rsx: Validate requested images before attempting to upload them
- Do not allow dimensions of 0 to reach the backend APIs
2022-01-30 14:58:51 +03:00
kd-11 3fa45ff994 Fix missing typeless info update 2022-01-26 12:08:36 +03:00
Nekotekina 0db9850a73 Add loop building utilities for ASMJIT
Refactor copy_data_swap_u32 a bit
2022-01-25 03:16:37 +03:00
Nekotekina 12c83b340d Remove built_function
With today's branch prediction techniques, it's hardly useful.
2022-01-24 22:21:41 +03:00
kd-11 2f7d38bb81 rsx: Improve coverage checking logic to handle 3D and cubemap resources 2022-01-23 00:03:03 +03:00
kd-11 4f8b5849b7 rsx: Take depth into account when calculating coverage 2022-01-23 00:03:03 +03:00
kd-11 7f216f2581 rsx: Fix local slice height calculation 2022-01-23 00:03:03 +03:00
nastys 801e7f3c2f macOS: Implement texture swizzling for 16-bit formats 2022-01-22 00:17:17 +01:00
Nekotekina 4704367382 Remove unnecessary asmjit::imm_ptr 2022-01-18 00:10:32 +03:00
Nekotekina 14cca55b50 PPU: refactor vector rounding instructions
Fix: nearbyint -> roundeven
2022-01-18 00:10:32 +03:00