Commit graph

1003 commits

Author SHA1 Message Date
kd-11 f66eaf8f44 rsx: Add some handy util functions to simple_array 2022-10-03 12:57:16 +03:00
kd-11 5281a85b67 rsx: Fix compiler warnings 2022-09-28 12:55:31 +03:00
Nekotekina 6ff6a4989a Implement at32() util
Works like .at() but uses source location for "exception".
2022-09-26 18:04:15 +03:00
kd-11 dd8a337b14 rsx: Fix some more warnings 2022-09-22 23:46:48 +03:00
kd-11 61666bae69 rsx: Fix hardware deswizzle not getting used when hardware deswizzle flag is not set 2022-09-22 23:46:48 +03:00
Nekotekina c4db65cc08 Fix one more warning 2022-09-18 18:35:17 +03:00
Nekotekina b49a1f27eb Warning fixes 2022-09-17 16:35:02 +03:00
Nekotekina 5985f0eefa BufferUtils: cleanup regarding ARM64 2022-09-07 17:59:07 +03:00
Nekotekina 82258915da BufferUtils: rewrite remaining intrinsic code with simd_builder 2022-09-07 17:59:07 +03:00
Nekotekina 11a1f090d3 BufferUtils: simd_builder refactoring
Some simplifications implemented.
2022-09-07 17:59:07 +03:00
Elad Ashkenazi 290226539f
Fix ARM build (#12606) 2022-09-04 21:11:04 +03:00
Nekotekina 58e3232710 BufferUtils: Fix regression in upload_untouched 2022-09-01 17:39:04 +03:00
Nekotekina e28707055b Implement simd_builder for x86
ASMJIT-based tool for building vectorized loops (such as ones in BufferUtils.cpp)
2022-08-28 18:38:52 +03:00
kd-11 1fc0191311 Fix build 2022-08-23 23:49:46 +03:00
kd-11 1f9e04f72d rsx/vk: Implement flushing surface cache blocks to linear mem 2022-08-23 23:49:46 +03:00
kd-11 bca833dad7 Fix surface reuse 2022-08-20 01:23:15 +03:00
kd-11 2e504b2dac rsx: Silence some warnings 2022-08-19 14:29:20 +03:00
kd-11 bacf518189 rsx: Fix 2D intersection tests 2022-08-14 23:53:50 +03:00
kd-11 c51d3b5465 Workaround for msvc weirdness 2022-08-09 18:32:54 +03:00
kd-11 e179adc4a0 rsx: Refactor surface cache storage 2022-08-09 18:32:54 +03:00
kd-11 61a055a1c6 Tuning 2022-08-07 22:14:49 +03:00
kd-11 64b4cfa59f rsx: Erase surface background when reloading after a pitch mismatch 2022-08-07 22:14:49 +03:00
kd-11 c799ffd223 rsx: Stubs for pitch conversion 2022-08-07 22:14:49 +03:00
Nekotekina 4b787b22c8 Implement FN (lambda shortener)
Useful for some higher order functions.
Allows to make short lambdas even shorter.
2022-07-08 14:47:41 +03:00
kd-11 5cafaef0a9 Aarch64 fixes for RSX 2022-07-04 22:35:05 +03:00
Jeff Guo cefc37a553
PPU LLVM arm64+macOS port (#12115)
* BufferUtils: use naive function pointer on Apple arm64

Use naive function pointer on Apple arm64 because ASLR breaks asmjit.
See BufferUtils.cpp comment for explanation on why this happens and how
to fix if you want to use asmjit.

* build-macos: fix source maps for Mac

Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo
modes.

* LLVM PPU: fix aarch64 on macOS

Force MachO on macOS to fix LLVM being unable to patch relocations
during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by
PPUTranslator/Recompiler.

* virtual memory: use 16k pages on aarch64 macOS

Temporary hack to get things working by using 16k pages instead of 4k
pages in VM emulation.

* PPU/SPU: fix NEON intrinsics and compilation for arm64 macOS

Fixes some intrinsics usage and patches usages of asmjit to properly
emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also
patches the SPU recompiler to properly work on arm64 by telling LLVM to
target arm64.

* virtual memory: fix W^X toggles on macOS aarch64

Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default
to RW mode. For both SPU and PPU execution threads, when initialization
finishes we toggle to RX mode. This exploits Apple's per-thread setting
for RW/RX to let us be technically compliant with the OS's W^X
    enforcement while not needing to actually separate the memory
    allocated for code/data.

* PPU: implement aarch64 specific functions

Implements ppu_gateway for arm64 and patches LLVM initialization to use
the correct triple. Adds some fixes for macOS W^X JIT restrictions when
entering/exiting JITed code.

* PPU: Mark rpcs3 calls as non-tail

Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you
call a function inside e.g. an L2 syscall, it will clobber LR on arm64
and subtly break returns in emulated code. Only JIT -> JIT "calls"
should be tail.

* macOS/arm64: compatibility fixes

* vm: patch virtual memory for arm64 macOS

Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to
existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix
memory unmapping on 16K page machines with a hack to mark "unmapped"
pages as RW.

* PPU: remove wrong comment

* PPU: fix a merge regression

* vm: remove 16k page hacks

* PPU: formatting fixes

* PPU: fix arm64 null function assembly

* ppu: clean up arch-specific instructions
2022-06-14 15:28:38 +03:00
Malcolm Jestadt 0d022d420b RSX: Add more wide paths for upload_untouched
- Adds AVX512 path for upload_untouched u16 with primitive restart, and
  AVX2 and AVX512 paths for upload_untouched without restart
- The AVX512 paths handle the remainder in simd code with masking, which
  provided a large speedup
- On my i5-1135G7 in demons souls benchmarking a scene in boletaria with
  a lot of geometry on screen via perf:
SSE4_1                      0.64%
AVX2                        0.59%
AVX512                      0.56%
AVX512 w/ remainder masking 0.51%
2022-06-12 06:23:55 +03:00
kd-11 60a2a39e88 gl: Deswizzle textures on the GPU 2022-05-31 23:34:14 +03:00
kd-11 7ec481d99b rsx: Allocate scratch memory using simple array with no default initialize
- This cuts down processing time significantly by eliminating calls to memset_stosb
2022-05-31 23:34:14 +03:00
kd-11 ec2d529832 rsx: Separate loop interrupts from graphics state
- The interrupts are for multithreaded signals andmake the main loop run more aggressively for the next cycle
2022-05-20 16:29:27 +03:00
kd-11 0244c4046e rsx: Lower performance hit due to frequency fetch 2022-05-20 16:29:27 +03:00
Nekotekina a2bfd5fcfc Minor AArch64 support changes 2022-05-04 16:12:32 +03:00
RipleyTom 8316469cfc Update libusb to v1.0.26 2022-04-29 02:04:52 +02:00
kd-11 e53bbd668b rsx: Fix surface cache scanning and removal 2022-04-05 14:07:05 +03:00
kd-11 4a86638ce8 rsx: Avoid unnecessary memprotect syscalls 2022-03-29 12:35:32 +03:00
kd-11 94a7e52c1f rsx: Disable ref count on exit 2022-03-28 19:55:34 +03:00
kd-11 2b42895bc7 rsx: Reduce log spam a bit 2022-03-28 19:55:34 +03:00
kd-11 d98d152d23 rsx: Fix leaking surface cache refs from texture cache
- Lock surfaces in use by texture cache to prevent complete deletion
- Remove discarded surfaces from the reprotect cache to avoid uaf
2022-03-28 19:55:34 +03:00
kd-11 9a2d4fe46b rsx: Relocatable transform constants 2022-03-26 16:10:18 +03:00
RipleyTom a4d715e25d Warning Fixes 2022-03-23 19:35:10 +01:00
kd-11 1ab5b481ff Fix ambiguous comparison operator warning 2022-03-23 11:26:06 +03:00
kd-11 26ee1246ae rsx: Block size back down to 4MB
- 4M is a good compromise, a 720p surface occupies just under 4MB
2022-03-23 11:26:06 +03:00
kd-11 d0402332f7 rsx: Bump surface cache block size to 16M 2022-03-23 11:26:06 +03:00
kd-11 43c7417906 rsx: Rework ranged map
- Adds metadata lookup for intersecting range calculations
- Make fetch/put methods more explicit
2022-03-23 11:26:06 +03:00
kd-11 56540a55ec Fix linux 2022-03-23 11:26:06 +03:00
kd-11 35ec4de776 rsx: Optimize surface store for faster scanning 2022-03-23 11:26:06 +03:00
kd-11 bc7ed8eaab rsx/vk: Rework MSAA implementation 2022-03-17 22:02:20 +03:00
kd-11 78b8bd80e4 rsx: Unconditionally set MSAA flags if MSAA is active 2022-03-11 01:15:13 +03:00
kd-11 1943d9819f rsx: Clean up surface cache routines around RTT invalidate 2022-03-10 20:43:58 +03:00
kd-11 59a0cf94ab rsx: Fix msvc build 2022-03-08 22:06:26 +03:00
kd-11 3e4faf602a rsx: Fix clang build 2022-03-08 22:06:26 +03:00
kd-11 454a724f4e rsx: Reduce the performance impact of enabling the profiling timer
- Just use TSC if available
2022-03-08 22:06:26 +03:00
kd-11 cfecbb24ca rsx: Avoid calling slow functions every draw call
- Use TSC for timing where interval duration matters.
- Use atomic counter for ordering timestamps otherwise.
2022-03-08 22:06:26 +03:00
kd-11 762b594927 rsx: Fully process texture if surface cache configuration changed 2022-03-08 22:06:26 +03:00
kd-11 8d3d290e33 rsx: Fix build 2022-03-08 22:06:26 +03:00
kd-11 0df903090d rsx: Optimize metrics a bit
- For some reason this has a massive impact on performance above some arbitrary threshold of calls
  Shows up under surface_cache::get_merged_memory_region when doing gathers.
2022-03-08 22:06:26 +03:00
kd-11 6812fa4764 rsx: Fix surface write coherency when MSAA is active 2022-03-08 22:06:26 +03:00
kd-11 00a1864a95 Revert "rsx: Downgrade depth-1 3D images to 2D (#11593)"
This reverts commit 6c096b72b5.
2022-03-01 21:51:55 +03:00
kd-11 6c096b72b5
rsx: Downgrade depth-1 3D images to 2D (#11593)
- Fixes problems with implicit view types derived from dimensions.
2022-03-01 10:45:50 +03:00
kd-11 ec3e8de780 rsx: End the current frame before performing cache cleanup to release in-flight data 2022-02-10 22:20:56 +03:00
kd-11 2d9f21a2ea rsx: Lower performance warnings to 'warn' level instead of 'error' level to avoid causing panic for users 2022-02-07 09:25:01 +03:00
kd-11 247759b75b rsx: Fix memory tagging and add some security checks 2022-02-07 09:25:01 +03:00
kd-11 dca3d477c9 vk: Use image hot-cache for faster allocation times
- Creating new images is expensive.
- We can keep around a set of images that have been recently discarded and use them instead of creating new ones from scratch each time.
2022-02-06 15:49:50 +03:00
kd-11 86919ec0e1 rsx: Validate requested images before attempting to upload them
- Do not allow dimensions of 0 to reach the backend APIs
2022-01-30 14:58:51 +03:00
kd-11 3fa45ff994 Fix missing typeless info update 2022-01-26 12:08:36 +03:00
Nekotekina 0db9850a73 Add loop building utilities for ASMJIT
Refactor copy_data_swap_u32 a bit
2022-01-25 03:16:37 +03:00
Nekotekina 12c83b340d Remove built_function
With today's branch prediction techniques, it's hardly useful.
2022-01-24 22:21:41 +03:00
kd-11 2f7d38bb81 rsx: Improve coverage checking logic to handle 3D and cubemap resources 2022-01-23 00:03:03 +03:00
kd-11 4f8b5849b7 rsx: Take depth into account when calculating coverage 2022-01-23 00:03:03 +03:00
kd-11 7f216f2581 rsx: Fix local slice height calculation 2022-01-23 00:03:03 +03:00
nastys 801e7f3c2f macOS: Implement texture swizzling for 16-bit formats 2022-01-22 00:17:17 +01:00
Nekotekina 4704367382 Remove unnecessary asmjit::imm_ptr 2022-01-18 00:10:32 +03:00
Nekotekina 14cca55b50 PPU: refactor vector rounding instructions
Fix: nearbyint -> roundeven
2022-01-18 00:10:32 +03:00
Nekotekina 580bd2b25e Initial Linux Aarch64 support
* Update asmjit dependency (aarch64 branch)
* Disable USE_DISCORD_RPC by default
* Dump some JIT objects in rpcs3 cache dir
* Add SIGILL handler for all platforms
* Fix resetting zeroing denormals in thread pool
* Refactor most v128:: utils into global gv_** functions
* Refactor PPU interpreter (incomplete), remove "precise"
* - Instruction specializations with multiple accuracy flags
* - Adjust calling convention for speed
* - Removed precise/fast setting, replaced with static
* - Started refactoring interpreters for building at runtime JIT
*   (I got tired of poor compiler optimizations)
* - Expose some accuracy settings (SAT, NJ, VNAN, FPCC)
* - Add exec_bytes PPU thread variable (akin to cycle count)
* PPU LLVM: fix VCTUXS+VCTSXS instruction NaN results
* SPU interpreter: remove "precise" for now (extremely non-portable)
* - As with PPU, settings changed to static/dynamic for interpreters.
* - Precise options will be implemented later
* Fix termination after fatal error dialog
2022-01-15 06:48:04 +03:00
kd-11 6d737e61fd rsx: Use 32 bit integers for pitch
- RSX max pitch = 65536 which requires 17 bits
2022-01-10 12:27:30 +03:00
kd-11 83026fd263 rsx: use coverage ratio to determine when too much data is overlapping 2022-01-07 22:55:27 +03:00
kd-11 92824b6729 rsx: Rework invalidation tagging 2022-01-07 22:55:27 +03:00
kd-11 7563655221 rsx: Bump surface removal threshold values
- It is much slower to attempt surface removal than to render duplicates on the host GPU
2022-01-07 22:55:27 +03:00
kd-11 6889b48973 rsx: Add optimized version of section removal code 2022-01-07 22:55:27 +03:00
Nekotekina cb2748ae08 Update ASMJIT (new upstream API) 2021-12-29 02:45:00 +03:00
Nekotekina d836033212 LLVM: enable some JIT events (Intel, Perf)
Made some related adjustments.
Currently incomplete.
2021-12-26 16:41:37 +03:00
Nekotekina 8b4b6ba946 copy_data_swap_u32: build AVX-512 path 2021-12-26 14:40:21 +03:00
Nekotekina 599e00d6da BufferUtils: remove dead code (vertex streaming)
RIP. It won't be useful.
2021-12-26 14:40:21 +03:00
Nekotekina 3cd8891ab8 Re-refactor copy_data_swap_u32 again
Drop AVX2 path for now, since it usually operates on small data.
Rely on automatic SSE vectorization on recent compilers.
Side refactoring on JIT.h to workaround weird conflict issue.
2021-12-26 14:40:21 +03:00
nastys a0040e6fb1
macOS: Implement texture converter for Metal (2) (#11289)
* macOS: Implement texture converter for Metal (2)

* Fix texture conversion formatting
2021-12-24 15:46:37 +03:00
kd-11 39ef39aa4e rsx: Exercise caution when testing for overlaps in invalidated sections 2021-12-24 15:13:33 +03:00
Nekotekina 262ff01619 Use aligned stores in write_index_array_data_to_buffer
Ensure that target buffer is cache line aligned.
Improve stx::make_single to support alignment.
2021-12-21 23:28:09 +03:00
Nekotekina 76ccaf5e6f BufferUtils: refactoring
Optimize CPU capability tests for arch-tuned builds.
Separate streaming and non-streaming utilities.
Rewritten copy_data_swap_u32(_cmp) with AVX2 path.
2021-12-21 23:28:09 +03:00
Nekotekina d6420b8803 Put std::hash specialization out of std 2021-12-07 13:04:10 +03:00
kd-11 44fe6f6d39 rsx: Fix sloppy format matching test 2021-11-27 17:47:41 +03:00
kd-11 4df1a938b1 Unused var 2021-11-24 16:02:24 +03:00
kd-11 94a3b1cfe8 rsx: Roll back some optimizations
- Just use RGB565 for all blit targets. Avoids really dumb transforms done by GPU hw.
- When X16 is used, all the channels get written to R channel alone. CmdBlit does perform format conversion!
- gl: Force image copy when blit is requested with compatible targets. Avoids format conversion issues.
2021-11-24 16:02:24 +03:00
kd-11 a21c6c4628 rsx: Fix handling of scaling requests for packed formats
- One does not simply interpolate RGB565 components as U16 data!
2021-11-24 16:02:24 +03:00
kd-11 97bd8f7bc1 rsx: Update sampler format class when inheriting mipmap slices/sections 2021-11-24 16:02:24 +03:00
Megamouse 4d0330bf82 rsx: fix possible segfault 2021-11-16 09:31:16 +01:00
kd-11 ad00c44231 rsx: Configure pitch correctly for pitch-zero textures (1D) 2021-11-03 16:58:30 +03:00
kd-11 5b0ef401f7 rsx: Fix sampling in X when 0 pitch is given
- A pitch of 0 still allows 1-dimensional addressing.
2021-10-31 14:32:42 +03:00
kd-11 78bcb0fd53 rsx: Do not reuse/destroy sections that have references held
- Avoids a situation where blit-dst and blit-src have overlapping ranges. Uploading blit-dst destroys blit-src and vice-versa.
  This is not the end of the world, but blit-src should be kept around until the operation is completed to avoid stale references!
2021-10-27 12:30:43 +03:00
kd-11 479150b214 rsx: Fix decoding of linear cubemaps
- 128-byte boundary is not observed in linear tiling. Verified in hw.
2021-10-10 16:15:28 +03:00
kd-11 dc8fc9fc79 vk: Clean up around vkQueueSubmit handling
- Explicitly declare one version for CB flush and the other for Async flush
- Always flush descriptors on CB flush in case of page fault handling.
  Other threads other than offloader can also enter the method and require normal flow.
- Fix overlapping interrupt IDs.
- Minor formatting fixes
2021-09-28 23:18:26 +03:00