Commit graph

128 commits

Author SHA1 Message Date
Nekotekina b49a1f27eb Warning fixes 2022-09-17 16:35:02 +03:00
Nekotekina 5985f0eefa BufferUtils: cleanup regarding ARM64 2022-09-07 17:59:07 +03:00
Nekotekina 82258915da BufferUtils: rewrite remaining intrinsic code with simd_builder 2022-09-07 17:59:07 +03:00
Nekotekina 11a1f090d3 BufferUtils: simd_builder refactoring
Some simplifications implemented.
2022-09-07 17:59:07 +03:00
Elad Ashkenazi 290226539f
Fix ARM build (#12606) 2022-09-04 21:11:04 +03:00
Nekotekina 58e3232710 BufferUtils: Fix regression in upload_untouched 2022-09-01 17:39:04 +03:00
Nekotekina e28707055b Implement simd_builder for x86
ASMJIT-based tool for building vectorized loops (such as ones in BufferUtils.cpp)
2022-08-28 18:38:52 +03:00
Jeff Guo cefc37a553
PPU LLVM arm64+macOS port (#12115)
* BufferUtils: use naive function pointer on Apple arm64

Use naive function pointer on Apple arm64 because ASLR breaks asmjit.
See BufferUtils.cpp comment for explanation on why this happens and how
to fix if you want to use asmjit.

* build-macos: fix source maps for Mac

Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo
modes.

* LLVM PPU: fix aarch64 on macOS

Force MachO on macOS to fix LLVM being unable to patch relocations
during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by
PPUTranslator/Recompiler.

* virtual memory: use 16k pages on aarch64 macOS

Temporary hack to get things working by using 16k pages instead of 4k
pages in VM emulation.

* PPU/SPU: fix NEON intrinsics and compilation for arm64 macOS

Fixes some intrinsics usage and patches usages of asmjit to properly
emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also
patches the SPU recompiler to properly work on arm64 by telling LLVM to
target arm64.

* virtual memory: fix W^X toggles on macOS aarch64

Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default
to RW mode. For both SPU and PPU execution threads, when initialization
finishes we toggle to RX mode. This exploits Apple's per-thread setting
for RW/RX to let us be technically compliant with the OS's W^X
    enforcement while not needing to actually separate the memory
    allocated for code/data.

* PPU: implement aarch64 specific functions

Implements ppu_gateway for arm64 and patches LLVM initialization to use
the correct triple. Adds some fixes for macOS W^X JIT restrictions when
entering/exiting JITed code.

* PPU: Mark rpcs3 calls as non-tail

Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you
call a function inside e.g. an L2 syscall, it will clobber LR on arm64
and subtly break returns in emulated code. Only JIT -> JIT "calls"
should be tail.

* macOS/arm64: compatibility fixes

* vm: patch virtual memory for arm64 macOS

Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to
existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix
memory unmapping on 16K page machines with a hack to mark "unmapped"
pages as RW.

* PPU: remove wrong comment

* PPU: fix a merge regression

* vm: remove 16k page hacks

* PPU: formatting fixes

* PPU: fix arm64 null function assembly

* ppu: clean up arch-specific instructions
2022-06-14 15:28:38 +03:00
Malcolm Jestadt 0d022d420b RSX: Add more wide paths for upload_untouched
- Adds AVX512 path for upload_untouched u16 with primitive restart, and
  AVX2 and AVX512 paths for upload_untouched without restart
- The AVX512 paths handle the remainder in simd code with masking, which
  provided a large speedup
- On my i5-1135G7 in demons souls benchmarking a scene in boletaria with
  a lot of geometry on screen via perf:
SSE4_1                      0.64%
AVX2                        0.59%
AVX512                      0.56%
AVX512 w/ remainder masking 0.51%
2022-06-12 06:23:55 +03:00
kd-11 9a2d4fe46b rsx: Relocatable transform constants 2022-03-26 16:10:18 +03:00
RipleyTom a4d715e25d Warning Fixes 2022-03-23 19:35:10 +01:00
Nekotekina 0db9850a73 Add loop building utilities for ASMJIT
Refactor copy_data_swap_u32 a bit
2022-01-25 03:16:37 +03:00
Nekotekina 12c83b340d Remove built_function
With today's branch prediction techniques, it's hardly useful.
2022-01-24 22:21:41 +03:00
Nekotekina 4704367382 Remove unnecessary asmjit::imm_ptr 2022-01-18 00:10:32 +03:00
Nekotekina 14cca55b50 PPU: refactor vector rounding instructions
Fix: nearbyint -> roundeven
2022-01-18 00:10:32 +03:00
Nekotekina 580bd2b25e Initial Linux Aarch64 support
* Update asmjit dependency (aarch64 branch)
* Disable USE_DISCORD_RPC by default
* Dump some JIT objects in rpcs3 cache dir
* Add SIGILL handler for all platforms
* Fix resetting zeroing denormals in thread pool
* Refactor most v128:: utils into global gv_** functions
* Refactor PPU interpreter (incomplete), remove "precise"
* - Instruction specializations with multiple accuracy flags
* - Adjust calling convention for speed
* - Removed precise/fast setting, replaced with static
* - Started refactoring interpreters for building at runtime JIT
*   (I got tired of poor compiler optimizations)
* - Expose some accuracy settings (SAT, NJ, VNAN, FPCC)
* - Add exec_bytes PPU thread variable (akin to cycle count)
* PPU LLVM: fix VCTUXS+VCTSXS instruction NaN results
* SPU interpreter: remove "precise" for now (extremely non-portable)
* - As with PPU, settings changed to static/dynamic for interpreters.
* - Precise options will be implemented later
* Fix termination after fatal error dialog
2022-01-15 06:48:04 +03:00
Nekotekina cb2748ae08 Update ASMJIT (new upstream API) 2021-12-29 02:45:00 +03:00
Nekotekina d836033212 LLVM: enable some JIT events (Intel, Perf)
Made some related adjustments.
Currently incomplete.
2021-12-26 16:41:37 +03:00
Nekotekina 8b4b6ba946 copy_data_swap_u32: build AVX-512 path 2021-12-26 14:40:21 +03:00
Nekotekina 599e00d6da BufferUtils: remove dead code (vertex streaming)
RIP. It won't be useful.
2021-12-26 14:40:21 +03:00
Nekotekina 3cd8891ab8 Re-refactor copy_data_swap_u32 again
Drop AVX2 path for now, since it usually operates on small data.
Rely on automatic SSE vectorization on recent compilers.
Side refactoring on JIT.h to workaround weird conflict issue.
2021-12-26 14:40:21 +03:00
Nekotekina 262ff01619 Use aligned stores in write_index_array_data_to_buffer
Ensure that target buffer is cache line aligned.
Improve stx::make_single to support alignment.
2021-12-21 23:28:09 +03:00
Nekotekina 76ccaf5e6f BufferUtils: refactoring
Optimize CPU capability tests for arch-tuned builds.
Separate streaming and non-streaming utilities.
Rewritten copy_data_swap_u32(_cmp) with AVX2 path.
2021-12-21 23:28:09 +03:00
Nekotekina a1608b636f span: implement as_span workarounds as utils::bless
Minor cleanup.
2021-05-31 15:46:34 +03:00
Ani a49446c9e9
Replace gsl::span for std::span (c++20) (#7531)
* Replace gsl::span for std::span (c++20)
* Replace gsl::byte with std::byte

Co-authored-by: Bevan Weiss <bevan.weiss@gmail.com>
2021-05-30 17:10:46 +03:00
Nekotekina c22e1e71f0 Continue fixing strict aliasing warnings 2021-03-13 18:02:37 +03:00
Nekotekina 9cbe77904d Revert changes in BufferUtils.cpp
Should fix #9933
2021-03-09 19:19:24 +03:00
Nekotekina a4fdbf0a88 Enable -Wstrict-aliasing=1 (GCC)
Fixed partially.
2021-03-09 03:10:15 +03:00
Nekotekina 8e6e57de86 Enable -Wunused-function warning 2021-02-15 14:39:53 +03:00
Nekotekina 0af452720e RSX: Fix possible bug in memory streaming utils 2021-01-12 15:06:31 +03:00
Nekotekina db8e6fe7a7 Enable -Wunused-variable 2021-01-12 14:34:14 +03:00
Nekotekina bd269bccaf types.hpp: remove intrinsic includes
Replace v128 with u128 in some places.
Removed some unused files.
2020-12-21 21:11:25 +03:00
Nekotekina db9b7db531 Cleanup and move sysinfo.h -> util/sysinfo.hpp 2020-12-18 12:55:54 +03:00
Nekotekina e321765c54 Split BEType.h to util/v128.hpp and util/to_endian.hpp 2020-12-13 16:34:45 +03:00
Nekotekina 6e05dcadb6 Reduce std::numeric_limits dependency
Please, stop pretending...
You need these templates for generic code.
In other words, in another templates.
Stop increasing compilation time for no reason.
2020-12-12 12:35:18 +03:00
Nekotekina 36c8654fb8 Remove HERE macro
Some cleanup.
Add location to some functions.
2020-12-10 12:30:22 +03:00
Nekotekina 5d934c8759 Improve narrow() and size32() with src_loc detection 2020-12-09 16:26:20 +03:00
Nekotekina e055d16b2c Replace verify() with ensure() with auto src location.
Expression ensure(x) returns x.
Using comma operator removed.
2020-12-09 15:43:38 +03:00
RipleyTom af8c661a64 Remove BOM markers 2020-12-06 15:30:12 +03:00
Eladash 36fd1d0f0d
rsx: Optimize transform constants load methods (#7992) 2020-04-09 15:53:43 +03:00
Eladash d97e9f7b4a rsx: Batch vertex program load methods 2020-04-02 20:42:12 +03:00
Nekotekina Aux1 250736ece5 Fix warnings in emucore 2020-03-04 21:23:34 +03:00
Nekotekina 5e75a0c497 Disable cotire on travis
Make some workarounds for clang because it poorly supports -Wold-style-cast
2020-02-21 17:03:54 +03:00
Nekotekina c0f80cfe7a Use attributes for LIKELY/UNLIKELY
Remove LIKELY/UNLIKELY macro.
2020-02-05 10:42:34 +03:00
Nekotekina 185c067d5b C-style cast cleanup V 2019-12-03 17:23:00 +03:00
kd-11 8ca53f9c84 rsx: Remember to min-max the anchor indices of a polygon or triangle fan 2019-11-24 19:01:57 +03:00
Emmanuel Gil Peyrot f76720ceb0 Remove extraneous ::narrow<int>() calls
GSL’s gsl::span didn’t use the correct type for its index_type, which is
why they were needed.
2019-11-09 19:30:06 +01:00
Emmanuel Gil Peyrot ef368c5171 rsx: Replace gsl::byte with C++17’s std::byte 2019-11-09 19:30:05 +01:00
Nekotekina e3e7051ed3 Minor optimization in BufferUtils.cpp
Don't use PSHUFB for horizontal operations.
Utilize PHMINPOSUW to compute max as well:
 + sse41_hmin_epu16
 + sse41_hmax_epu16
2019-10-30 18:52:34 +03:00
Nekotekina b1968769b7 Minor cleanup in BufferUtils.cpp
Replace inline asm with intrinsic using target attribute trick.
2019-10-30 17:53:51 +03:00