Commit graph

3762 commits

Author SHA1 Message Date
Nekotekina 69912ba3c7 Partial revert for cf0fcf5a2a 2022-06-30 14:38:14 +03:00
Eladash cf0fcf5a2a SPU: Implement execution wake-up delay 2022-06-28 19:54:25 +03:00
Eladash f5a55b3024 rsx: Fixup after #12052 for frame limiter off 2022-06-25 17:39:07 +03:00
Eladash 7422ab9e55 rsx: Do not discard flip notifications 2022-06-25 15:30:41 +02:00
Eladash f66256cc13 rsx: PS3 Native frame limiter improvements, add Infinite frame limiter
* Do not wait on DEVICE 0x30 semaphore, it seems like it is something to do with queue command synchronization.
 - This also fixes cellGcmSetFlipWithWaitLabel which is built specifically to enable accurate RSX flipping time, its waiting command is confirmed to be placed **AFTER** DEVICE 0x30 waiting.
* Fix default vsync state to be enabled. (and set it to enabled in cellGcmSetVBlankFrequency as well)
* Add experimental "Infinite" frame limiter mode.
* Fix spurious enabling of second vblank.
2022-06-25 15:30:41 +02:00
Eladash 5e01ffdfd8 Debugger: Optimize cpu_thread::dump_regs()
Reuse string buffer. Copies and reallocations are expensive with such large strings.
2022-06-23 22:41:32 +02:00
Eladash 3899248305 RSX Debugger: Stable NOP skipping
Allow addresses of NOP blocks to remain consistent in between debugger position changes except for the first which can shrink or grow.
2022-06-21 16:59:45 +03:00
Jeff Guo cefc37a553
PPU LLVM arm64+macOS port (#12115)
* BufferUtils: use naive function pointer on Apple arm64

Use naive function pointer on Apple arm64 because ASLR breaks asmjit.
See BufferUtils.cpp comment for explanation on why this happens and how
to fix if you want to use asmjit.

* build-macos: fix source maps for Mac

Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo
modes.

* LLVM PPU: fix aarch64 on macOS

Force MachO on macOS to fix LLVM being unable to patch relocations
during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by
PPUTranslator/Recompiler.

* virtual memory: use 16k pages on aarch64 macOS

Temporary hack to get things working by using 16k pages instead of 4k
pages in VM emulation.

* PPU/SPU: fix NEON intrinsics and compilation for arm64 macOS

Fixes some intrinsics usage and patches usages of asmjit to properly
emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also
patches the SPU recompiler to properly work on arm64 by telling LLVM to
target arm64.

* virtual memory: fix W^X toggles on macOS aarch64

Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default
to RW mode. For both SPU and PPU execution threads, when initialization
finishes we toggle to RX mode. This exploits Apple's per-thread setting
for RW/RX to let us be technically compliant with the OS's W^X
    enforcement while not needing to actually separate the memory
    allocated for code/data.

* PPU: implement aarch64 specific functions

Implements ppu_gateway for arm64 and patches LLVM initialization to use
the correct triple. Adds some fixes for macOS W^X JIT restrictions when
entering/exiting JITed code.

* PPU: Mark rpcs3 calls as non-tail

Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you
call a function inside e.g. an L2 syscall, it will clobber LR on arm64
and subtly break returns in emulated code. Only JIT -> JIT "calls"
should be tail.

* macOS/arm64: compatibility fixes

* vm: patch virtual memory for arm64 macOS

Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to
existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix
memory unmapping on 16K page machines with a hack to mark "unmapped"
pages as RW.

* PPU: remove wrong comment

* PPU: fix a merge regression

* vm: remove 16k page hacks

* PPU: formatting fixes

* PPU: fix arm64 null function assembly

* ppu: clean up arch-specific instructions
2022-06-14 15:28:38 +03:00
Eladash 264253757c rsx: Improve Null Renderer 2022-06-12 20:54:42 +03:00
Ani 2512e958fa
glsl: Avoid implicit int->uint conversions (#12220) 2022-06-12 18:05:43 +01:00
Elad Ashkenazi 280aa6da91
rsx: Fix NV406E semaphore_acquire timeout detection (#12205) 2022-06-12 12:34:29 +03:00
Malcolm Jestadt 0d022d420b RSX: Add more wide paths for upload_untouched
- Adds AVX512 path for upload_untouched u16 with primitive restart, and
  AVX2 and AVX512 paths for upload_untouched without restart
- The AVX512 paths handle the remainder in simd code with masking, which
  provided a large speedup
- On my i5-1135G7 in demons souls benchmarking a scene in boletaria with
  a lot of geometry on screen via perf:
SSE4_1                      0.64%
AVX2                        0.59%
AVX512                      0.56%
AVX512 w/ remainder masking 0.51%
2022-06-12 06:23:55 +03:00
Elad Ashkenazi ec530a2c91
rsx: Suggest to try setting RSX FIFO Accuracy to a higher mode of accuracy on crash (#12204) 2022-06-11 23:26:12 +02:00
kd-11 7530b3c971 vk: Fix image view search and destroy 2022-06-09 02:13:55 +03:00
Eladash f9bc7458d4 rsx: Resurgence of HLE GCM 2022-06-06 12:56:25 +02:00
kd-11 6c315e8aee gl: Disallow overlapping binding points 2022-06-05 10:13:41 +03:00
Elad Ashkenazi 88faac7bbc
rsx: Minor fixup (#12165) 2022-06-04 15:04:27 +01:00
Elad Ashkenazi 9bb7e8d614
rsx: Implement atomic FIFO fetching (stability improvement) (non-default setting) (#12107) 2022-06-04 15:35:06 +03:00
kd-11 286f97fad0 rsx: Reduce some error spam 2022-06-04 14:02:33 +03:00
kd-11 f0a02e0d9d gl: Fix leaking texture views 2022-06-04 14:02:33 +03:00
kd-11 8185bfe893 gl: Track image destruction and remove handles from state tracker
- Handles are reused for different resources which can cause problems
2022-06-04 14:02:33 +03:00
kd-11 d577cebd89 gl: Refactor image and command-context handling
- Move texture object code out of the monolithic header
- All texture binds go through the shared state
- Transient texture binds use a dedicated temp image slot shared with native UI
2022-06-04 14:02:33 +03:00
kd-11 167161d8ce rsx: Restore some accidentally removed depth-format conversion macros 2022-06-03 11:54:09 +03:00
kd-11 b8b0ecabd8 gl: Fix data pointer on the optimized AMD path 2022-06-03 11:54:09 +03:00
kd-11 bb05de2e80 gl: Fix copypasta 2022-06-03 11:54:09 +03:00
kd-11 7890e87234 gl: Fix warning 2022-06-03 11:54:09 +03:00
kd-11 25c05867d6 gl: Fix ring buffer remove() function
- Fixes crash on running a second game in the same session
2022-06-03 11:54:09 +03:00
kd-11 a421270c19 gl: Use new scratch buffer system 2022-06-03 11:54:09 +03:00
kd-11 764fb57fdc gl: Implement scratch ring buffer with memory barriers 2022-06-03 11:54:09 +03:00
kd-11 3fd846687e gl: Refactor buffer object code 2022-06-03 11:54:09 +03:00
kd-11 ff9c939720 gl: Assume decode buffer is to be used as SSBO as this seems to be a hint to the driver about where to put the buffer
Part of OpenGL's achilles' heel - the API does not distinguish between VRAM and SYSTEM memory at all and relies on developers wrestling with the driver's heurestic algorithm for this.
2022-06-03 11:54:09 +03:00
kd-11 234db2be3f gl: Fix texture binding in overlay renderer 2022-06-03 11:54:09 +03:00
kd-11 fc44d53bb0 gl: Reset buffer size on destroying the GPU handle 2022-06-03 11:54:09 +03:00
kd-11 555a4b5f5c gl: Suggest readback buffer as ssbo if it is not provided
- We're likely to jump into a compute or readback pass anyway.
2022-06-03 11:54:09 +03:00
kd-11 a6e6df1445 gl: Implement fast texture readback for D24X8 and RGBA8/BGRA8 2022-06-03 11:54:09 +03:00
Nekotekina 76c72351a5 rsx_methods: fix warning 2022-06-02 12:56:49 +03:00
kd-11 eb52ac55a7 gl: Fix AMD buffer decode 2022-05-31 23:34:14 +03:00
kd-11 d167582f6b gl: Implement on-chip buffer-to-d24x8 conversion 2022-05-31 23:34:14 +03:00
kd-11 dd6cb054a7 gl: Add missing viewport save 2022-05-31 23:34:14 +03:00
kd-11 b97557ce7b gl: Use DSA for compressed texture upload 2022-05-31 23:34:14 +03:00
kd-11 964fd1095e gl: Properly preserve texture state
- Remove rogue glBindTexture calls and use gl commandstate object instead
2022-05-31 23:34:14 +03:00
kd-11 fcc6c2384b Fix linux build 2022-05-31 23:34:14 +03:00
kd-11 a5d73f41b5 gl: Remove debug message 2022-05-31 23:34:14 +03:00
kd-11 1b305bf789 gl: Workaround for poor AMD OpenGL performance
- Turns out the AMD driver really hates it if you render with a mapped index buffer.
  The driver internally seems to make a copy of the consumed indices and uses that. Very slow.
  I was able to isolate this after observing that glDrawArrays is not entirely shit, but glDrawElements duration scaled linearly with the number of vertices.
2022-05-31 23:34:14 +03:00
kd-11 943752db30 gl: Compute optimizations
- Keep buffers around longer to allow driver heurestics to work
- Properly initialize the shaders to allow optimal workgroup dispatch size
2022-05-31 23:34:14 +03:00
kd-11 60a2a39e88 gl: Deswizzle textures on the GPU 2022-05-31 23:34:14 +03:00
kd-11 532563e861 gl: Update some more buffer-object functions 2022-05-31 23:34:14 +03:00
kd-11 3ee27bd434 gl: Optimize consumption of buffer objects when uploading textures 2022-05-31 23:34:14 +03:00
kd-11 55e68441cb gl: Commit to bindless framebuffer object management 2022-05-31 23:34:14 +03:00
kd-11 7ec481d99b rsx: Allocate scratch memory using simple array with no default initialize
- This cuts down processing time significantly by eliminating calls to memset_stosb
2022-05-31 23:34:14 +03:00