* Update asmjit dependency (aarch64 branch)
* Disable USE_DISCORD_RPC by default
* Dump some JIT objects in rpcs3 cache dir
* Add SIGILL handler for all platforms
* Fix resetting zeroing denormals in thread pool
* Refactor most v128:: utils into global gv_** functions
* Refactor PPU interpreter (incomplete), remove "precise"
* - Instruction specializations with multiple accuracy flags
* - Adjust calling convention for speed
* - Removed precise/fast setting, replaced with static
* - Started refactoring interpreters for building at runtime JIT
* (I got tired of poor compiler optimizations)
* - Expose some accuracy settings (SAT, NJ, VNAN, FPCC)
* - Add exec_bytes PPU thread variable (akin to cycle count)
* PPU LLVM: fix VCTUXS+VCTSXS instruction NaN results
* SPU interpreter: remove "precise" for now (extremely non-portable)
* - As with PPU, settings changed to static/dynamic for interpreters.
* - Precise options will be implemented later
* Fix termination after fatal error dialog
Optimize CPU capability tests for arch-tuned builds.
Separate streaming and non-streaming utilities.
Rewritten copy_data_swap_u32(_cmp) with AVX2 path.
- Treat the draw commands as being consumed on-the-fly with ATTR0 as provoking attribute
- Analysing streams sent to RSX and the results implies they are consumed fully inline.
This only makes sense if a provoking attribute is present. The 'static' register is truly the immediate register for the draw.
- TODO: Optimize this, we can avoid the double bswap in FIFO and then in attribute push
Not very important since nobody is doing register push in high-performance path.
* Use atomic waitables instead instead of global thread wait as often as possible.
* Add ::is_stopped() and and ::is_paued() which can be used in atomic loops and with atomic wait. (constexpr cpu flags test functions)
* Fix notification bug of sys_spu_thread_group_exit/terminate. (old bug, enhanced by #9117)
* Function time statistics at Emu.Stop() restored. (instead of current "X syscall failed with 0x00000000 : 0")
- Fix special case where n=f making (f-n) = 0
- Dynamically update depth range by setting dirty bits
- Fix depth bounds when n=f and bounds test is disabled
- Those strange offsets noted in some games seem to match to subpixel addressing.
For example, when scaling down by a factor of 4, a pixel offset of 2 will end up inside pixel 0 of the output
- Some games use texture semaphore for zcull sync which is rather bizzare.
However, it works on realhw as the depth test happens before fragment shader completion
- Due to the high performance penalty incurred by this act, this
behavior is only enabled by the "strict rendering mode" option.
- Properly synchronize DMA transfers when handling RSX pipeline
barriers. Texture read barrier is used to signify completion of DMA
routines and is often used to signal that Cell can overwrite vertex
data!
- Prefer lazy retire model. Sync commands are sent out and the reports will be
retired when they are available without forcing.
- To make this work with conditional rendering, hardware support is
required where the backend will automatically determine visibility by
itself during rendering.
- Allow delaying report flushes triggered by image_in or buffer_notify
- When the report is ready, all the delayed transfers will automatically
be done.
- TODO: Make this configurable?
This lowers the relative cost of this function from ~2.25% to ~1.80% on
gcc 9 which I found quite surprising, some of it probably gets inlined
better in the callers, but I haven’t been able to isolate which parts.
* allow sys_rsx_device_map to be called twice: in this case the DEVICE address retrived from the previous call returned
* Add ENOMEM checks for sys_rsx_memory_allocate and sys_rsx_context_allocate
* add EINVAL check for sys_rsx_context_allocate if memory handle is not found
* Separate sys_rsx_device_map allocation from sys_rsx_context_allocate's
* Implement sys_rsx_memory_free; used by cellGcmInit upon failure
* Added context_id checks
* Throw if sys_rsx_context_allocate was called twice.