Drop _xbegin family intrinsics due to bad codegen
Implemented `notifier` class, replacing vm::notify
Minor optimization: detach transactions from global mutex on TSX path
Minor optimization: don't acquire vm::passive_lock on PPU on TSX path
- Properly identify puller spin primitives
- Add a small wake delay after exiting a spin delay. Fixes desynchronization
It seems real hw has a small delay between cell edits to commandbuffer memory at the GET address and the changes becoming visible to the DMA puller
Simulated with a short busy_wait, large values will improve sync but degrade performance
- Reimplements the AMD workaround using an identity buffer to avoid the performance hit of doing multiple glDrawArrays for every single compiled set
- Reimplements first/count allocation using a scratch buffer to reduce allocation overhead when large number of draw calls is used
- Improve dirty state tracking affecting program state
- vk: Refactor out transform constants upload into a separate channel to avoid if possible
transform data uploads are quite expensive
- Introduces a gpu program analyser step to examine shader contents before attempting compilation or cache search
- Avoids detecting shader as being different because of unused textures having state changes
- Adds better program size detection for vertex programs
- Improved vertex program decompiler
- Properly support CAL type instructions
- Support jumping over instructions marked with a termination marker with BRA/CAL class opcodes
- Fix SRC checks and abort
- Fix CC register initialization
- NOTE: Even unused SRC registers have to be valid (usually referencing in.POS)
Allow indirect calls within current function using a jumptable
This restores some functionality removed in SPU ASMJIT 2.0
Change SPUThread::get_ch_value prototype
Use conditional memory access to invalid address.
This approach can allow continue (for debugging);
but at the same time it doesn't add function call to recompiled code.
- Readback does not work at all with float textures on AMD openGL
Driver throws a bogus OUT_OF_MEMORY error regardless of amount of VRAM and system RAM available
- Formats support is linked to the physical device and by extension the logical device derived from it
It therefore makes no sense to track this as a separate object.
Simplifies parameter passing and template specialization.
Also avoids corner cases with AMD hardware (where D24S8 is not supported)