- Turns out the AMD driver really hates it if you render with a mapped index buffer.
The driver internally seems to make a copy of the consumed indices and uses that. Very slow.
I was able to isolate this after observing that glDrawArrays is not entirely shit, but glDrawElements duration scaled linearly with the number of vertices.
- Rare, but possible if a surface address is switched from color to depth usage
- In such a case, deref the old image and ref the new one to avoid leaks
- Since each program now does a remap of the outputs, we need to reupload the constants
- This is not a loss, constants are almost always changing between draw calls anyway
- For some reason this has a massive impact on performance above some arbitrary threshold of calls
Shows up under surface_cache::get_merged_memory_region when doing gathers.
- NV GPUs have a tendancy to be off by a very small margin, breaking rendering when greaterThan/lessThan checks are used.
- NOTE: Currently this setting is using the sRGB flag which indicates 8-bit output.
Only one game is currently known to care about this behaviour so this is good enough for now.
* Update asmjit dependency (aarch64 branch)
* Disable USE_DISCORD_RPC by default
* Dump some JIT objects in rpcs3 cache dir
* Add SIGILL handler for all platforms
* Fix resetting zeroing denormals in thread pool
* Refactor most v128:: utils into global gv_** functions
* Refactor PPU interpreter (incomplete), remove "precise"
* - Instruction specializations with multiple accuracy flags
* - Adjust calling convention for speed
* - Removed precise/fast setting, replaced with static
* - Started refactoring interpreters for building at runtime JIT
* (I got tired of poor compiler optimizations)
* - Expose some accuracy settings (SAT, NJ, VNAN, FPCC)
* - Add exec_bytes PPU thread variable (akin to cycle count)
* PPU LLVM: fix VCTUXS+VCTSXS instruction NaN results
* SPU interpreter: remove "precise" for now (extremely non-portable)
* - As with PPU, settings changed to static/dynamic for interpreters.
* - Precise options will be implemented later
* Fix termination after fatal error dialog
- This is just not part of spec, there is no enforcement for multiple of block size for width or height of s3tc compressed images.
- This restriction does indeed exist for ASTC and ETC but we're not using those formats.
Optimize CPU capability tests for arch-tuned builds.
Separate streaming and non-streaming utilities.
Rewritten copy_data_swap_u32(_cmp) with AVX2 path.