* rsx: Add code to detect instanced draw commands
* rsx: Add GLSL support for instanced rendering
* rsx: Move draw call related functions to their own class
* rsx: Move more functions from rsx thread to the draw command processor
* rsx: Fix vertex program compiler crash
* vk: Add support for hardware instanced draws
* rsx: Fix instancing bug when indexed addressing is used to read constants
* rsx: Fix rare crash in vertex program decompiler
- This whole decompiler mess needs a rewrite
* rsx: Handle dangling execution barriers
* rsx: Do not use global registers object in logical "firmware" units
* Cosmetic improvements
* rsx: Test vertex program flags on each draw
* rsx: Properly track changes in instancing state
Turns out the pitch was accidentally used as the width, leading to an out-of-bounds read/write.
I kept the pitch in the struct for completeness' sake. It may be needed later, if only for error checks.
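
For illustration only, here is a minimal sketch of the pitch/width distinction behind this kind of bug. The `image_desc` struct and `copy_packed` helper are hypothetical names made up for this example and are not taken from the codebase. The point is that the pitch only steps between row starts, while the number of bytes touched per row is bounded by the width.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical descriptor; field names are assumptions for this sketch.
struct image_desc
{
    std::size_t width_bytes; // bytes of valid pixel data in each row
    std::size_t height;      // number of rows
    std::size_t pitch;       // bytes between consecutive row starts (>= width_bytes)
};

// Copies a pitched source image into a tightly packed destination.
// Using d.pitch as the per-row copy size here would overrun both buffers
// whenever pitch > width_bytes.
std::vector<std::uint8_t> copy_packed(const std::uint8_t* src, const image_desc& d)
{
    std::vector<std::uint8_t> dst(d.width_bytes * d.height);
    for (std::size_t y = 0; y < d.height; ++y)
    {
        // Step through the source by pitch, but copy only width_bytes per row.
        std::memcpy(dst.data() + y * d.width_bytes, src + y * d.pitch, d.width_bytes);
    }
    return dst;
}
```

Keeping the pitch in the descriptor also allows a cheap sanity check such as `pitch >= width_bytes` before copying.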
- This drastically improves memory allocation behavior.
Holding too many invalidated resources can lead to a cascading overallocation error, as old resources hold refs to even older resources and nothing gets deleted.
- Pass a sync address to the backend
- Ignore the hint if the query is running in lazy mode
- Do not submit CBs too close to each other. Submits are expensive.
- For some reason, this has a massive impact on performance above some arbitrary threshold of calls.
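
A rough sketch of what throttling submits could look like, assuming "too close" is measured in commands recorded since the previous submit. The `submit_throttle` class, its `min_gap` parameter and the `force` flag are hypothetical and only illustrate the idea. The actual heuristic in the backend may differ.

```cpp
#include <cstdint>

// Hypothetical helper: decides whether a command buffer submit is worthwhile.
class submit_throttle
{
    std::uint64_t m_last_submit = 0; // running command count at the last submit
    std::uint64_t m_min_gap;         // minimum commands required between submits

public:
    explicit submit_throttle(std::uint64_t min_gap) : m_min_gap(min_gap) {}

    // current_count is assumed to be a monotonically increasing command counter.
    // Returns true if the caller should submit now. A forced submit
    // (e.g. a hard sync) always goes through.
    bool should_submit(std::uint64_t current_count, bool force = false)
    {
        if (!force && current_count - m_last_submit < m_min_gap)
        {
            return false; // too close to the previous submit, skip it
        }

        m_last_submit = current_count;
        return true;
    }
};
```

With `min_gap = 100`, a request at count 50 is skipped, one at 150 goes through, and one at 200 is skipped again unless forced.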
Shows up under surface_cache::get_merged_memory_region when doing gathers.
- Specifically fixes a corner case where double transforms are required.
Technically this can be made more readable using transformation matrices:
* M1 = transform_virtual_to_physical()
* M2 = transform_image_to_virtual()
* M3 = M1 * M2
* Result = Input * M3
But we don't use a CPU-side matrix library and it is not reasonable to do this on the GPU.
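
For reference, a minimal sketch of the double transform without a matrix library, assuming both steps are simple per-axis scale-and-offset transforms and that the image-to-virtual step is applied first. The `transform2d`, `apply` and `compose` names are hypothetical and exist only for this example.

```cpp
// Hypothetical per-axis scale + offset transform: out = in * scale + offset.
struct transform2d
{
    float scale_x, scale_y;
    float offset_x, offset_y;
};

struct point
{
    float x, y;
};

// Applies a single transform to a point.
point apply(const transform2d& t, point p)
{
    return { p.x * t.scale_x + t.offset_x, p.y * t.scale_y + t.offset_y };
}

// Folds two transforms into one so that
// apply(compose(outer, inner), p) == apply(outer, apply(inner, p)).
// Derivation: (p * si + oi) * so + oo = p * (si * so) + (oi * so + oo)
transform2d compose(const transform2d& outer, const transform2d& inner)
{
    return {
        outer.scale_x * inner.scale_x,
        outer.scale_y * inner.scale_y,
        inner.offset_x * outer.scale_x + outer.offset_x,
        inner.offset_y * outer.scale_y + outer.offset_y
    };
}
```

Here `compose(virtual_to_physical, image_to_virtual)` plays the role of M3, and a single `apply` call replaces the two chained transforms.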
The following set of conditions can fail:
1. We hit an RTT-owned texture
2. The texture is invalidated (failed memory integrity test) and set to write/read-through
In this situation, the RTT overlap check will skip this surface, and a match can be found in the texture cache if WCB/WDB is enabled.
The incoming hit, however, has no managed payload. This is expected behavior; the search should load from CPU.
- Only implemented for image upscaling.
- Disabled by default. Emulators cannot ensure upscalers are injected at the right rendering step.
- GUI integration not implemented.
Depending on the DPI settings, the debug overlay was almost unreadable.
I also took the liberty of refactoring some redundant client size calls and adding some margin to the left of the debug text.
- Transfer writes are expected to clobber surface cache contents. Do NOT reload from CPU memory for writes.
- TODO: During transfer write to surface cache objects, lock memory if it was unlocked to avoid silly problems.