* rsx: Add code to detect instanced draw commands
* rsx: Add GLSL support for instanced rendering
* rsx: Move draw call related functions to their own class
* rsx: Move more functions from the RSX thread to the draw command processor
* rsx: Fix vertex program compiler crash
* vk: Add support for hardware instanced draws
* rsx: Fix instancing bug when indexed addressing is used to read constants
* rsx: Fix rare crash in vertex program decompiler
- This whole decompiler mess needs a rewrite
* rsx: Handle dangling execution barriers
* rsx: Do not use global registers object in logical "firmware" units
* Cosmetic improvements
* rsx: Test vertex program flags on each draw
* rsx: Properly track changes in instancing state
  - This function was a disaster, with random code added over a decade without much thought.
- Restructures the logic into decode and transfer steps for easier management.
* Savestates: Enable "Start Paused" by default
* Emu/rsx/IO: Resume emulation on long START press
* rsx: Fix missing graphics with the savestate "Start Paused" setting
* rsx/overlays: Add simple reference counting for messages so they can be hidden manually
* Reorder code in Emulator::Pause() so that pausing threads is the first thing the function does
- Move texture object code out of the monolithic header
- All texture binds go through the shared state
- Transient texture binds use a dedicated temp image slot shared with the native UI
- Turns out the AMD driver really hates it if you render with a mapped index buffer.
  The driver seems to internally copy the consumed indices and draw from that copy, which is very slow.
  I was able to isolate this after observing that glDrawArrays performed reasonably while glDrawElements duration scaled linearly with the number of vertices.
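The rsx/overlays entry on reference-counted messages can be illustrated with a minimal C++ sketch. It assumes that a message stays visible while at least one owner holds a reference and is hidden once the last owner releases it. The class `ref_counted_message` and its methods are hypothetical and are not RPCS3's actual overlay API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overlay message whose visibility is driven by a reference count.
// Single-threaded sketch: a version shared across threads would need an atomic counter.
class ref_counted_message
{
public:
    explicit ref_counted_message(std::string text) : m_text(std::move(text)) {}

    // Each owner that wants the message on screen takes a reference.
    void add_ref() { ++m_refs; }

    // Dropping a reference; the message is hidden once no owners remain.
    // Extra releases are ignored so the count never underflows.
    void release()
    {
        if (m_refs > 0)
        {
            --m_refs;
        }
    }

    // Visible while at least one owner still holds a reference.
    bool visible() const { return m_refs > 0; }

    const std::string& text() const { return m_text; }

private:
    std::string m_text;
    unsigned m_refs = 0;
};
```

With this design, two independent callers can show the same message and it only disappears after both have released it, instead of relying on a fixed timeout.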