- Fix 2D coordinate sampling of W coordinate.
W is actually HPOS.w and not 1. Z is however always 0.
- Optimize register usage a bit
Disassembling the compiled SPIR-V shows that global declarations result in fewer ops than inout modifiers, which generate extra mov instructions.
- Fix reading of varying registers in FP
Different registers have different behavior
- Always write to varying registers. If a register is not written to, it is initialized to (0, 0, 0, 1)
- Reimplements two-sided lighting correctly without hacks
- Also bumps shader cache version
- Do not allow the offloader to handle its own faults; serialize them on RSX instead.
This approach introduces a GPU race condition that should be avoided with improved synchronization.
- TODO: Use proper GPU-side synchronization to avoid this situation
- Avoids memory appearing older when used for depth test without depth write
The write_barrier before the call inherits the new data, but the tag is not updated since no new information is added.
- Properly commit orphaned blocks not invalidating existing cache structures
- Do not ignore overwritten objects when committing as an unprotected FBO. Avoids stale references to invalidated surface objects.
- Load into memory as straightforward BGRA
- Fixes a bug in Vulkan caused by byte shuffling in the blit engine vs shader access
- Removes the need for memory shuffling when transferring into a rendertarget
- Implements render target data load (aka Read Color Buffer/Read Depth Buffer)
- Refactors the Vulkan surface barrier to be much cleaner.
- Removes redundant surface barrier invocations after doing a merged load
from the surface cache.
- Adds explicit access modes when gathering surfaces from cache.
- Further improve aliased data preservation by unconditionally scanning.
It is possible for cache aliasing to occur when doing a memory split.
- Also sets up for RCB/RDB implementation
vkAcquireNextImageKHR can also return VK_SUBOPTIMAL_KHR, which is non-fatal.
However, it's a good idea to still recreate the swap chain later to maintain
optimal presentation paths after temporary occlusion.
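A minimal sketch of the acquire-result handling described above. The enum is a hypothetical stand-in for the relevant VkResult values so the example compiles without Vulkan headers; `acquire_result`, `acquire_decision` and `handle_acquire` are illustrative names, not RPCS3 code.

```cpp
#include <cassert>

// Hypothetical stand-in for the subset of VkResult values relevant here;
// real code would switch on VkResult from <vulkan/vulkan.h>.
enum class acquire_result { success, suboptimal, out_of_date, device_lost };

struct acquire_decision
{
    bool present_this_frame; // an image was acquired and can be used
    bool recreate_swapchain; // rebuild the swapchain (now or later)
    bool fatal;              // unrecoverable error
};

acquire_decision handle_acquire(acquire_result r)
{
    switch (r)
    {
    case acquire_result::success:
        return {true, false, false};
    case acquire_result::suboptimal:
        // Non-fatal: the image is valid and can be presented, but the swapchain
        // no longer matches the surface exactly, so schedule a rebuild.
        return {true, true, false};
    case acquire_result::out_of_date:
        // No usable image; the swapchain must be rebuilt before presenting.
        return {false, true, false};
    default:
        return {false, false, true};
    }
}
```

Treating "suboptimal" as "present now, rebuild later" avoids dropping a frame during a temporary occlusion while still restoring the optimal presentation path afterwards.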
- ZCULL queue was updated to one-per-cb but the conditional render sync hint was not updated.
- Only flush the queue if the upcoming ref is contained in the active CB.
- This avoids spamming queue flushes, which frees up resources and improves performance
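The flush condition above can be sketched as a simple ownership check. This assumes each query reference records the id of the command buffer it was recorded into; `query_ref`, `queue_state` and `needs_queue_flush` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: each occlusion query reference remembers which
// command buffer (by id) will write its result.
struct query_ref { std::uint64_t cb_id; };

// Hypothetical queue state: the id of the command buffer still being recorded.
struct queue_state { std::uint64_t active_cb_id; };

// Only a reference that lives in the still-open command buffer forces a
// submit before the conditional render can wait on it. References in
// already-submitted command buffers need no extra flush.
bool needs_queue_flush(const queue_state& q, const query_ref& ref)
{
    return ref.cb_id == q.active_cb_id;
}
```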
- Merge viewport raster window and scissor into one clipping region
- Viewport raster clip is different from viewport geometry clipping in
hardware as the latter is configurable separately
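Merging the viewport raster window and the scissor into a single clipping region amounts to intersecting two rectangles. A minimal sketch, assuming half-open integer rectangles; the `rect` type and `intersect` function are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical half-open rectangle: [x1, x2) x [y1, y2).
struct rect { int x1, y1, x2, y2; };

// Combine two clip rectangles into one region by intersection.
// Disjoint inputs collapse to a zero-area rectangle, meaning nothing is drawn.
rect intersect(const rect& a, const rect& b)
{
    rect r{ std::max(a.x1, b.x1), std::max(a.y1, b.y1),
            std::min(a.x2, b.x2), std::min(a.y2, b.y2) };
    if (r.x2 < r.x1) r.x2 = r.x1;
    if (r.y2 < r.y1) r.y2 = r.y1;
    return r;
}
```

For example, a 1280x720 raster window combined with a scissor of {100, 50, 2000, 600} yields the region {100, 50, 1280, 600}.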
- Tagged eventIDs can be used to safely delete resources that are no
longer used
- TODO: Expand gc to collect images as well
- TODO: Fix the texture cache to avoid over-allocating image resources
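Deletion by tagged eventID can be sketched as a deferred-deletion queue. Each retired resource is tagged with the eventID of the last submission that may still use it, and it is freed only once the GPU has completed that event. The `eid_gc` class is hypothetical. It assumes that tags are retired in non-decreasing order and that a monotonically increasing "completed eventID" is available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical deferred-deletion queue keyed by eventID.
class eid_gc
{
    std::deque<std::pair<std::uint64_t, std::function<void()>>> m_pending;

public:
    // Tags are assumed to arrive in non-decreasing order.
    void retire(std::uint64_t last_used_eid, std::function<void()> deleter)
    {
        m_pending.emplace_back(last_used_eid, std::move(deleter));
    }

    // Run the deleters of every resource whose last use the GPU has completed.
    void collect(std::uint64_t completed_eid)
    {
        while (!m_pending.empty() && m_pending.front().first <= completed_eid)
        {
            m_pending.front().second();
            m_pending.pop_front();
        }
    }

    std::size_t pending() const { return m_pending.size(); }
};
```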
- Fix a typo in OpenAL
- Fix typo in cellHttp.h
- Unused variables in catch
- Use 64-bit shifts
- Use use_count() with shared pointers; unique() is deprecated and being removed
- Explicitly cast boolean to int
- Signed/unsigned issues with loop variables
- Fix missing return statement (the code path is unreachable, but compiler wants a return)
- */ outside of a comment
- Fix duplicate layout name
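Two of the warning fixes above, the 64-bit shifts and the use_count() replacement for unique(), are shown below as small sketches. The helper names `bit` and `is_sole_owner` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// A literal 1 is an int (typically 32-bit), so 1 << 40 is undefined behaviour.
// Widening the left operand first makes the shift 64-bit.
std::uint64_t bit(unsigned index)
{
    return std::uint64_t{1} << index; // valid for index < 64
}

// std::shared_ptr::unique() was deprecated in C++17 and removed in C++20.
// use_count() == 1 expresses the same check.
template <typename T>
bool is_sole_owner(const std::shared_ptr<T>& p)
{
    return p.use_count() == 1;
}
```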
The vm::spu max address was overflowing, resulting in issues, so cast to u64 where needed. Fixes #6145.
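The overflow described above can be reproduced with a hypothetical range check against a region that ends exactly at 4 GiB. The base 0xE0000000 and size 0x20000000 below are illustrative values, not taken from the commit. In 32-bit arithmetic, `base + limit` wraps to 0 and every in-range address is rejected. Promoting to 64-bit before adding fixes the check. `in_range_u32` and `in_range_u64` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Broken: with 32-bit unsigned arithmetic, base + limit (and addr + size)
// can wrap past 0xFFFFFFFF and compare as a small number.
bool in_range_u32(std::uint32_t addr, std::uint32_t size,
                  std::uint32_t base, std::uint32_t limit)
{
    return addr >= base && addr + size <= base + limit;
}

// Fixed: promote to 64-bit before adding so the sums cannot wrap.
bool in_range_u64(std::uint32_t addr, std::uint32_t size,
                  std::uint32_t base, std::uint32_t limit)
{
    return addr >= base &&
           std::uint64_t{addr} + size <= std::uint64_t{base} + limit;
}
```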
Use vm::get_addr instead of manually subtracting vm::base(0) from a pointer in texture cache code.
Prefer std::atomic_thread_fence over _mm_?fence(), and adjust usage to be more correct.
Use sequentially consistent ordering in semaphore_release for the TSX path as well.
Improved memory ordering for sys_rsx_context_iounmap/map.
Fixed sync bugs in HLE gcm caused by not using atomic instructions.
Use a release memory barrier for lwsync in PPU LLVM; according to the Xbox 360 programming guide, lwsync is a hardware release memory barrier.
Also use release barrier where lwsync was originally used in liblv2 sys_lwmutex and cellSync.
Use acquire barrier for isync instruction, see https://devblogs.microsoft.com/oldnewthing/20180814-00/?p=99485
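The barrier mapping above (lwsync as a release fence, isync as an acquire fence) can be sketched with std::atomic_thread_fence in a message-passing pattern. The `ppu_lwsync`/`ppu_isync` wrappers and the globals are hypothetical. The release fence before a relaxed store, paired with an acquire fence after a relaxed load that observes that store, guarantees that the consumer sees the payload.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical wrappers following the mapping described above.
inline void ppu_lwsync() { std::atomic_thread_fence(std::memory_order_release); }
inline void ppu_isync()  { std::atomic_thread_fence(std::memory_order_acquire); }

int g_payload = 0;
std::atomic<bool> g_ready{false};

void producer()
{
    g_payload = 42;
    ppu_lwsync();                                   // order payload write before flag
    g_ready.store(true, std::memory_order_relaxed);
}

int consumer()
{
    while (!g_ready.load(std::memory_order_relaxed)) {}
    ppu_isync();                                    // payload read cannot move above flag read
    return g_payload;
}

// Runs one producer/consumer exchange and returns what the consumer saw.
int run_once()
{
    g_payload = 0;
    g_ready.store(false);
    std::thread t(producer);
    int v = consumer();
    t.join();
    return v;
}
```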
-Indentation warnings
-prevent shift overflow
-This was declared extern in all contexts. Remove this for initialization
-Fix main return types. OH CANADA!
-Silence extraneous 'unused expression' warning
-Force use return value (warning)
-Remove tautological compare copy-pasta (char always < 256)
- Do not consume a slot every draw call, instead batch as many draws as possible
- Since renderpasses are dispatched per-draw-clause, keeping occlusion queries outside the renderpasses works fine
- If renderpasses are reorganized, occlusion tasks will have to be reorganized again
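The batching change above can be sketched as follows. A query slot is consumed only when no query is active, and consecutive draws are counted by the already-open query until the batch is closed. The `occlusion_batcher` class and the point at which `end_batch` is called are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: one query slot per batch of draws, not per draw call.
class occlusion_batcher
{
    std::uint32_t m_slots_consumed = 0;
    bool m_query_active = false;

public:
    void on_draw()
    {
        if (!m_query_active)
        {
            ++m_slots_consumed;   // begin a query on a fresh slot
            m_query_active = true;
        }
        // Otherwise this draw is counted by the already-active query.
    }

    // Close the current batch, e.g. when the occlusion task changes.
    void end_batch() { m_query_active = false; }

    std::uint32_t slots_consumed() const { return m_slots_consumed; }
};
```

Because render passes are dispatched per draw clause, the open query can span several draws without crossing a render pass boundary.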
- Remove string comparisons from the hot-path!
- Use attribute streaming and push constants to avoid forcing a descriptor block copy every other draw call/pass.
While this isn't so bad on NVIDIA cards, it makes AMD cards a slideshow.
- When multithreaded RSX is enabled, the vertex cache just lowers performance
- The small cost of upload is paid by the asynchronous thread, allowing RSX to work optimally