- Noticed a glitch on AMD hw and windows drivers where discard seems to affect entire 4x4 cells.
- Dead fragments (outside the primitive boundary) could have their discards trigger as they do not have proper access to variables.
- This introduces dead fragments along triangle edges, causing a diagonal line pattern across the screen that is very annoying.
- Renormalizes arbitrary N-bit values as 8-bit normalized.
- NV hardware performs integer normalization at 8 bits if the size is less than 8.
- This can cause significant arithmetic drift because the error is multiplied by a huge number when sampling.
- Uncacheable resources can be reused as soon as they're made visible to the draw call.
- Since they're likely to be reused every draw call until the shader changes, it is important to reuse as much as possible
- If a stale reference is left lying around (e.g the texture bound to
depth has been deleted and we attach a color image) no operations
actually take place. glCheckFramebufferStatus also does not catch this
problem.
- Sometimes program-point-size is enabled, but the vs does not actually
write to the point size register. In this case, pass the incoming point
size along instead of the default register init.
- Separate displayed statistics from actual backend statistics.
Allows asynchronous flipping to work correctly as it just uses display stats.
The real stats are used by the frame scope marker to determine behavior like engaging the FIFO optimizer or skipping draw calls correctly.
- Avoid silly broken tests due to queue_tag being called before pitch is initialized.
- Return actual memory range covered and exclude trailing padding.
- Coordinates in src are to be calculated with src_pitch, not required_pitch.
- This allows creating buffers with no MAP bits set which should ensure they are created for VRAM usage only
- TODO: Implement compute kernels to avoid software fallback mode for pack/unpack operations
- Fix 2D coordinate sampling of W coordinate.
W is actually HPOS.w and not 1. Z is however always 0.
- Optimize register usage a bit
Disassembling compiled SPV shows that global declaration results in less ops than using inout modifiers. Modifiers generate extra mov instructions.
- Fix reading of varying registers in FP
Different registers have different behavior
- Always write to varying registers. If a register is not written to, it is initialized to (0, 0, 0, 1)
- Reimplements two-sided lighting correctly without hacks
- Also bumps shader cache version
- Do not allow offloader to handle its own faults. Serialize them on RSX instead.
This approach introduces a GPU race condition that should be avoided with improved synchronization.
- TODO: Use proper GPU-side synchronization to avoid this situation
- Avoids memory appearing older when used for depth test without depth write
The write_barrier before the call will inherit new data but the tag will not update as no new information is added.
- Properly commit orphaned blocks not invalidating existing cache structures
- Do not ignore overwritten objects when commiting as unprotected fbo. Avoids stale references to invalidated surface objects.
- Load into memory as straightforward BGRA
- Fixes a bug in vulkan caused by byte shuffling in blit engine vs shader access
- Removes the need for memory shuffling when transferring into a rendertarget
- Implements render target data load (aka Read Color Buffer/Read Depth Buffer)
- Refactors vulkan surface barrier to be much cleaner.
- Removes redundant surface barrier invocations after doing a merged load
from surface cache.
- Adds explicit access modes when gathering surfaces from cache.
- Further improve aliased data preservation by unconditionally scanning.
Its is possible for cache aliasing to occur when doing memory split.
- Also sets up for RCB/RDB implementation