SRC coordinates
- When extracting a 1x1 texture from another texture of a different
format, width conversion can result in a dimension of 0 if the
extracted texel is not a full texel in SRC
- Avoids double expansion when both the exp_tex flag is set AND the texture also is sampled as signed
- Should fix missing eyeballs in Mass Effect 1 with the previous sign expansion fix
- On multithreaded mesa, the program initialization routine was not
being flushed correctly. Set up synchronization fence after initialization
is complete.
- Do not allocate too many objects. This is a problem in games using dynamic memory allocators that can make it rare for a surface to fall on the same address twice, keeping zombie RTVs and DSVs alive much longer than needed.
- Current limit used is 256M of virtual VRAM which is impossible on retail PS3
It works fine on Mesa iris
Fixes detection of Mesa as recent Mesa does not have "x.org" on vendor string, allowing vendor_MESA to become true instead of vendor_INTEL on Mesa Intel
- 16-bit channel formats have special 0x4|0xE encoding for only 2 channels, not 4
- float textures do not take any remapping and crash if you try it.
Depth float textures work fine though.
- merge disable_asynchronous_shader_compiler and interpreter_mode
- removes disable_asynchronous_shader_compiler setting
- Adds the resulting settings as radio buttons to the gui tab
- Fix DIV instruction
- Add EXP_TEX modifier
- Implement WPOS register read
- Swap 3D and Cubemap enums to match RSX ids
- Adds two extra instruction classes: flow control and packing control
- Implement remaining FP instructions with exception of the rare projected texture lookups
- Fix typo causing output color index > 0 to not work
- Fix KIL instruction
- Implement conditional vertex program writes
- Must statically write the gl_ClipDistance registers else you get uninitialized trash.
This problem is more readily apparent on NVIDIA technology but even AMD is not completely immune.
Use -fno-exceptions in cmake.
On MSVC, enable _HAS_EXCEPTION=0.
Cleanup throw/catch from the source.
Create yaml.cpp enclave because it needs exception to work.
Disable thread_local optimizations in logs.cpp (TODO).
Implement cpu_counter for cpu_threads (moved globals).
- Take viewport offset into account when applying window transforms.
This is necessary because gl_FragCoord is based on the framebuffer and not the viewport.
- Adds support for partial (letterboxed) source images by taking insets
into account.
- Bugfix for potential access violation when capturing screenshot on
vulkan
- Avoid double transfers where a transfer to a temp image is done
without scaling and then a secondary transfer follows. Combines the two
steps into one whenever possible which can significantly alleviate
bandwidth problems at higher resolutions. Significant speedup, upto 90%
in some cases (PDF, PDF2)
- Prefer lazy retire model. Sync commands are sent out and the reports will be
retired when they are available without forcing.
- To make this work with conditional rendering, hardware support is
required where the backend will automatically determine visibility by
itself during rendering.
- Noticed a glitch on AMD hw and windows drivers where discard seems to affect entire 4x4 cells.
- Dead fragments (outside the primitive boundary) could have their discards trigger as they do not have proper access to variables.
- This introduces dead fragments along triangle edges, causing a diagonal line pattern across the screen that is very annoying.
- Renormalizes arbitrary N-bit values as 8-bit normalized.
- NV hardware performs integer normalization at 8 bits if the size is less than 8.
- This can cause significant arithmetic drift because the error is multiplied by a huge number when sampling.
- Uncacheable resources can be reused as soon as they're made visible to the draw call.
- Since they're likely to be reused every draw call until the shader changes, it is important to reuse as much as possible
- If a stale reference is left lying around (e.g the texture bound to
depth has been deleted and we attach a color image) no operations
actually take place. glCheckFramebufferStatus also does not catch this
problem.
- Sometimes program-point-size is enabled, but the vs does not actually
write to the point size register. In this case, pass the incoming point
size along instead of the default register init.
- Separate displayed statistics from actual backend statistics.
Allows asynchronous flipping to work correctly as it just uses display stats.
The real stats are used by the frame scope marker to determine behavior like engaging the FIFO optimizer or skipping draw calls correctly.
- Avoid silly broken tests due to queue_tag being called before pitch is initialized.
- Return actual memory range covered and exclude trailing padding.
- Coordinates in src are to be calculated with src_pitch, not required_pitch.
- This allows creating buffers with no MAP bits set which should ensure they are created for VRAM usage only
- TODO: Implement compute kernels to avoid software fallback mode for pack/unpack operations
- Fix 2D coordinate sampling of W coordinate.
W is actually HPOS.w and not 1. Z is however always 0.
- Optimize register usage a bit
Disassembling compiled SPV shows that global declaration results in less ops than using inout modifiers. Modifiers generate extra mov instructions.
- Fix reading of varying registers in FP
Different registers have different behavior
- Always write to varying registers. If a register is not written to, it is initialized to (0, 0, 0, 1)
- Reimplements two-sided lighting correctly without hacks
- Also bumps shader cache version
- Do not allow offloader to handle its own faults. Serialize them on RSX instead.
This approach introduces a GPU race condition that should be avoided with improved synchronization.
- TODO: Use proper GPU-side synchronization to avoid this situation
- Avoids memory appearing older when used for depth test without depth write
The write_barrier before the call will inherit new data but the tag will not update as no new information is added.
- Properly commit orphaned blocks not invalidating existing cache structures
- Do not ignore overwritten objects when commiting as unprotected fbo. Avoids stale references to invalidated surface objects.
- Load into memory as straightforward BGRA
- Fixes a bug in vulkan caused by byte shuffling in blit engine vs shader access
- Removes the need for memory shuffling when transferring into a rendertarget
- Implements render target data load (aka Read Color Buffer/Read Depth Buffer)
- Refactors vulkan surface barrier to be much cleaner.
- Removes redundant surface barrier invocations after doing a merged load
from surface cache.
- Adds explicit access modes when gathering surfaces from cache.
- Further improve aliased data preservation by unconditionally scanning.
Its is possible for cache aliasing to occur when doing memory split.
- Also sets up for RCB/RDB implementation
- Merge viewport raster window and scissor into one clipping region
- Viewport raster clip is different from viewport geometry clipping in
hardware as the latter is configurable separately
vm::spu max address was overflowing resulting in issues, so cast to u64 where needed. Fixes#6145.
Use vm::get_addr instead of manually substructing vm::base(0) from pointer in texture cache code.
Prefer std::atomic_thread_fence over _mm_?fence(), adjust usage to be more correct.
Used sequantially consistent ordering in semaphore_release for TSX path as well.
Improved memory ordering for sys_rsx_context_iounmap/map.
Fixed sync bugs in HLE gcm because of not using atomic instructions.
Use release memory barrier in lwsync for PPU LLVM, according to this xbox360 programming guide lwsync is a hw release memory barrier.
Also use release barrier where lwsync was originally used in liblv2 sys_lwmutex and cellSync.
Use acquire barrier for isync instruction, see https://devblogs.microsoft.com/oldnewthing/20180814-00/?p=99485
- When multithreaded RSX is enabled, the vertex cache just lowers performance
- The small cost of upload is paid by the asynchronous thread, allowing RSX to work optimally
- Multiple header files where missing #includes to other headers that
where used in the header. Correct header was included in correct
order in source files which caused everything to compile.
- Added missing #includes so header files correctly include all their
dependencies and fixes problems with IDEs being unable to parse
headers correctly due to missing symbols
- Removes a lot of wm_event code that was used to perform window management and is no longer needed.
- Significantly simplifies the vulkan code.
- Implements resource management when vulkan window is minimized to allow resources to be freed.
- Allows render targets to behave like stacked 3D views same as shader inputs are resolved
- Basically implements most of 'Read Color/Depth Buffers" option for 'free'.
- Allows splitting RTV/DSV resources if they are superceded by a partial surface
- Also allows intersecting new resources through the surface cache for proper inheritance from other scattered data
- TODO: Refactor bind_surface_as_rtt and bind_surface_as_ds to reduce asinine code duplication
- The hw generates inaccurate values when doing perspective-correct
interpolation of vertex output attributes and makes the comparison (a ==
b) fail even when they are a fixed constant value.
- Increase equality tolerance when doing comparisons in fragment
shaders for NV cards only to work around this issue.
- Teepo fix