- Allows sections reclaimed by the surface store due to overlap/inheritance to be identified and removed.
- Additionally, potentially lowers the number of flushes required per block with multiple overlaps improving efficiency and theoretically performance.
- Reject writes to RTT if the source data is of unknown origin.
non-RTT data and only 1 line in length is suspicious and often GPU data like programs or other rendering inputs.
- Attempt to identify blit operations that will be flushed immediately
after and just do them on CPU instead if the transformation is trivial.
- If only a single blit section is contributing to an atlas merge op, the
threshold should be 100%. The only acceptable result here is a
truncation.
- Raise passing 'score' from 50% to 90% to filter out very incomplete
merge operations.
- Catch unfit sections passing the match test; possible for blit_dst
data but will likely be always harmless. Disabled in release builds by default.
* Prefer default initializer over std::memset 0 when possible and more readable.
* Use std::format in trophy files name obtaining.
* Use vm::ptr<>::operator bool() instead of comparing vm::ptr to vm::null or using addr().
* Add a few std::memset calls in hle where it matters (or in some places just to document an actual firmware memcpy call).
Round-to-nearest integral based division, optimized for unsigned integral.
Used in sceNpTrophyGetGameProgress.
Do not allow signed values for aligned_div(), align().
This implementation optimises correctly on all relevant compilers,
unlike GSL’s which gave extremely slow code on any compiler other than
MSVC.
Supersedes #6948.
- When the point sprite flag is set, overrides the input similar to the
2D mask. The returned X and Y values are always the gl_PointCoord values
for the fragment.
- Stacks with the 2D mask to override the z and w coordinates.
* rsx: Optimise primitive_restart::upload_untouched() with SSE4.1
This optimisation is only applied when skip_restart is false.
I’ve only tested the u16 codepath, as it is the one used in NieR.
In some very unscientific profiling, this function used to take 2.76% of
the total frame time at the save point of the port town, it now takes
about 0.40%.
* rsx: Mark all SSE4.1 functions with attributes on gcc and clang
This assures the compiler we will take care of only calling these
functions after having checked that the CPU does support these
instructions.
* rsx: Add an AVX2 implementation of primitive restart ibo upload
* rsx: Remove redefinition of SSE4.1 instructions
Now that clang is aware that our functions are compiled with SSE4.1, it
lets us generate this code using its intrinsics.
* rsx: Optimise vector to scalar conversion
This is done using minpos and srli intrinsics and generate less code
than before.
Thanks Nekotekina for the suggestion!