- Each desc manages its own lifetime now instead of relying on global timestamp check
- Fixes situation where same object remains active without update for long
Please, stop pretending...
You need these templates for generic code.
In other words, in another templates.
Stop increasing compilation time for no reason.
- Base the upscaling on the real source and not the "attr" parameter.
- In case of reconstruction, the source is much larger than the subslice in "attr"
* rsx: Refactor vertex clip emit to avoid using f64 unnecessarily
- Fixes driver crash on intel
* vk: Add NVIDIA driver version check
- Warn if user has outdated drivers with known problems
- Significantly improves compilation speed by simplifying most of the code and doing something similar to LICM.
* Actual decoding is now vectorized and performed in one step rather than in a loop.
* Switches inside loops are removed and replaced with simple comparison. Generates much nicer (and smaller) GCN bytecode.
- Fix special case where n=f making (f-n) = 0
- Dynamically update depth range by setting dirty bits
- Fix depth bounds when n=f and bounds test is disabled
- Tracks which kind of raster was done (Z-ordered vs linear) throughout the application.
- This allows to identify if data is in the expected format or not.
- Avoids double expansion when both the exp_tex flag is set AND the texture also is sampled as signed
- Should fix missing eyeballs in Mass Effect 1 with the previous sign expansion fix
- Already partially supported via EXP option in the shader opcode, but format decoding was disabled.
- Noticed in some UE3 games which use _SNORM variants on PC but _UNORM on rpcs3
- SRC0, SRC1 and SRC2 have different bits for precision modifiers all stored inside SRC1
- This explains the strange observed behavior of the MAD instruction which has 3 inputs
- Sometimes, usually with shaders that do pack/unpack operations, there is a write-to-self operation.
For example r0.xy = r0.xy
Obviously no new data was introduced into "r0" by this, so we should not mark the register as having new data.
- TODO: Investigate on realhw if self-reference is needed to "cast" the overlapping half registers to their full register counterparts.
- Do not allocate too many objects. This is a problem in games using dynamic memory allocators that can make it rare for a surface to fall on the same address twice, keeping zombie RTVs and DSVs alive much longer than needed.
- Current limit used is 256M of virtual VRAM which is impossible on retail PS3
- Those strange offsets noted in some games seem to match to subpixel addressing.
For example, when scaling down by a factor of 4, a pixel offset of 2 will end up inside pixel 0 of the output
It works fine on Mesa iris
Fixes detection of Mesa as recent Mesa does not have "x.org" on vendor string, allowing vendor_MESA to become true instead of vendor_INTEL on Mesa Intel
- In blit engine logic there is a tendancy to over-allocate so as to avoid having to sticth together textures later
- Sometimes this can lead to out of bounds access and crash applications, so memory must be validated
- Noticed when debugging X-men origins: wolverine which has a bogus SCA op whilst writing vector to output
- It makes no sense for both SCA and VEC to both write to the same register in the same instruction as memory ordering becomes an issue
- Fix DIV instruction
- Add EXP_TEX modifier
- Implement WPOS register read
- Swap 3D and Cubemap enums to match RSX ids
- Adds two extra instruction classes: flow control and packing control
- Implement remaining FP instructions with exception of the rare projected texture lookups
- Fix typo causing output color index > 0 to not work
- Fix KIL instruction
- Implement conditional vertex program writes
- Must statically write the gl_ClipDistance registers else you get uninitialized trash.
This problem is more readily apparent on NVIDIA technology but even AMD is not completely immune.
- Fixes a data leak that can happen when a surface is rejected due to aspect mismatch.
- Mismatch can lead to rejection due to area covered excluding the RTT and inevitable upload a texture from CPU at the same location.
- Overlapping fbo/shader_read resources are not allowed.
- Detect writes to the display output memory and handle it specially.
It already defines a known 2D region.
- Try and detect situations where raw transfers would be of benefit.
- It is possible to have a RTV<->DSV transfer with compatible-sized formats.
Mark the depth size as typeless in such a situation to avoid crossing the aspect barrier with the API.
- Allows sections reclaimed by the surface store due to overlap/inheritance to be identified and removed.
- Additionally, potentially lowers the number of flushes required per block with multiple overlaps improving efficiency and theoretically performance.
- Reject writes to RTT if the source data is of unknown origin.
non-RTT data and only 1 line in length is suspicious and often GPU data like programs or other rendering inputs.
- Attempt to identify blit operations that will be flushed immediately
after and just do them on CPU instead if the transformation is trivial.
- If only a single blit section is contributing to an atlas merge op, the
threshold should be 100%. The only acceptable result here is a
truncation.
- Raise passing 'score' from 50% to 90% to filter out very incomplete
merge operations.
- Catch unfit sections passing the match test; possible for blit_dst
data but will likely be always harmless. Disabled in release builds by default.
* Prefer default initializer over std::memset 0 when possible and more readable.
* Use std::format in trophy files name obtaining.
* Use vm::ptr<>::operator bool() instead of comparing vm::ptr to vm::null or using addr().
* Add a few std::memset calls in hle where it matters (or in some places just to document an actual firmware memcpy call).
Round-to-nearest integral based division, optimized for unsigned integral.
Used in sceNpTrophyGetGameProgress.
Do not allow signed values for aligned_div(), align().
This implementation optimises correctly on all relevant compilers,
unlike GSL’s which gave extremely slow code on any compiler other than
MSVC.
Supersedes #6948.
- When the point sprite flag is set, overrides the input similar to the
2D mask. The returned X and Y values are always the gl_PointCoord values
for the fragment.
- Stacks with the 2D mask to override the z and w coordinates.
* rsx: Optimise primitive_restart::upload_untouched() with SSE4.1
This optimisation is only applied when skip_restart is false.
I’ve only tested the u16 codepath, as it is the one used in NieR.
In some very unscientific profiling, this function used to take 2.76% of
the total frame time at the save point of the port town, it now takes
about 0.40%.
* rsx: Mark all SSE4.1 functions with attributes on gcc and clang
This assures the compiler we will take care of only calling these
functions after having checked that the CPU does support these
instructions.
* rsx: Add an AVX2 implementation of primitive restart ibo upload
* rsx: Remove redefinition of SSE4.1 instructions
Now that clang is aware that our functions are compiled with SSE4.1, it
lets us generate this code using its intrinsics.
* rsx: Optimise vector to scalar conversion
This is done using minpos and srli intrinsics and generate less code
than before.
Thanks Nekotekina for the suggestion!
subresource_layout::dim_in_texel
- These two are not always linked when working with compressed textures.
The actual texels extend past the actual size of the image if the size
is not aligned. e.g if height is 1, the real height is 4, but its not
possible to determine this from the aligned size. It could be 1, 2, 3 or
4 for example.
- Fixes image out-of-bounds writes when uploading from CPU
- Noticed a glitch on AMD hw and windows drivers where discard seems to affect entire 4x4 cells.
- Dead fragments (outside the primitive boundary) could have their discards trigger as they do not have proper access to variables.
- This introduces dead fragments along triangle edges, causing a diagonal line pattern across the screen that is very annoying.
- Renormalizes arbitrary N-bit values as 8-bit normalized.
- NV hardware performs integer normalization at 8 bits if the size is less than 8.
- This can cause significant arithmetic drift because the error is multiplied by a huge number when sampling.
- If either source data or dest is a render target, do image operations on the GPU same as before
- If swizzle is desired, use CPU fallback
- If no scaling and no format conversion is required, use CPU fallback
- If scaling is desired and the transfer target is in local memory, use the GPU
- When doing trivial copies, use the routine in rsx_methods instead of
duplicating code. Also has the benefit of better range checking.
- When commiting a block as fbo, keep blit_dst data as well.
- Avoids removing (and losing data from) blit targets that just happen to share a page with a framebuffer.
- Uncacheable resources can be reused as soon as they're made visible to the draw call.
- Since they're likely to be reused every draw call until the shader changes, it is important to reuse as much as possible
- Calculate exact sizes when doing hit tests to avoid false negatives
- Defer page checking until actually require to do memory setup
- Introduce align2 helper to do non-pow2 alignments
- Allow use of intrinsics when SSSE3 and SSSE4.1 are not available in the build target environment
- Properly separate SSE4.1 code from SSSE3 code for some older proceessors without SSE4.1
- With harmonization between all texture types implemented, there is no difference between blit_engine_src and shader_read for supported formats
- Adds extra format filtering to ensure no conflicts when copying data
- While the mask for surface_a is at index 0, the surface cache expects the order to be maintained correctly!
Set the correct mask since surface store now checks each RTT individually
- Avoid silly broken tests due to queue_tag being called before pitch is initialized.
- Return actual memory range covered and exclude trailing padding.
- Coordinates in src are to be calculated with src_pitch, not required_pitch.