Rewritten the following global utility constants:
`umax` returns max number, restricted to unsigned.
`smax` returns max signed number, restricted to integrals.
`smin` returns min signed number, restricted to signed.
`amin` returns smin or zero, less restricted.
`amax` returns smax or umax, less restricted.
Fix operators == and <=> for synthesized rel-ops.
- Transfer writes are expected to clobber surface cache contents. Do NOT reload from CPU memory for writes.
- TODO: During transfer write to surface cache objects, lock memory if it was unlocked to avoid silly problems.
* Multi-thread shader compilation
This offers a huge improvement in startup performance. With around 13,000 shaders we go from ~1:30 to under 10 seconds. It looks like this was the original intention of the author given the outer scope recompile variable.
Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
- Avoids a silly situation where a texture is discarded and an identical copy created immediately afterward.
Unfortunately allocating memory blocks is really slow so avoid it as much as possible.
- Some things can be present in program env but not ucode state
e.g A texture can be active and bound in a redirected manner but not actually be used in ucode
In such a case, only the ucode analysis or decompilation can decide whether to inject decoding routines
- Each desc manages its own lifetime now instead of relying on global timestamp check
- Fixes situation where same object remains active without update for long
Please, stop pretending...
You need these templates for generic code.
In other words, in another templates.
Stop increasing compilation time for no reason.
- Base the upscaling on the real source and not the "attr" parameter.
- In case of reconstruction, the source is much larger than the subslice in "attr"
* rsx: Refactor vertex clip emit to avoid using f64 unnecessarily
- Fixes driver crash on intel
* vk: Add NVIDIA driver version check
- Warn if user has outdated drivers with known problems
- Significantly improves compilation speed by simplifying most of the code and doing something similar to LICM.
* Actual decoding is now vectorized and performed in one step rather than in a loop.
* Switches inside loops are removed and replaced with simple comparison. Generates much nicer (and smaller) GCN bytecode.
- Fix special case where n=f making (f-n) = 0
- Dynamically update depth range by setting dirty bits
- Fix depth bounds when n=f and bounds test is disabled
- Tracks which kind of raster was done (Z-ordered vs linear) throughout the application.
- This allows to identify if data is in the expected format or not.
- Avoids double expansion when both the exp_tex flag is set AND the texture also is sampled as signed
- Should fix missing eyeballs in Mass Effect 1 with the previous sign expansion fix
- Already partially supported via EXP option in the shader opcode, but format decoding was disabled.
- Noticed in some UE3 games which use _SNORM variants on PC but _UNORM on rpcs3
- SRC0, SRC1 and SRC2 have different bits for precision modifiers all stored inside SRC1
- This explains the strange observed behavior of the MAD instruction which has 3 inputs
- Sometimes, usually with shaders that do pack/unpack operations, there is a write-to-self operation.
For example r0.xy = r0.xy
Obviously no new data was introduced into "r0" by this, so we should not mark the register as having new data.
- TODO: Investigate on realhw if self-reference is needed to "cast" the overlapping half registers to their full register counterparts.
- Do not allocate too many objects. This is a problem in games using dynamic memory allocators that can make it rare for a surface to fall on the same address twice, keeping zombie RTVs and DSVs alive much longer than needed.
- Current limit used is 256M of virtual VRAM which is impossible on retail PS3
- Those strange offsets noted in some games seem to match to subpixel addressing.
For example, when scaling down by a factor of 4, a pixel offset of 2 will end up inside pixel 0 of the output
It works fine on Mesa iris
Fixes detection of Mesa as recent Mesa does not have "x.org" on vendor string, allowing vendor_MESA to become true instead of vendor_INTEL on Mesa Intel
- In blit engine logic there is a tendancy to over-allocate so as to avoid having to sticth together textures later
- Sometimes this can lead to out of bounds access and crash applications, so memory must be validated
- Noticed when debugging X-men origins: wolverine which has a bogus SCA op whilst writing vector to output
- It makes no sense for both SCA and VEC to both write to the same register in the same instruction as memory ordering becomes an issue
- Fix DIV instruction
- Add EXP_TEX modifier
- Implement WPOS register read
- Swap 3D and Cubemap enums to match RSX ids
- Adds two extra instruction classes: flow control and packing control
- Implement remaining FP instructions with exception of the rare projected texture lookups
- Fix typo causing output color index > 0 to not work
- Fix KIL instruction
- Implement conditional vertex program writes
- Must statically write the gl_ClipDistance registers else you get uninitialized trash.
This problem is more readily apparent on NVIDIA technology but even AMD is not completely immune.
- Fixes a data leak that can happen when a surface is rejected due to aspect mismatch.
- Mismatch can lead to rejection due to area covered excluding the RTT and inevitable upload a texture from CPU at the same location.
- Overlapping fbo/shader_read resources are not allowed.
- Detect writes to the display output memory and handle it specially.
It already defines a known 2D region.
- Try and detect situations where raw transfers would be of benefit.
- It is possible to have a RTV<->DSV transfer with compatible-sized formats.
Mark the depth size as typeless in such a situation to avoid crossing the aspect barrier with the API.