Commit graph

698 commits

Author SHA1 Message Date
kd-11 8ca53f9c84 rsx: Remember to min-max the anchor indices of a polygon or triangle fan 2019-11-24 19:01:57 +03:00
kd-11 429a76a140 rsx: Remove redundant check 2019-11-23 16:11:18 +03:00
kd-11 41e7d2aa0a rsx: Select correct image aspect for blit engine targets. 2019-11-19 13:18:15 +03:00
kd-11 41c3180276 rsx: Fix invalid format checks for DMA sections which are typeless 2019-11-19 13:18:15 +03:00
kd-11 9dab0575fa rsx: Add missing format check for the RTV<->DSV transfer case
- TODO: Rewrite resource handling routines
2019-11-18 13:17:00 +03:00
kd-11 4a0e1c79ed rsx: Improve format validation for blit engine
- Check all possible cases where format mismatch is possible.
- Warn if a slow path is going to be taken. Should help with future
optimizations.
2019-11-18 13:17:00 +03:00
kd-11 2408922806 rsx: Do not ignore clamping for some routines that do not have implied range 2019-11-18 13:17:00 +03:00
kd-11 0a32d478df vk: Enable auto-growing of the data heaps for the performance case 2019-11-10 17:53:12 +03:00
kd-11 f359342721 rsx: Implement mutable ring buffers with grow support 2019-11-10 17:53:12 +03:00
Emmanuel Gil Peyrot 56f82d2701 rsx: Wrap gsl::span definition into Utilities/span.h 2019-11-09 20:00:50 +01:00
Emmanuel Gil Peyrot f76720ceb0 Remove extraneous ::narrow<int>() calls
GSL’s gsl::span didn’t use the correct type for its index_type, which is
why they were needed.
2019-11-09 19:30:06 +01:00
Emmanuel Gil Peyrot 72cdf0b04c Replace gsl::span’s implementation with tcbrindle’s
This implementation optimises correctly on all relevant compilers,
unlike GSL’s which gave extremely slow code on any compiler other than
MSVC.

Supersedes #6948.
2019-11-09 19:30:06 +01:00
Emmanuel Gil Peyrot ef368c5171 rsx: Replace gsl::byte with C++17’s std::byte 2019-11-09 19:30:05 +01:00
kd-11 7072489a6e rsx: Implement point sprite coordinate generation
- When the point sprite flag is set, overrides the input similar to the
2D mask. The returned X and Y values are always the gl_PointCoord values
for the fragment.
- Stacks with the 2D mask to override the z and w coordinates.
2019-11-09 12:50:53 +03:00
kd-11 63673b1a9f rsx: Implement full color remap for the D24S8->ARGB8 converter 2019-11-08 19:11:59 +03:00
kd-11 1266b63135 vk: Enable gpu deswizzling 2019-11-05 22:07:22 +03:00
kd-11 9cd3530c98 rsx: Set up framework for hw deswizzle 2019-11-05 22:07:22 +03:00
Nekotekina e3e7051ed3 Minor optimization in BufferUtils.cpp
Don't use PSHUFB for horizontal operations.
Utilize PHMINPOSUW to compute max as well:
 + sse41_hmin_epu16
 + sse41_hmax_epu16
2019-10-30 18:52:34 +03:00
Nekotekina b1968769b7 Minor cleanup in BufferUtils.cpp
Replace inline asm with intrinsic using target attribute trick.
2019-10-30 17:53:51 +03:00
linkmauve cfd5cf6bdb Optimise primitive_restart::upload_untouched() (#6881)
* rsx: Optimise primitive_restart::upload_untouched() with SSE4.1

This optimisation is only applied when skip_restart is false.

I’ve only tested the u16 codepath, as it is the one used in NieR.

In some very unscientific profiling, this function used to take 2.76% of
the total frame time at the save point of the port town, it now takes
about 0.40%.

* rsx: Mark all SSE4.1 functions with attributes on gcc and clang

This assures the compiler we will take care of only calling these
functions after having checked that the CPU does support these
instructions.

* rsx: Add an AVX2 implementation of primitive restart ibo upload

* rsx: Remove redefinition of SSE4.1 instructions

Now that clang is aware that our functions are compiled with SSE4.1, it
lets us generate this code using its intrinsics.

* rsx: Optimise vector to scalar conversion

This is done using minpos and srli intrinsics and generate less code
than before.

Thanks Nekotekina for the suggestion!
2019-10-30 16:42:44 +03:00
kd-11 aa3eeaa417 rsx: Separate subresource_layout:dim_in_block and
subresource_layout::dim_in_texel

- These two are not always linked when working with compressed textures.
The actual texels extend past the actual size of the image if the size
is not aligned. e.g if height is 1, the real height is 4, but its not
possible to determine this from the aligned size. It could be 1, 2, 3 or
4 for example.
- Fixes image out-of-bounds writes when uploading from CPU
2019-10-29 20:03:54 +03:00
kd-11 d04241ad25 rsx: Allow compressed textures to be unaligned in size
- Align based on row length but let the texture itself be of arbitrary dimensions
2019-10-28 15:20:45 +03:00
kd-11 e04b6cd7c0 rsx: Copypasta fix
- r1 is always float4 never half4. Its a full-width register unlike the
other outputs which are optionally half-width.
2019-10-23 00:50:24 +03:00
Eladash 945abcc6cd rsx: Align down index array offset
* Also use improved to_be_t<> template (recetly ignoring one byte long types) for vm gsl::byte referencing, remove redundent narrow<> cast (same type)
2019-10-22 13:45:09 +03:00
kd-11 0b2f9f0f17 rsx: Add support for delayed shader discard.
- Noticed a glitch on AMD hw and windows drivers where discard seems to affect entire 4x4 cells.
- Dead fragments (outside the primitive boundary) could have their discards trigger as they do not have proper access to variables.
- This introduces dead fragments along triangle edges, causing a diagonal line pattern across the screen that is very annoying.
2019-10-22 13:44:49 +03:00
kd-11 901942f24a rsx: Replace pointless f32[4] restriction on texture parameters.
- Use a struct instead to improve readability and remove pointless OpBitCast
2019-10-22 13:44:49 +03:00
kd-11 f7842b765f rsx: Implement packed format renormalization
- Renormalizes arbitrary N-bit values as 8-bit normalized.
- NV hardware performs integer normalization at 8 bits if the size is less than 8.
- This can cause significant arithmetic drift because the error is multiplied by a huge number when sampling.
2019-10-22 13:44:49 +03:00
kd-11 09de3b7974 rsx: Tweak behaviour of the "Use GPU texture scaling" option
- If either source data or dest is a render target, do image operations on the GPU same as before
- If swizzle is desired, use CPU fallback
- If no scaling and no format conversion is required, use CPU fallback
- If scaling is desired and the transfer target is in local memory, use the GPU
- When doing trivial copies, use the routine in rsx_methods instead of
  duplicating code. Also has the benefit of better range checking.
2019-10-20 21:38:40 +03:00
kd-11 868547aec8 rsx: Minor improvement to fbo region invalidation
- When commiting a block as fbo, keep blit_dst data as well.
- Avoids removing (and losing data from) blit targets that just happen to share a page with a framebuffer.
2019-10-20 21:38:40 +03:00
kd-11 996534c559 rsx: Fixup for aspect mismatch 2019-10-20 15:25:07 +03:00
kd-11 404073c74a rsx: Force-align compressed formats to 4x4 texel blocks and disable 1D compressed textures.
- The PS3 allows defining 1D compressed images but this obviously doesn't work well on desktop.
2019-10-18 14:46:37 +03:00
kd-11 eff4e95c99 rsx: Minor cache fixup for cyclic references.
- Logic was broken by mipmaps PR. Do not issue a texture barrier if a temp copy is being done.
2019-10-18 14:46:37 +03:00
kd-11 eee2237e19 rsx: Track uncached cache resources
- Uncacheable resources can be reused as soon as they're made visible to the draw call.
- Since they're likely to be reused every draw call until the shader changes, it is important to reuse as much as possible
2019-10-18 14:46:37 +03:00
kd-11 decf9cfcf6 rsx: Notify the backend to release or delete temporary surfaces after we're done with them. 2019-10-18 14:46:37 +03:00
kd-11 a936e43ff6 rsx: Fixup for slice gathering for structures with multiple mipmap levels
- TODO: Proper multi-level assembly for non-2D structures
2019-10-17 18:18:00 +03:00
kd-11 e166dbccc8 rsx: Fix visibility of blit destination targets 2019-10-17 18:18:00 +03:00
kd-11 0c35595ce2 rsx: Remove the alpha-to-coverage hack that was added to hide the missing mipmaps in games
- Moves to a purely stochastic function using dithering to simlulate coverage
2019-10-17 18:18:00 +03:00
kd-11 f0ed0285f3 rsx: Implement range-based subresource descriptor cache
- The previous address-based approach was pretty awful when it comes to invalidating
2019-10-17 18:18:00 +03:00
kd-11 fbb9ed4e25 rsx: Add explicit range to cached subresource descriptors 2019-10-17 18:18:00 +03:00
kd-11 c9e3a321b2 rsx: Fixup for surface cache scanning
- Fix regression when gathering cubemaps
2019-10-17 18:18:00 +03:00
kd-11 1ac976771c rsx: Add some texture search options for the cache
- Potentially optimizes texture cache searching using explicit options
2019-10-17 18:18:00 +03:00
kd-11 840b52fe80 rsx: Implement mipmap gathering from texture cache 2019-10-17 18:18:00 +03:00
kd-11 d6d8766f8d rsx: Refactoring
- Move some helper routines out of the cache core
- Prep for multi-layered image search
2019-10-17 18:18:00 +03:00
kd-11 4a19a2dd24 rsx: Explicity describe transfer regions for both source and destination blocks 2019-10-04 18:10:46 +03:00
kd-11 ef5b56bc48 rsx: Align width properly when normalizing to avoid fractional results being lowered to 0 2019-09-29 11:39:22 +03:00
kd-11 c59cb1bdd3 rsx: Allow only sse4.1 capable CPUs to take the accelerated index path
- Older sets lack the required min/max functionality
2019-09-13 12:28:52 +03:00
kd-11 cc313b052f rsx: Improve hit testing when scanning for overlapping surfaces
- Calculate exact sizes when doing hit tests to avoid false negatives
- Defer page checking until actually require to do memory setup
- Introduce align2 helper to do non-pow2 alignments
2019-09-12 23:32:21 +03:00
kd-11 9842823a8c rsx: Check if memory actually exists when overallocating blit targets 2019-09-12 23:32:21 +03:00
kd-11 cd1345b6bb rsx: Do not use nul section if resolution scaling is active on a surface 2019-09-12 23:32:21 +03:00
kd-11 858014b718 rsx: Experiments with nul sink 2019-09-12 23:32:21 +03:00
kd-11 212ac19c11 vk: Reimplement DMA synchronization 2019-09-12 23:32:21 +03:00
kd-11 60845daf45 rsx: Improve use of CPU vector extensions
- Allow use of intrinsics when SSSE3 and SSSE4.1 are not available in the build target environment
- Properly separate SSE4.1 code from SSSE3 code for some older proceessors without SSE4.1
2019-09-12 14:08:21 +03:00
kd-11 27af75fe71 rsx: Fixup for blit engine when moving inverted regions
- Properly calculate overlap range when sections are inverted
- Simplify transfer logic for inverted regions
2019-09-11 23:30:55 +03:00
kd-11 412c620b9d rsx: Allow sampling from shader_read resources for blit engine
- With harmonization between all texture types implemented, there is no difference between blit_engine_src and shader_read for supported formats
- Adds extra format filtering to ensure no conflicts when copying data
2019-09-10 16:54:02 +03:00
kd-11 75fcfac00e rsx: Modify find_cached_texture to respect gcm_format. Can pass 0 for "dont care" 2019-09-10 16:54:02 +03:00
kd-11 f53361b966 rsx: Fix fast texture copy when src_pitch != width * block_size
- Happens on mipmapped linear images
2019-09-08 18:22:27 +03:00
kd-11 0af9685381 rsx: Deprecate surface_transform::argb_to_bgra which is no longer required.
- vulkan now uses native swizzle mapping for both surface and texture
2019-09-08 13:56:41 +03:00
kd-11 6aa0b49dbc vk: Prefer using native alignment when uploading.
- Allows using fast copy paths and reduces memory and compute footprint
2019-09-07 16:23:20 +03:00
kd-11 a3a0cb8c17 rsx: Minor texture optimizations 2019-09-07 16:23:20 +03:00
kd-11 efa501dac6 rsx/vp: Set default inputs to (0, 0, 0, 1)
- From some hw tests, it seems this is the default.
2019-09-06 17:08:28 +03:00
kd-11 f8dbe281a5 glsl: Explicitly declare const inputs as such
- Avoids copying the values to temp variables before invoking function calls
- Generates shorter, cleaner AST and SPV bytecode
2019-09-06 17:08:28 +03:00
kd-11 9dc06cef7f rsx: Do not include ro data when attempting to do section merge
- Avoids crazy situations like trying to merge from a 3d or cubemap in memory
2019-09-02 16:49:04 +03:00
kd-11 e99e8460fe rsx/texture_cache_utils: Warnings cleanup 2019-09-01 18:59:50 +03:00
kd-11 27fabd7607 rsx/ring_buffer: Warnings cleanup 2019-09-01 18:59:50 +03:00
kd-11 0158a88c88 rsx/textures: Warnings cleanup 2019-09-01 18:59:50 +03:00
kd-11 401bd9112a rsx/prog: Warnings cleanup 2019-09-01 18:59:50 +03:00
kd-11 652f18ebaa rsx/buffers: Warnings cleanup 2019-09-01 18:59:50 +03:00
kd-11 94656ac1e3 rsx/vp: Warnings cleanup 2019-09-01 18:59:50 +03:00
kd-11 0ee9d7b46d rsx/fp: Warnings cleanup 2019-09-01 18:59:50 +03:00
kd-11 7f99de36c1 rsx: Fixup for surface_target_a flag being broken
- While the mask for surface_a is at index 0, the surface cache expects the order to be maintained correctly!
  Set the correct mask since surface store now checks each RTT individually
2019-08-30 21:46:19 +03:00
kd-11 99fb6d6a5d rsx: Allow GPU-accelerated stream manipulation when doing texture uploads 2019-08-30 21:46:19 +03:00
kd-11 e55d216619 rsx: Workarounds for some buggy games
- Replace assert with log message until hardware testing confirms findings
2019-08-28 14:54:51 +03:00
kd-11 e334a43169 rsx: Fix surface cache hit tests
- Avoid silly broken tests due to queue_tag being called before pitch is initialized.
- Return actual memory range covered and exclude trailing padding.
- Coordinates in src are to be calculated with src_pitch, not required_pitch.
2019-08-28 14:54:51 +03:00
kd-11 2962e05f26 rsx: Implement per-RTT color masks
- Also refactors and simplifies some common code in surface store and rsx core
2019-08-27 21:59:02 +03:00
kd-11 27aeaf66bc gl: Restructure buffer objects to give more control over usage
- This allows creating buffers with no MAP bits set which should ensure they are created for VRAM usage only
- TODO: Implement compute kernels to avoid software fallback mode for pack/unpack operations
2019-08-27 21:59:02 +03:00
kd-11 eed32cf3a4 rsx: Decompiler fixups and improvements
- Fix 2D coordinate sampling of W coordinate.
  W is actually HPOS.w and not 1. Z is however always 0.
- Optimize register usage a bit
  Disassembling compiled SPV shows that global declaration results in less ops than using inout modifiers. Modifiers generate extra mov instructions.
2019-08-26 20:03:31 +03:00
kd-11 3e28e4b1e0 rsx/decompiler: Restructure program register behavior
- Fix reading of varying registers in FP
  Different registers have different behavior
- Always write to varying registers. If a register is not written to, it is initialized to (0, 0, 0, 1)
- Reimplements two-sided lighting correctly without hacks
- Also bumps shader cache version
2019-08-26 20:03:31 +03:00
kd-11 fe6ff8622a rsx: Decompiler fixups for conditional execution
- Cond actually obeys vector mask
2019-08-26 20:03:31 +03:00
kd-11 f9aea076ae rsx: Implement depth_buffer_float support.
- Since this is transparent to the application at all time, it only becomes a problem when doing memory transfer or DEPTH->RGBA conversion in shaders.
2019-08-26 20:03:31 +03:00
kd-11 c67c97844e rsx: Fixup for blit engine range calculations 2019-08-21 21:17:15 +03:00
kd-11 5d1b7eb945 rsx: Fix reference leaks in texture_cache<->surface_cache communication
- Properly commit orphaned blocks not invalidating existing cache structures
- Do not ignore overwritten objects when commiting as unprotected fbo. Avoids stale references to invalidated surface objects.
2019-08-21 21:17:15 +03:00
kd-11 141072023b rsx: Fix handling of ARGB8 memory
- Load into memory as straightforward BGRA
- Fixes a bug in vulkan caused by byte shuffling in blit engine vs shader access
- Removes the need for memory shuffling when transferring into a rendertarget
2019-08-21 21:17:15 +03:00
kd-11 9cd5325962 rsx: Free memory 'held hostage' by storage sections in the surface cache
- Once the memory has been captured by another surface, release the allocation
2019-08-21 21:17:15 +03:00
kd-11 be98554b40 rsx: Fix surface split logic
- Calculations are supposed to be done based on the properties of the outgoing surface
2019-08-21 21:17:15 +03:00
kd-11 67dac94704 rsx/fp: Zero-initialize FragDepth register to match hw 2019-08-21 21:17:15 +03:00
kd-11 dca29def5e rsx: Temporary workaround for race condition in blit engine 2019-08-18 20:45:48 +03:00
kd-11 5e299111cc rsx/vk: Restructure surface access barriers and implement RCB/RDB
- Implements render target data load (aka Read Color Buffer/Read Depth Buffer)
- Refactors vulkan surface barrier to be much cleaner.
- Removes redundant surface barrier invocations after doing a merged load
  from surface cache.
- Adds explicit access modes when gathering surfaces from cache.
2019-08-18 20:45:48 +03:00
kd-11 dfe709d464 rsx: Surface cache restructuring
- Further improve aliased data preservation by unconditionally scanning.
  Its is possible for cache aliasing to occur when doing memory split.
- Also sets up for RCB/RDB implementation
2019-08-18 20:45:48 +03:00
kd-11 1de90bdb1f rsx: Improve aliased data preservation
- Carve out inherited region if any
- Perform pitch compatibility test before assigning old_surface
2019-07-27 16:09:21 +03:00
kd-11 e2574ff100 rsx: Support CSAA transparency without multiple rasterization samples enabled 2019-07-19 15:49:08 +03:00
kd-11 ea2f4d57fa rsx: Fixups 2019-07-17 13:29:42 +03:00
kd-11 113a49e00c rsx: Handle cyclic references when doing memory inheritance 2019-07-17 13:29:42 +03:00
kd-11 34b06453f9 rsx: Handle lost data due to unused data sections
- After splitting, the sections may not be referenced at all for anything other than just pixel storage
- In such cases, either merge down or sample from the upstream source instead
2019-07-17 13:29:42 +03:00
kd-11 009e01a347 rsx: Set up for multi-section inheritance 2019-07-17 13:29:42 +03:00
kd-11 fc09572648 rsx: Implement texel border decode
- Texel borders are no longer actually supported in modern APIs
- Removes the border texels and uses border color instead which is incorrect but should work fine
2019-07-11 13:22:13 +03:00
kd-11 d8f753f1e8 rsx: Do not allow framebuffer surfaces that exceed their allocated pitch dimensions
- Truncate surfaces to forcefully fit inside the declared region
2019-07-11 13:22:13 +03:00
kd-11 c072c511a1 rsx: Add support for slice padding rows when gathering slices for cubemap/3d 2019-07-09 16:27:59 +03:00
kd-11 ad10eb391e vk: Reuse discarded memory whenever possible instead of recreating new
objects
- Memory allocations are surprisingly expensive when spammed
2019-07-03 15:52:16 +03:00
kd-11 71e809a78b rsx: Implement dma abort in case of a reset after misprediction 2019-07-03 15:52:16 +03:00
Eladash 43f919c04b Fixup after #6143 (#6146)
vm::spu max address was overflowing resulting in issues, so cast to u64 where needed. Fixes #6145.
    Use vm::get_addr instead of manually substructing vm::base(0) from pointer in texture cache code.
    Prefer std::atomic_thread_fence over _mm_?fence(), adjust usage to be more correct.
    Used sequantially consistent ordering in semaphore_release for TSX path as well.
    Improved memory ordering for sys_rsx_context_iounmap/map.
    Fixed sync bugs in HLE gcm because of not using atomic instructions.
    Use release memory barrier in lwsync for PPU LLVM, according to this xbox360 programming guide lwsync is a hw release memory barrier.
    Also use release barrier where lwsync was originally used in liblv2 sys_lwmutex and cellSync.
    Use acquire barrier for isync instruction, see https://devblogs.microsoft.com/oldnewthing/20180814-00/?p=99485
2019-06-29 18:48:42 +03:00