Commit graph

74 commits

Author SHA1 Message Date
Nekotekina 377e7d2a73 C-style cast cleanup VI 2019-12-04 17:56:22 +03:00
kd-11 fd751e3e7b rsx: Improve blit format mismatch detection 2019-11-19 13:18:15 +03:00
kd-11 4a0e1c79ed rsx: Improve format validation for blit engine
- Check all possible cases where format mismatch is possible.
- Warn if a slow path is going to be taken. Should help with future
optimizations.
2019-11-18 13:17:00 +03:00
kd-11 c415578e79 vk: Clamp buffer row length to never be less than declared width
- Fixes some games with broken textures
2019-11-18 13:17:00 +03:00
Emmanuel Gil Peyrot f76720ceb0 Remove extraneous ::narrow<int>() calls
GSL’s gsl::span didn’t use the correct type for its index_type, which is
why they were needed.
2019-11-09 19:30:06 +01:00
Emmanuel Gil Peyrot ef368c5171 rsx: Replace gsl::byte with C++17’s std::byte 2019-11-09 19:30:05 +01:00
kd-11 99d71fdc2a vk: Implement layer batching for the GPU swizzle decoder
- Handles all LODs per layer meaning cubemaps are now fully handled in 6 passes instead of 6 * (log2(width)) passes.
- Handles all LODs of a 3D texture in one pass as well.
- The improvements do warrant dropping down the number of allowed compute invocations a bit
2019-11-05 22:07:22 +03:00
kd-11 1266b63135 vk: Enable gpu deswizzling 2019-11-05 22:07:22 +03:00
kd-11 9cd3530c98 rsx: Set up framework for hw deswizzle 2019-11-05 22:07:22 +03:00
kd-11 aa3eeaa417 rsx: Separate subresource_layout:dim_in_block and
subresource_layout::dim_in_texel

- These two are not always linked when working with compressed textures.
The actual texels extend past the actual size of the image if the size
is not aligned. e.g if height is 1, the real height is 4, but its not
possible to determine this from the aligned size. It could be 1, 2, 3 or
4 for example.
- Fixes image out-of-bounds writes when uploading from CPU
2019-10-29 20:03:54 +03:00
kd-11 ee0633f43a vk: Add turing workaround
- Turing crashes if using the depth->color transfer hack
2019-09-26 20:12:25 +03:00
kd-11 cc313b052f rsx: Improve hit testing when scanning for overlapping surfaces
- Calculate exact sizes when doing hit tests to avoid false negatives
- Defer page checking until actually require to do memory setup
- Introduce align2 helper to do non-pow2 alignments
2019-09-12 23:32:21 +03:00
kd-11 858014b718 rsx: Experiments with nul sink 2019-09-12 23:32:21 +03:00
kd-11 d1603fbb0b vk: Crop malformed image descriptors
- Some image descriptors (lle vdec?) are malformed with pitch being smaller than width
- Crop these for now pending hardware tests
2019-09-08 18:22:27 +03:00
kd-11 cbce309199 vk: Fix depth_stencil scaling 2019-09-08 13:56:41 +03:00
kd-11 440d58f2ff vk: Batch compute jobs when doing texture upload
- Reduces overall number of invocations
2019-09-07 16:23:20 +03:00
kd-11 6aa0b49dbc vk: Prefer using native alignment when uploading.
- Allows using fast copy paths and reduces memory and compute footprint
2019-09-07 16:23:20 +03:00
kd-11 99fb6d6a5d rsx: Allow GPU-accelerated stream manipulation when doing texture uploads 2019-08-30 21:46:19 +03:00
kd-11 141072023b rsx: Fix handling of ARGB8 memory
- Load into memory as straightforward BGRA
- Fixes a bug in vulkan caused by byte shuffling in blit engine vs shader access
- Removes the need for memory shuffling when transferring into a rendertarget
2019-08-21 21:17:15 +03:00
kd-11 dfe709d464 rsx: Surface cache restructuring
- Further improve aliased data preservation by unconditionally scanning.
  Its is possible for cache aliasing to occur when doing memory split.
- Also sets up for RCB/RDB implementation
2019-08-18 20:45:48 +03:00
JohnHolmesII 23094b48bb Fix warnings related to -Wswitch
Add default cases.
Move default breaks to newline
Add proper handling in some instances.
Add missing enums to switches
2019-06-28 01:40:52 +03:00
kd-11 a245d9fb24 vk: DOuble general-purpose heap allocation to 128M and add a better diagnostic message for OOM 2019-05-19 17:33:21 +03:00
kd-11 e3cf3ab6b8 rsx: Minor fixes
- Fix transfer scaling (inverted)
- Fix under-estimated typeless acquisition when doing depth format scaling
2019-05-16 19:25:26 +03:00
kd-11 1c439f6198 vk: Fix some spec violations 2019-05-16 19:25:26 +03:00
kd-11 12dc3c1872 vk: Dynamic heap management to potentially fix ring buffer overflows
- Allows checking one heap type at a time, on demand
- Should avoid OOM situations unless inside an uninterruptible block
2019-04-09 13:40:54 +03:00
kd-11 a5ed30a8c0 rsx: Fixups for data cast operations via typeless transfer 2019-04-09 13:40:54 +03:00
kd-11 3249000511 rsx: Improvements to texture scanning
- Removes CPU-only transforms that broke GPU-side code.
 -- Channels in GPU compute are laid out in cell-order, but CPU was uploading in favorable order and compensating with swizzles.
 -- This leads to 2 different layouts depending on the location of the data (CPU vs GPU)
- Implement R8G8_R8B8 interleaved format decode
- General improvements
2019-04-09 13:40:54 +03:00
kd-11 0f7af391d7 vk: Implement copy-to-buffer and copy-from-buffer for depth_stencil
formats
- Allows D24S8 and D32S8 transport via typeless channels
- Allows uploading and downloading D24S8 data easily
- TODO: Implement optional byteswapping to fix flushed readbacks with
the same method
2019-04-09 13:40:54 +03:00
kd-11 bb65e45614 rsx: Implement GPU acceleration for rotated images 2019-03-17 21:50:11 +03:00
kd-11 0395fb9955 rsx/tecture_cache: Addendum - fix data cast with scaling conversion (AA emulation)
- Blit operations do format conversion automatically which is NOT what we want!
- Scale onto temp buffer with similar format before performing data cast.
2019-03-10 16:09:05 +03:00
kd-11 3a071a9c07 rsx: Texture search rewrite
- Perform a full search across all resource types as needed without
taking too many shortcuts/hacks
2019-03-10 16:09:05 +03:00
kd-11 fa9b448686 vk: Spec fixups
- Disable DEPTH<->RGBA typeless transfers for now as they require a lot more work to work for all vendors
- Do not allow switching layouts to UNDEFINED/PREINITIALIZED formats
2019-01-25 14:34:22 +03:00
kd-11 2a62fa892b rsx: Texture cache refactor
- gl: Include an execution state wrapper to ensure state changes are consistent. Also removes a lot of required 'cleanup' for helper methods
- texture_cache: Make execition context a mandatory field as it is required for all operations. Also removes a lot of situations where duplicate argument is added in for both fixed and vararg fields
- Explicit read/write barrier for framebuffer resources depending on
  usage. Allows for operations like optional memory initialization before
  reading
2019-01-06 10:44:40 +03:00
kd-11 9c45ce6d37 vk: Reimplement typeless memory allocation to handle resolution upscaling 2019-01-06 10:44:40 +03:00
kd-11 15d5507154 rsx: Rewrite memory inheritance transfers
- Implicitly invoke a memory barrier if actively reading from an unsynchronized texture
- Simplify memory transfer operations
- Should allow more games to work without strict mode
2019-01-06 10:44:40 +03:00
kd-11 1ad76ad331 rsx: Restructure programs
- Also re-enable pipeline optimizations
2018-11-30 23:51:25 +03:00
kd-11 c6e35706a3 vk: Support sw component swizzle decode because metal sucks 2018-08-23 22:54:56 +03:00
kd-11 bda65f93a6 vk: Tuning [WIP]
- Unroll main compute queue loop
- Do NOT run GPU cores on mappable memory! This has a dreadful impact on performance for obvious reasons
- Enable dynamic SSBO indexing (affects AMD)
- Make loop unrolling and loop length variable depending on hardware and find optimum
2018-06-26 20:07:20 +03:00
kd-11 5fb4009a07 vk; Add more compute routines to handle texture format conversions
- Implement le D24x8 to le D32 upload routine
- Implement endianness swapping and depth format conversions routines (readback)
2018-06-26 20:07:20 +03:00
kd-11 278cb52f19 facepalm 2018-06-26 20:07:20 +03:00
kd-11 c60f7b89ba vk: Implement safe typeless transfer
- Used to transfer D32S8 data where it makes sense to use this variant
 - On nvidia cards, it is very slow to move aspects from D24S8 probably due to the format being faked.
   For this reason, the unsafe variant is used for both D16 and D24S8 to avoid the heavy performance loss
2018-06-18 17:32:22 +03:00
kd-11 2afcf369ec vk: Add synchronous compute pipelines
- Compute is now used to assist in some parts of blit operations, since there are no format conversions with vulkan like OGL does
- TODO: Integrate this into all types of GPU memory conversion operations instead of downloading to CPU then converting
2018-06-18 17:32:22 +03:00
kd-11 0d5c071eee vk: Implement typeless image transport 2018-06-18 17:32:22 +03:00
kd-11 00eaf39c01 vk: RADV support for depth scaling and transfer 2018-06-08 22:17:50 +03:00
kd-11 fc18e17ba6 vk: Implement depth scaling using hardware blit/copy engines
- Removes the old depth scaling using an overlay.
  It was never going to work properly due to per-pixel stencil writes being unavailable
- TODO: Preserve stencil buffer during ARGB8->D32S8 shader conversion pass
2018-06-08 22:17:50 +03:00
kd-11 0f24379c0e rsx: Obey MSAA resolve during memory persistence transfer
- Ugh. This is a bandaid on a festering wound, AA badly needs a rewrite

 Also silence some warnings
2018-06-08 22:17:50 +03:00
kd-11 91a6091d26 rsx: Minor fixes
- vk: Clear dirty textures before copying 'old contents' in case the old data does not fill the new region
- rsx: Properly decode border color - seems to be in BGRA format
- vk: better approximation of border color to better choose between the presets
- vk: Individually clear color images outside render pass and without scissor
- vk: Fix renderpass selection for clear overlay pass
- vk: Include scissor region when emulating clear mask

NOTES:
- vk: Completely avoid using vkClearXXXXimage - its 'broken' on nvidia drivers
  Spec is vague about the function so its not an actual bug
  ClearAttachment is clearly defined as bypassing bound state which works correctly
- TODO: Implement memory sampling to simulate loading precleared memory if cell used memset to preinitialize the framebuffer
  Autoclear depth to 1|255 and color to 0 is hacky!
2018-04-25 19:14:36 +03:00
kd-11 a42b00488d rsx: Texture fixes
- gl/vk: Fix subresource copy/blit
- gl/vk: Fix default_component_map reading
- vk: Reimplement cell readback path and improve software channel decoder
- Properly name the subresource layout field - its in blocks not bytes!
- Implement d24s8 upload from memory correctly
- Do not ignore DEPTH_FLOAT textures - they are depth textures and abide by the depth compare rules
- NOTE: Redirection of 16-bit textures is not implemented yet
2018-04-25 19:14:36 +03:00
kd-11 9f416e5ce1 rsx/gl/vk: Obey channel remapping on framebuffer resources if requested 2018-03-25 13:31:06 +03:00
pauls-gh fd8d2ecbf4 Remove Volume Texture Compression (VTC) tiling for Vulkan, DX12 and ATI (OpenGL). 2018-03-23 12:01:30 +03:00