Commit graph

2582 commits

Author SHA1 Message Date
Anuskuss 7e31c30133 Intel iGPU needs workaround on Windows 2019-11-15 12:08:16 +03:00
Nick Renieris cc59d319e1 overlay: Performance graphs 2019-11-12 20:43:09 +01:00
kd-11 8234bdb8f0 vk: Check for heap change events after a grow to avoid spec violations
- Avoid referencing the old buffer in stale views. Status can be set
globally if requested during heap creation.
2019-11-10 17:53:12 +03:00
kd-11 5968427a2f vk: Initialize queries before use
- The spec does not guarantee that queries are initialized. In fact, it
now says all queries must be reset before they are used for the first
time.
2019-11-10 17:53:12 +03:00
kd-11 8ea9bc9874 vk: Reduce memory allocation sizes of default heaps
- The heaps will grow as desired, no need to overallocate to cater to
the most resource-hungry games
2019-11-10 17:53:12 +03:00
kd-11 0a32d478df vk: Enable auto-growing of the data heaps for the performance case 2019-11-10 17:53:12 +03:00
kd-11 357e0d2097 vk: Implement explicit runtime flags to manage events like heap sync 2019-11-10 17:53:12 +03:00
kd-11 f359342721 rsx: Implement mutable ring buffers with grow support 2019-11-10 17:53:12 +03:00
kd-11 5f39a594ac rsx: Clean up some unused legacy methods unnecessary after d3d removal 2019-11-10 17:53:12 +03:00
Emmanuel Gil Peyrot 56f82d2701 rsx: Wrap gsl::span definition into Utilities/span.h 2019-11-09 20:00:50 +01:00
Emmanuel Gil Peyrot f76720ceb0 Remove extraneous ::narrow<int>() calls
GSL’s gsl::span didn’t use the correct type for its index_type, which is
why they were needed.
2019-11-09 19:30:06 +01:00
Emmanuel Gil Peyrot 72cdf0b04c Replace gsl::span’s implementation with tcbrindle’s
This implementation optimises correctly on all relevant compilers,
unlike GSL’s which gave extremely slow code on any compiler other than
MSVC.

Supersedes #6948.
2019-11-09 19:30:06 +01:00
Emmanuel Gil Peyrot ef368c5171 rsx: Replace gsl::byte with C++17’s std::byte 2019-11-09 19:30:05 +01:00
kd-11 7072489a6e rsx: Implement point sprite coordinate generation
- When the point sprite flag is set, overrides the input similar to the
2D mask. The returned X and Y values are always the gl_PointCoord values
for the fragment.
- Stacks with the 2D mask to override the z and w coordinates.
2019-11-09 12:50:53 +03:00
kd-11 63673b1a9f rsx: Implement full color remap for the D24S8->ARGB8 converter 2019-11-08 19:11:59 +03:00
kd-11 8d1505752f rsx: Validate depth test setup to avoid address contention 2019-11-07 11:32:44 +03:00
kd-11 508ffcb775 vk: Compute kernel fixups
- Adhere to workgroup count limits as exposed by the GPU vendor.
  They already execute properly even when going beyond the limits but this removes validation noise.
- Fix invocation counts for deswizzle kernel. The count was incorrect if blocksize was not 4, causing a bunch of useless work to be done.
2019-11-05 22:07:22 +03:00
kd-11 99d71fdc2a vk: Implement layer batching for the GPU swizzle decoder
- Handles all LODs per layer meaning cubemaps are now fully handled in 6 passes instead of 6 * (log2(width)) passes.
- Handles all LODs of a 3D texture in one pass as well.
- The improvements do warrant dropping down the number of allowed compute invocations a bit
2019-11-05 22:07:22 +03:00
kd-11 7a0b94f343 vk: Minor compute optimizations
- Remove use of uniform buffers for compute static data. Use push
constants instead.
- Minor touchups to the deswizzle code to avoid redundant data copies.
2019-11-05 22:07:22 +03:00
kd-11 1266b63135 vk: Enable gpu deswizzling 2019-11-05 22:07:22 +03:00
kd-11 9cd3530c98 rsx: Set up framework for hw deswizzle 2019-11-05 22:07:22 +03:00
kd-11 57d3c9e171 rsx: Take empty queries into account for engines that spam report reads.
- Some games will spam the report queue with requests but have zpass
statistics enabled.
2019-11-04 18:48:41 +03:00
kd-11 2a8f2c64d2 rsx: Implement report transfer deferring
- Allow delaying report flushes triggered by image_in or buffer_notify
- When the report is ready, all the delayed transfers will automatically
be done.
- TODO: Make this configurable?
2019-11-04 18:48:41 +03:00
kd-11 3e0f9dff4d vk: Improve zcull synchronization
- Use zcull sync hints more aggressively
2019-11-04 18:48:41 +03:00
kd-11 fe3c290d03 vk: Reimplement occlusion result reading
- Implement partial result reads
2019-11-04 18:48:41 +03:00
kd-11 51e0eaaddc rsx: Implement backend notification for upcoming zcull reads 2019-11-04 18:48:41 +03:00
kd-11 df63de8f16 rsx: Allow u32 restart index with full index width 2019-11-04 16:56:34 +03:00
kd-11 6b3af09fa5 vk: Improved crash message for missing MSAA features 2019-11-04 16:56:34 +03:00
kd-11 bbed791ee0 vk: Add explicit support for identity image views
- Allows bypassing all remap shenanigans to make some operations that
rely on the raw image to work correctly.
2019-11-01 19:35:46 +03:00
kd-11 63bbf11a76 vk: Add video out calibration pass
- Adds gamma correction and RGB range filters to output to match PS3
2019-10-31 14:43:24 +03:00
kd-11 78aefe5b5e rsx/overlays: Add support for other primitive types other than triangle_strips 2019-10-31 14:43:24 +03:00
Nekotekina e3e7051ed3 Minor optimization in BufferUtils.cpp
Don't use PSHUFB for horizontal operations.
Utilize PHMINPOSUW to compute max as well:
 + sse41_hmin_epu16
 + sse41_hmax_epu16
2019-10-30 18:52:34 +03:00
Nekotekina b1968769b7 Minor cleanup in BufferUtils.cpp
Replace inline asm with intrinsic using target attribute trick.
2019-10-30 17:53:51 +03:00
linkmauve cfd5cf6bdb Optimise primitive_restart::upload_untouched() (#6881)
* rsx: Optimise primitive_restart::upload_untouched() with SSE4.1

This optimisation is only applied when skip_restart is false.

I’ve only tested the u16 codepath, as it is the one used in NieR.

In some very unscientific profiling, this function used to take 2.76% of
the total frame time at the save point of the port town, it now takes
about 0.40%.

* rsx: Mark all SSE4.1 functions with attributes on gcc and clang

This assures the compiler we will take care of only calling these
functions after having checked that the CPU does support these
instructions.

* rsx: Add an AVX2 implementation of primitive restart ibo upload

* rsx: Remove redefinition of SSE4.1 instructions

Now that clang is aware that our functions are compiled with SSE4.1, it
lets us generate this code using its intrinsics.

* rsx: Optimise vector to scalar conversion

This is done using minpos and srli intrinsics and generate less code
than before.

Thanks Nekotekina for the suggestion!
2019-10-30 16:42:44 +03:00
kd-11 35794dc3f2 vk: Add checks for alphaToOne support
- This feature is very rarely used, as alphaToCoverage is commonly used as a replacement for blending, not in addition to it.
2019-10-30 01:06:28 +03:00
kd-11 eda09489b2 vk: Optionally ignore depth bounds testing on hardware that does not
support it.
2019-10-29 20:03:54 +03:00
kd-11 7a5c20ef85 vk: Minor spec touchups
- Simplify active instance management. While multicontext support will
be required in future, this is better done with multiple logical devices
rather than multiple instances.
- Destroy the WSI surface on exit
- Enable depthBoundsTest explicitly. TODO: Properly check for supported
features.
2019-10-29 20:03:54 +03:00
kd-11 aa3eeaa417 rsx: Separate subresource_layout:dim_in_block and
subresource_layout::dim_in_texel

- These two are not always linked when working with compressed textures.
The actual texels extend past the actual size of the image if the size
is not aligned. e.g if height is 1, the real height is 4, but its not
possible to determine this from the aligned size. It could be 1, 2, 3 or
4 for example.
- Fixes image out-of-bounds writes when uploading from CPU
2019-10-29 20:03:54 +03:00
Eladash 42fc698186 rsx: Enable primitive restart index only when needed (#6889)
* rsx: Enable primitive restart index only when needed

* rsx: Use if with initializer in read_put()
2019-10-28 23:16:27 +03:00
kd-11 479d92d075 vk: Fix uninitialized (and wrong) variable access 2019-10-28 15:20:45 +03:00
kd-11 b0708367c2 vk: Round lod bias to the nearest 0.5 to lower number of permutations when nearest mipmap sampling is used
- The lambda values will be rounded to the nearest integer anyway
2019-10-28 15:20:45 +03:00
kd-11 3e8dfede1c vk: Modify sampler cache to uniquely identify all the input parameters
- Avoids iteration when variable mipmap counts or lod bias parameters change
2019-10-28 15:20:45 +03:00
kd-11 ad2add9574 rsx:: Use fcmp correctly 2019-10-28 15:20:45 +03:00
kd-11 d04241ad25 rsx: Allow compressed textures to be unaligned in size
- Align based on row length but let the texture itself be of arbitrary dimensions
2019-10-28 15:20:45 +03:00
Emmanuel Gil Peyrot 69e9ee26f6 rsx: Make input_is_swizzled a template parameter
This lowers the relative cost of this function from ~2.25% to ~1.80% on
gcc 9 which I found quite surprising, some of it probably gets inlined
better in the callers, but I haven’t been able to isolate which parts.
2019-10-28 13:28:51 +03:00
kd-11 d53d7bb598 vk: Restore vega native use of FP16 in shaders
- AMD proprietary drivers should work fine
2019-10-23 12:20:06 +03:00
Emmanuel Gil Peyrot 54d95373d0 Support fullscreen properly on Wayland
The current behaviour when going fullscreen from windowed was to keep
the previous size of the swapchain, with black borders on all sides,
which looks quite ugly.

The root of this issue is that rpcs3 only checks for frame resize if
vkQueuePresent() returns VK_SUBOPTIMAL_KHR, which drivers can’t do on
Wayland, see https://gitlab.freedesktop.org/mesa/mesa/issues/1979
2019-10-23 12:19:46 +03:00
kd-11 e04b6cd7c0 rsx: Copypasta fix
- r1 is always float4 never half4. Its a full-width register unlike the
other outputs which are optionally half-width.
2019-10-23 00:50:24 +03:00
kd-11 00bc3fe658 Drop d3d12 backend 2019-10-22 21:45:14 +03:00
Emmanuel Gil Peyrot 14c63ec014 Fix misleading indent. 2019-10-22 16:11:43 +03:00