Commit graph

2594 commits

Author SHA1 Message Date
kd-11
41e7d2aa0a rsx: Select correct image aspect for blit engine targets. 2019-11-19 13:18:15 +03:00
kd-11
fd751e3e7b rsx: Improve blit format mismatch detection 2019-11-19 13:18:15 +03:00
kd-11
41c3180276 rsx: Fix invalid format checks for DMA sections which are typeless 2019-11-19 13:18:15 +03:00
kd-11
9dab0575fa rsx: Add missing format check for the RTV<->DSV transfer case
- TODO: Rewrite resource handling routines
2019-11-18 13:17:00 +03:00
kd-11
4a0e1c79ed rsx: Improve format validation for blit engine
- Check all cases where a format mismatch can occur.
- Warn if a slow path is going to be taken. Should help with future
optimizations.
2019-11-18 13:17:00 +03:00
kd-11
c415578e79 vk: Clamp buffer row length to never be less than declared width
- Fixes some games with broken textures
2019-11-18 13:17:00 +03:00
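A minimal sketch of the row-length clamp described in c415578e79, assuming the pitch is given in bytes and the copy region is described in texels. In Vulkan, `VkBufferImageCopy::bufferRowLength` must be either 0 or at least the copy width, so a smaller derived value is raised to the width. The helper name is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: derive the buffer row length (in texels) for a
// buffer-to-image copy. A row length smaller than the declared width would
// make rows overlap, so the result is clamped to at least the image width.
std::uint32_t clamped_row_length(std::uint32_t pitch_bytes,
                                 std::uint32_t bytes_per_texel,
                                 std::uint32_t width)
{
    const std::uint32_t row_length = bytes_per_texel ? pitch_bytes / bytes_per_texel : 0;
    return std::max(row_length, width);
}
```

For example, a 128-byte pitch on a 64-texel-wide RGBA8 image implies only 32 texels per row, so the sketch reports 64 instead.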
kd-11
2408922806 rsx: Do not ignore clamping for some routines that do not have an implied range 2019-11-18 13:17:00 +03:00
kd-11
c10aa360b1 rsx: Remove more deprecated methods 2019-11-18 13:17:00 +03:00
Megamouse
a17a5a76a0 overlays: avoid division by zero 2019-11-15 14:53:18 +01:00
Megamouse
fb96047d2f overlays: add settings for overlay graphs 2019-11-15 14:53:18 +01:00
Megamouse
dd1707bd46 overlays: fix center options when graphs are shown 2019-11-15 14:53:18 +01:00
Megamouse
d6b0361a02 overlays: move perf_metrics_overlay to a separate header
This is done to prevent severe merge conflicts with upcoming changes.
2019-11-15 14:53:18 +01:00
Anuskuss
7e31c30133 Intel iGPU needs a workaround on Windows 2019-11-15 12:08:16 +03:00
Nick Renieris
cc59d319e1 overlay: Performance graphs 2019-11-12 20:43:09 +01:00
kd-11
8234bdb8f0 vk: Check for heap change events after a grow to avoid spec violations
- Avoid referencing the old buffer in stale views. Status can be set
globally if requested during heap creation.
2019-11-10 17:53:12 +03:00
kd-11
5968427a2f vk: Initialize queries before use
- The spec does not guarantee that queries are initialized. In fact, it
now says all queries must be reset before they are used for the first
time.
2019-11-10 17:53:12 +03:00
kd-11
8ea9bc9874 vk: Reduce memory allocation sizes of default heaps
- The heaps grow as needed, so there is no need to overallocate to cater to
the most resource-hungry games.
2019-11-10 17:53:12 +03:00
kd-11
0a32d478df vk: Enable auto-growing of the data heaps for the performance case 2019-11-10 17:53:12 +03:00
kd-11
357e0d2097 vk: Implement explicit runtime flags to manage events like heap sync 2019-11-10 17:53:12 +03:00
kd-11
f359342721 rsx: Implement mutable ring buffers with grow support 2019-11-10 17:53:12 +03:00
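A minimal sketch of a ring buffer heap with grow support, plus a generation counter that lets a holder of a view detect a heap change (the idea behind f359342721 and 8234bdb8f0). This is not RPCS3's data heap code; the class and method names are hypothetical, and it assumes that on growth the old allocation stays alive for in-flight GPU work, so the replacement heap starts empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not RPCS3's implementation.
class growable_ring_heap
{
public:
    explicit growable_ring_heap(std::size_t capacity) : m_storage(capacity) {}

    std::size_t capacity() const { return m_storage.size(); }
    std::uint64_t generation() const { return m_generation; }

    // A view recorded under an older generation refers to the old buffer.
    bool is_stale(std::uint64_t view_generation) const { return view_generation != m_generation; }

    // Consumer side: everything before `offset` has been consumed by the GPU.
    void release_to(std::size_t offset) { m_get = offset; }

    // Returns the offset of a `size`-byte allocation, growing the heap when
    // the free space cannot hold it.
    std::size_t alloc(std::size_t size)
    {
        if (m_get == m_put)
            m_get = m_put = 0; // nothing in flight: restart at the beginning

        std::size_t offset = 0;
        if (!try_place(size, offset))
        {
            grow(size);
            offset = 0;
        }
        m_put = offset + size;
        return offset;
    }

private:
    // Live data occupies [get, put) or, after a wrap, [get, end) + [0, put).
    // put must never catch up with get, or a full heap would look empty.
    bool try_place(std::size_t size, std::size_t& offset) const
    {
        if (m_put >= m_get)
        {
            if (m_put + size <= m_storage.size()) { offset = m_put; return true; }
            if (size < m_get) { offset = 0; return true; } // wrap around
            return false;
        }
        if (m_put + size < m_get) { offset = m_put; return true; }
        return false;
    }

    // Assumption: in-flight work keeps the old buffer alive elsewhere, so the
    // replacement starts empty. Bumping the generation flags existing views.
    void grow(std::size_t min_size)
    {
        m_storage.assign(std::max(m_storage.size() * 2, min_size), std::uint8_t{0});
        m_get = m_put = 0;
        ++m_generation;
    }

    std::vector<std::uint8_t> m_storage;
    std::size_t m_put = 0;
    std::size_t m_get = 0;
    std::uint64_t m_generation = 0;
};
```

Starting small and doubling on demand matches the reasoning in 8ea9bc9874: heaps do not need to be sized for the most resource-hungry games up front.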
kd-11
5f39a594ac rsx: Clean up unused legacy methods made unnecessary by the D3D removal 2019-11-10 17:53:12 +03:00
Emmanuel Gil Peyrot
56f82d2701 rsx: Wrap gsl::span definition into Utilities/span.h 2019-11-09 20:00:50 +01:00
Emmanuel Gil Peyrot
f76720ceb0 Remove extraneous ::narrow<int>() calls
GSL’s gsl::span didn’t use the correct type for its index_type, which is
why they were needed.
2019-11-09 19:30:06 +01:00
Emmanuel Gil Peyrot
72cdf0b04c Replace gsl::span’s implementation with tcbrindle’s
This implementation optimises correctly on all relevant compilers,
unlike GSL’s which gave extremely slow code on any compiler other than
MSVC.

Supersedes #6948.
2019-11-09 19:30:06 +01:00
Emmanuel Gil Peyrot
ef368c5171 rsx: Replace gsl::byte with C++17’s std::byte 2019-11-09 19:30:05 +01:00
kd-11
7072489a6e rsx: Implement point sprite coordinate generation
- When the point sprite flag is set, the input is overridden, similar to the
2D mask. The returned X and Y values are always the gl_PointCoord values
for the fragment.
- Stacks with the 2D mask to override the z and w coordinates.
2019-11-09 12:50:53 +03:00
kd-11
63673b1a9f rsx: Implement full color remap for the D24S8->ARGB8 converter 2019-11-08 19:11:59 +03:00
kd-11
8d1505752f rsx: Validate depth test setup to avoid address contention 2019-11-07 11:32:44 +03:00
kd-11
508ffcb775 vk: Compute kernel fixups
- Adhere to workgroup count limits as exposed by the GPU vendor.
  Dispatches already execute correctly beyond the limits, but this removes validation noise.
- Fix invocation counts for deswizzle kernel. The count was incorrect if blocksize was not 4, causing a bunch of useless work to be done.
2019-11-05 22:07:22 +03:00
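A sketch of the two fixups in 508ffcb775, under stated assumptions: a 1D dispatch is split so no single call exceeds the device's `maxComputeWorkGroupCount[0]` (a real `VkPhysicalDeviceLimits` member), and the invocation count is derived from the actual block size rather than an assumed 4 bytes. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch: split a 1D compute dispatch into chunks whose group
// counts never exceed max_group_count. Returns the group counts to issue.
std::vector<std::uint32_t> split_dispatch(std::uint32_t invocations,
                                          std::uint32_t workgroup_size,
                                          std::uint32_t max_group_count)
{
    std::vector<std::uint32_t> dispatches;
    // Round up so a partial workgroup still covers the tail.
    std::uint32_t groups = (invocations + workgroup_size - 1) / workgroup_size;
    while (groups > 0)
    {
        const std::uint32_t count = std::min(groups, max_group_count);
        dispatches.push_back(count);
        groups -= count;
    }
    return dispatches;
}

// One invocation per block of block_size bytes. Assuming a fixed 4-byte
// block would over-dispatch for larger blocks and do redundant work.
std::uint32_t invocation_count(std::uint32_t data_length, std::uint32_t block_size)
{
    return (data_length + block_size - 1) / block_size;
}
```

Each chunk after the first would also need a base offset (for example through a push constant) so that it processes the right slice of the data.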
kd-11
99d71fdc2a vk: Implement layer batching for the GPU swizzle decoder
- Handles all LODs per layer, meaning cubemaps are now fully handled in 6 passes instead of 6 * log2(width) passes.
- Handles all LODs of a 3D texture in one pass as well.
- The improvements warrant lowering the number of allowed compute invocations slightly.
2019-11-05 22:07:22 +03:00
kd-11
7a0b94f343 vk: Minor compute optimizations
- Remove use of uniform buffers for compute static data. Use push
constants instead.
- Minor touchups to the deswizzle code to avoid redundant data copies.
2019-11-05 22:07:22 +03:00
kd-11
1266b63135 vk: Enable gpu deswizzling 2019-11-05 22:07:22 +03:00
kd-11
9cd3530c98 rsx: Set up framework for hw deswizzle 2019-11-05 22:07:22 +03:00
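A CPU reference sketch of what a deswizzle pass computes, assuming the Morton (Z-order) layout commonly described for RSX swizzled power-of-two textures: x and y bits are interleaved (x in the lower bit) while both dimensions have bits left, and the remaining bits of the larger dimension follow. A GPU kernel would evaluate the same mapping per invocation. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Offset of texel (x, y) inside a swizzled width x height texture
// (both powers of two), assuming Z-order interleaving.
std::uint32_t morton_offset(std::uint32_t x, std::uint32_t y,
                            std::uint32_t width, std::uint32_t height)
{
    std::uint32_t offset = 0;
    std::uint32_t shift = 0;
    std::uint32_t w = width, h = height;
    while (w > 1 || h > 1)
    {
        if (w > 1) { offset |= (x & 1u) << shift++; x >>= 1; w >>= 1; }
        if (h > 1) { offset |= (y & 1u) << shift++; y >>= 1; h >>= 1; }
    }
    return offset;
}

// Convert a swizzled image into linear row-major order.
template <typename T>
std::vector<T> deswizzle(const std::vector<T>& src, std::uint32_t width, std::uint32_t height)
{
    std::vector<T> dst(static_cast<std::size_t>(width) * height);
    for (std::uint32_t y = 0; y < height; ++y)
        for (std::uint32_t x = 0; x < width; ++x)
            dst[static_cast<std::size_t>(y) * width + x] = src[morton_offset(x, y, width, height)];
    return dst;
}
```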
kd-11
57d3c9e171 rsx: Take empty queries into account for engines that spam report reads.
- Some games will spam the report queue with requests but have zpass
statistics enabled.
2019-11-04 18:48:41 +03:00
kd-11
2a8f2c64d2 rsx: Implement report transfer deferring
- Allow delaying report flushes triggered by image_in or buffer_notify
- When the report is ready, all the delayed transfers will automatically
be done.
- TODO: Make this configurable?
2019-11-04 18:48:41 +03:00
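A minimal sketch of the deferral idea in 2a8f2c64d2: transfers that depend on a pending report are queued instead of forcing an immediate flush, and they run automatically once the report is ready. The class and its methods are hypothetical; a real implementation would track individual reports rather than a single flag.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch: queue report-dependent transfers until the report resolves.
class deferred_report_transfers
{
    bool m_report_ready = true;
    std::vector<std::function<void()>> m_pending;

public:
    // A new report is in flight; later transfers must wait for it.
    void begin_report() { m_report_ready = false; }

    void submit(std::function<void()> transfer)
    {
        if (m_report_ready)
            transfer(); // nothing pending, run immediately
        else
            m_pending.push_back(std::move(transfer));
    }

    // Called when the report result is available: run every deferred transfer.
    void on_report_ready()
    {
        m_report_ready = true;
        auto pending = std::move(m_pending);
        m_pending.clear();
        for (auto& transfer : pending)
            transfer();
    }

    std::size_t pending_count() const { return m_pending.size(); }
};
```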
kd-11
3e0f9dff4d vk: Improve zcull synchronization
- Use zcull sync hints more aggressively
2019-11-04 18:48:41 +03:00
kd-11
fe3c290d03 vk: Reimplement occlusion result reading
- Implement partial result reads
2019-11-04 18:48:41 +03:00
kd-11
51e0eaaddc rsx: Implement backend notification for upcoming zcull reads 2019-11-04 18:48:41 +03:00
kd-11
df63de8f16 rsx: Allow u32 restart index with full index width 2019-11-04 16:56:34 +03:00
kd-11
6b3af09fa5 vk: Improved crash message for missing MSAA features 2019-11-04 16:56:34 +03:00
kd-11
bbed791ee0 vk: Add explicit support for identity image views
- Allows bypassing all remap shenanigans so that some operations that
rely on the raw image work correctly.
2019-11-01 19:35:46 +03:00
kd-11
63bbf11a76 vk: Add video out calibration pass
- Adds gamma correction and RGB range filters to the output to match the PS3
2019-10-31 14:43:24 +03:00
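A per-channel sketch of what a video out calibration step might compute, assuming one common gamma convention (`pow(v, 1/gamma)`, so 1.0 is neutral) and the standard limited video range of 16-235 out of 255. The real pass in 63bbf11a76 runs as a shader; this function and its parameters are hypothetical.

```cpp
#include <cmath>

// Hypothetical sketch: apply a gamma adjustment, then optionally compress
// the full [0, 1] range into the 16-235 limited range.
float calibrate_channel(float value, float gamma, bool limited_range)
{
    const float clamped = std::fmin(std::fmax(value, 0.0f), 1.0f);
    float v = std::pow(clamped, 1.0f / gamma);
    if (limited_range)
        v = (16.0f + v * 219.0f) / 255.0f; // 0 -> 16/255, 1 -> 235/255
    return v;
}
```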
kd-11
78aefe5b5e rsx/overlays: Add support for primitive types other than triangle_strips 2019-10-31 14:43:24 +03:00
Nekotekina
e3e7051ed3 Minor optimization in BufferUtils.cpp
Don't use PSHUFB for horizontal operations.
Utilize PHMINPOSUW to compute max as well:
 + sse41_hmin_epu16
 + sse41_hmax_epu16
2019-10-30 18:52:34 +03:00
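A sketch of the PHMINPOSUW trick from e3e7051ed3: `_mm_minpos_epu16` returns the minimum of eight u16 lanes, and the maximum comes from the same instruction because max(v) == ~min(~v). It also uses the target attribute approach mentioned in b1968769b7 so the file builds without `-msse4.1`, with a runtime CPU check and a scalar fallback. The helper names are hypothetical stand-ins for `sse41_hmin_epu16` and `sse41_hmax_epu16`.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_SSE41_PATH 1

// PHMINPOSUW: the minimum of the eight u16 lanes lands in bits [15:0].
__attribute__((target("sse4.1"))) static std::uint16_t hmin_epu16(const std::uint16_t* values)
{
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(values));
    return static_cast<std::uint16_t>(_mm_cvtsi128_si32(_mm_minpos_epu16(v)));
}

// Maximum via the same instruction: invert, take the minimum, invert back.
__attribute__((target("sse4.1"))) static std::uint16_t hmax_epu16(const std::uint16_t* values)
{
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(values));
    const __m128i inverted = _mm_xor_si128(v, _mm_set1_epi32(-1));
    return static_cast<std::uint16_t>(~_mm_cvtsi128_si32(_mm_minpos_epu16(inverted)));
}
#endif

// Returns {min, max} of eight u16 values, using SSE4.1 only if the CPU has it.
std::pair<std::uint16_t, std::uint16_t> minmax_u16x8(const std::uint16_t (&values)[8])
{
#ifdef HAVE_X86_SSE41_PATH
    if (__builtin_cpu_supports("sse4.1"))
        return {hmin_epu16(values), hmax_epu16(values)};
#endif
    const auto [lo, hi] = std::minmax_element(values, values + 8);
    return {*lo, *hi};
}
```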
Nekotekina
b1968769b7 Minor cleanup in BufferUtils.cpp
Replace inline asm with intrinsic using target attribute trick.
2019-10-30 17:53:51 +03:00
linkmauve
cfd5cf6bdb Optimise primitive_restart::upload_untouched() (#6881)
* rsx: Optimise primitive_restart::upload_untouched() with SSE4.1

This optimisation is only applied when skip_restart is false.

I’ve only tested the u16 codepath, as it is the one used in NieR.

In some very unscientific profiling, this function used to take 2.76% of
the total frame time at the save point of the port town, it now takes
about 0.40%.

* rsx: Mark all SSE4.1 functions with attributes on gcc and clang

This assures the compiler we will take care of only calling these
functions after having checked that the CPU does support these
instructions.

* rsx: Add an AVX2 implementation of primitive restart ibo upload

* rsx: Remove redefinition of SSE4.1 instructions

Now that clang is aware that our functions are compiled with SSE4.1, it
lets us generate this code using its intrinsics.

* rsx: Optimise vector to scalar conversion

This is done using minpos and srli intrinsics and generates less code
than before.

Thanks Nekotekina for the suggestion!
2019-10-30 16:42:44 +03:00
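A scalar reference sketch of the kind of work an index upload with primitive restart does (the SSE4.1 and AVX2 paths in cfd5cf6bdb vectorize this sort of loop). Assumptions, all hypothetical: indices are already in host byte order, the guest restart index is rewritten to the all-ones value of the index type (the fixed restart value Vulkan uses for 16- and 32-bit indices), and min/max are tracked over non-restart indices only. Comparing at the full width of `T` also covers the u32 restart index case from df63de8f16.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

template <typename T>
struct index_range
{
    T min;
    T max;
};

// Hypothetical sketch: copy indices, normalize restart markers and return
// the min/max of the real indices.
template <typename T>
index_range<T> upload_with_restart(const std::vector<T>& src, std::vector<T>& dst, T restart_index)
{
    constexpr T native_restart = std::numeric_limits<T>::max();
    index_range<T> range{native_restart, 0};
    dst.resize(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
    {
        const T index = src[i];
        if (index == restart_index)
        {
            dst[i] = native_restart; // restart markers do not affect the range
            continue;
        }
        dst[i] = index;
        range.min = std::min(range.min, index);
        range.max = std::max(range.max, index);
    }
    return range;
}
```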
kd-11
35794dc3f2 vk: Add checks for alphaToOne support
- This feature is very rarely used, as alphaToCoverage is commonly used as a replacement for blending, not in addition to it.
2019-10-30 01:06:28 +03:00
kd-11
eda09489b2 vk: Optionally ignore depth bounds testing on hardware that does not support it.
2019-10-29 20:03:54 +03:00
kd-11
7a5c20ef85 vk: Minor spec touchups
- Simplify active instance management. While multicontext support will
be required in future, this is better done with multiple logical devices
rather than multiple instances.
- Destroy the WSI surface on exit
- Enable depthBoundsTest explicitly. TODO: Properly check for supported
features.
2019-10-29 20:03:54 +03:00
kd-11
aa3eeaa417 rsx: Separate subresource_layout::dim_in_block and subresource_layout::dim_in_texel

- These two are not always linked when working with compressed textures.
The stored texels extend past the actual size of the image if the size
is not block-aligned. E.g. if the height is 1, the padded height is 4, but it's
not possible to determine the real height from the aligned size. It could be
1, 2, 3 or 4, for example.
- Fixes image out-of-bounds writes when uploading from CPU
2019-10-29 20:03:54 +03:00
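A small sketch of why aa3eeaa417 keeps both dimensions, assuming 4x4 blocks as in DXT1/3/5. The block count is the texel size rounded up, so heights 1 through 4 all map to one block row, and the real size cannot be recovered from the block count. Copying `height_in_block * 4` rows into an image that is only 1 texel high would write out of bounds. The struct and helper are hypothetical, not the real `subresource_layout`.

```cpp
#include <cstdint>

// Hypothetical mirror of the dim_in_texel / dim_in_block split.
struct subresource_dims
{
    std::uint32_t width_in_texel, height_in_texel; // real image size
    std::uint32_t width_in_block, height_in_block; // size in 4x4 blocks
};

constexpr std::uint32_t block_edge = 4; // DXT blocks cover 4x4 texels

subresource_dims make_dims(std::uint32_t width, std::uint32_t height)
{
    return {width, height,
            (width + block_edge - 1) / block_edge,
            (height + block_edge - 1) / block_edge};
}
```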