Commit graph

2561 commits

Author SHA1 Message Date
kd-11
57d3c9e171 rsx: Take empty queries into account for engines that spam report reads.
- Some games will spam the report queue with requests but have zpass
statistics enabled.
2019-11-04 18:48:41 +03:00
kd-11
2a8f2c64d2 rsx: Implement report transfer deferring
- Allow delaying report flushes triggered by image_in or buffer_notify
- When the report is ready, all the delayed transfers will automatically
be done.
- TODO: Make this configurable?
2019-11-04 18:48:41 +03:00
kd-11
3e0f9dff4d vk: Improve zcull synchronization
- Use zcull sync hints more aggressively
2019-11-04 18:48:41 +03:00
kd-11
fe3c290d03 vk: Reimplement occlusion result reading
- Implement partial result reads
2019-11-04 18:48:41 +03:00
kd-11
51e0eaaddc rsx: Implement backend notification for upcoming zcull reads 2019-11-04 18:48:41 +03:00
kd-11
df63de8f16 rsx: Allow u32 restart index with full index width 2019-11-04 16:56:34 +03:00
kd-11
6b3af09fa5 vk: Improved crash message for missing MSAA features 2019-11-04 16:56:34 +03:00
kd-11
bbed791ee0 vk: Add explicit support for identity image views
- Allows bypassing all remap shenanigans to make some operations that
rely on the raw image to work correctly.
2019-11-01 19:35:46 +03:00
kd-11
63bbf11a76 vk: Add video out calibration pass
- Adds gamma correction and RGB range filters to output to match PS3
2019-10-31 14:43:24 +03:00
kd-11
78aefe5b5e rsx/overlays: Add support for other primitive types other than triangle_strips 2019-10-31 14:43:24 +03:00
Nekotekina
e3e7051ed3 Minor optimization in BufferUtils.cpp
Don't use PSHUFB for horizontal operations.
Utilize PHMINPOSUW to compute max as well:
 + sse41_hmin_epu16
 + sse41_hmax_epu16
2019-10-30 18:52:34 +03:00
Nekotekina
b1968769b7 Minor cleanup in BufferUtils.cpp
Replace inline asm with intrinsic using target attribute trick.
2019-10-30 17:53:51 +03:00
linkmauve
cfd5cf6bdb Optimise primitive_restart::upload_untouched() (#6881)
* rsx: Optimise primitive_restart::upload_untouched() with SSE4.1

This optimisation is only applied when skip_restart is false.

I’ve only tested the u16 codepath, as it is the one used in NieR.

In some very unscientific profiling, this function used to take 2.76% of
the total frame time at the save point of the port town, it now takes
about 0.40%.

* rsx: Mark all SSE4.1 functions with attributes on gcc and clang

This assures the compiler we will take care of only calling these
functions after having checked that the CPU does support these
instructions.

* rsx: Add an AVX2 implementation of primitive restart ibo upload

* rsx: Remove redefinition of SSE4.1 instructions

Now that clang is aware that our functions are compiled with SSE4.1, it
lets us generate this code using its intrinsics.

* rsx: Optimise vector to scalar conversion

This is done using minpos and srli intrinsics and generate less code
than before.

Thanks Nekotekina for the suggestion!
2019-10-30 16:42:44 +03:00
kd-11
35794dc3f2 vk: Add checks for alphaToOne support
- This feature is very rarely used, as alphaToCoverage is commonly used as a replacement for blending, not in addition to it.
2019-10-30 01:06:28 +03:00
kd-11
eda09489b2 vk: Optionally ignore depth bounds testing on hardware that does not
support it.
2019-10-29 20:03:54 +03:00
kd-11
7a5c20ef85 vk: Minor spec touchups
- Simplify active instance management. While multicontext support will
be required in future, this is better done with multiple logical devices
rather than multiple instances.
- Destroy the WSI surface on exit
- Enable depthBoundsTest explicitly. TODO: Properly check for supported
features.
2019-10-29 20:03:54 +03:00
kd-11
aa3eeaa417 rsx: Separate subresource_layout:dim_in_block and
subresource_layout::dim_in_texel

- These two are not always linked when working with compressed textures.
The actual texels extend past the actual size of the image if the size
is not aligned. e.g if height is 1, the real height is 4, but its not
possible to determine this from the aligned size. It could be 1, 2, 3 or
4 for example.
- Fixes image out-of-bounds writes when uploading from CPU
2019-10-29 20:03:54 +03:00
Eladash
42fc698186 rsx: Enable primitive restart index only when needed (#6889)
* rsx: Enable primitive restart index only when needed

* rsx: Use if with initializer in read_put()
2019-10-28 23:16:27 +03:00
kd-11
479d92d075 vk: Fix uninitialized (and wrong) variable access 2019-10-28 15:20:45 +03:00
kd-11
b0708367c2 vk: Round lod bias to the nearest 0.5 to lower number of permutations when nearest mipmap sampling is used
- The lambda values will be rounded to the nearest integer anyway
2019-10-28 15:20:45 +03:00
kd-11
3e8dfede1c vk: Modify sampler cache to uniquely identify all the input parameters
- Avoids iteration when variable mipmap counts or lod bias parameters change
2019-10-28 15:20:45 +03:00
kd-11
ad2add9574 rsx:: Use fcmp correctly 2019-10-28 15:20:45 +03:00
kd-11
d04241ad25 rsx: Allow compressed textures to be unaligned in size
- Align based on row length but let the texture itself be of arbitrary dimensions
2019-10-28 15:20:45 +03:00
Emmanuel Gil Peyrot
69e9ee26f6 rsx: Make input_is_swizzled a template parameter
This lowers the relative cost of this function from ~2.25% to ~1.80% on
gcc 9 which I found quite surprising, some of it probably gets inlined
better in the callers, but I haven’t been able to isolate which parts.
2019-10-28 13:28:51 +03:00
kd-11
d53d7bb598 vk: Restore vega native use of FP16 in shaders
- AMD proprietary drivers should work fine
2019-10-23 12:20:06 +03:00
Emmanuel Gil Peyrot
54d95373d0 Support fullscreen properly on Wayland
The current behaviour when going fullscreen from windowed was to keep
the previous size of the swapchain, with black borders on all sides,
which looks quite ugly.

The root of this issue is that rpcs3 only checks for frame resize if
vkQueuePresent() returns VK_SUBOPTIMAL_KHR, which drivers can’t do on
Wayland, see https://gitlab.freedesktop.org/mesa/mesa/issues/1979
2019-10-23 12:19:46 +03:00
kd-11
e04b6cd7c0 rsx: Copypasta fix
- r1 is always float4 never half4. Its a full-width register unlike the
other outputs which are optionally half-width.
2019-10-23 00:50:24 +03:00
kd-11
00bc3fe658 Drop d3d12 backend 2019-10-22 21:45:14 +03:00
Emmanuel Gil Peyrot
14c63ec014 Fix misleading indent. 2019-10-22 16:11:43 +03:00
Eladash
586fe11e22 Fix cellGcm HLE regression
Also correct flags.
2019-10-22 13:45:09 +03:00
Eladash
945abcc6cd rsx: Align down index array offset
* Also use improved to_be_t<> template (recetly ignoring one byte long types) for vm gsl::byte referencing, remove redundent narrow<> cast (same type)
2019-10-22 13:45:09 +03:00
kd-11
3bb70e837a vk: Silly copypasta 2019-10-22 13:44:49 +03:00
kd-11
0b2f9f0f17 rsx: Add support for delayed shader discard.
- Noticed a glitch on AMD hw and windows drivers where discard seems to affect entire 4x4 cells.
- Dead fragments (outside the primitive boundary) could have their discards trigger as they do not have proper access to variables.
- This introduces dead fragments along triangle edges, causing a diagonal line pattern across the screen that is very annoying.
2019-10-22 13:44:49 +03:00
kd-11
901942f24a rsx: Replace pointless f32[4] restriction on texture parameters.
- Use a struct instead to improve readability and remove pointless OpBitCast
2019-10-22 13:44:49 +03:00
kd-11
f7842b765f rsx: Implement packed format renormalization
- Renormalizes arbitrary N-bit values as 8-bit normalized.
- NV hardware performs integer normalization at 8 bits if the size is less than 8.
- This can cause significant arithmetic drift because the error is multiplied by a huge number when sampling.
2019-10-22 13:44:49 +03:00
Eladash
29cddc30f0 rsx: Fix vblank signals flood after Emu.Resume() 2019-10-21 15:31:45 +03:00
Eladash
5de0005f5a rsx: Report full method range on invalid methods
Also report full command on fifo desync event for the first time
2019-10-21 15:31:45 +03:00
eladash
730e9cde84 sys_rsx: Improve allocations and error checks
* allow sys_rsx_device_map to be called twice: in this case the DEVICE address retrived from the previous call returned
* Add ENOMEM checks for sys_rsx_memory_allocate and sys_rsx_context_allocate
* add EINVAL check for sys_rsx_context_allocate if memory handle is not found
* Separate sys_rsx_device_map allocation from sys_rsx_context_allocate's
* Implement sys_rsx_memory_free; used by cellGcmInit upon failure
* Added context_id checks
* Throw if sys_rsx_context_allocate was called twice.
2019-10-21 15:31:45 +03:00
kd-11
3c44065684 gl: Fix copypasta
- MSAA is still unimplemented in OGL
2019-10-20 21:38:40 +03:00
kd-11
f40f2c6215 vk: Fix minification filter description for NEAREST_MIPMAP_NEAREST. Just a typo.
- Also remove mipmap filter for CONVOLUTION
2019-10-20 21:38:40 +03:00
kd-11
09de3b7974 rsx: Tweak behaviour of the "Use GPU texture scaling" option
- If either source data or dest is a render target, do image operations on the GPU same as before
- If swizzle is desired, use CPU fallback
- If no scaling and no format conversion is required, use CPU fallback
- If scaling is desired and the transfer target is in local memory, use the GPU
- When doing trivial copies, use the routine in rsx_methods instead of
  duplicating code. Also has the benefit of better range checking.
2019-10-20 21:38:40 +03:00
kd-11
868547aec8 rsx: Minor improvement to fbo region invalidation
- When commiting a block as fbo, keep blit_dst data as well.
- Avoids removing (and losing data from) blit targets that just happen to share a page with a framebuffer.
2019-10-20 21:38:40 +03:00
kd-11
996534c559 rsx: Fixup for aspect mismatch 2019-10-20 15:25:07 +03:00
Eladash
d4ba7f37b6 rsx util: Implement decode_fxp<> 2019-10-18 15:41:39 +03:00
kd-11
299b98b30a vk: Disable mipmap sampling if sampling mode is does not have a mipmap filtering mode.
- GL_LINEAR and GL_NEAREST always sample LOD0 so make vulkan behave the same way
2019-10-18 14:46:37 +03:00
kd-11
404073c74a rsx: Force-align compressed formats to 4x4 texel blocks and disable 1D compressed textures.
- The PS3 allows defining 1D compressed images but this obviously doesn't work well on desktop.
2019-10-18 14:46:37 +03:00
kd-11
eff4e95c99 rsx: Minor cache fixup for cyclic references.
- Logic was broken by mipmaps PR. Do not issue a texture barrier if a temp copy is being done.
2019-10-18 14:46:37 +03:00
kd-11
bd1bcc6be7 vk: Remove a redundant memory barrier 2019-10-18 14:46:37 +03:00
kd-11
70642484cd vk: Check for cyclic references if sampler is marked as do-not-cache.
- Usually an indication of surface/texture cache interaction.
2019-10-18 14:46:37 +03:00
kd-11
eee2237e19 rsx: Track uncached cache resources
- Uncacheable resources can be reused as soon as they're made visible to the draw call.
- Since they're likely to be reused every draw call until the shader changes, it is important to reuse as much as possible
2019-10-18 14:46:37 +03:00