Commit graph

378 commits

Author SHA1 Message Date
kd-11 463b1b220d rsx: Improve accuracy of shadow compare Ops when non-integer depth formats are used
- The fixed-point D24S8 format does special Z clamping during compare which matches PS3 behaviour
- D32S8 is a floating point format and comparison with Dref > 1 always fails causing black edges/borders
2019-04-25 16:23:05 +03:00
eladash 6f76e34104 rsx: Fix race on clearing native_ui vs emu_requested flag 2019-04-20 01:04:41 +03:00
kd-11 12dc3c1872 vk: Dynamic heap management to potentially fix ring buffer overflows
- Allows checking one heap type at a time, on demand
- Should avoid OOM situations unless inside an uninterruptible block
2019-04-09 13:40:54 +03:00
kd-11 e4e86455f2 rsx: Fix temporary subresource caching behaviour
- Do not cache if a gathered subresource contains a bound RTT
- Change op to dynamic copy if parent is still bound
2019-04-09 13:40:54 +03:00
kd-11 adc59f9810 rsx: Fix blit transfers when texel sizes mismatch
- Also refactors some bpp handling code
- Simplify texture intersection test to use a normalized/uniform coordinate space
- Fix broken bounds checking as well
2019-03-22 21:27:15 +03:00
kd-11 bb65e45614 rsx: Implement GPU acceleration for rotated images 2019-03-17 21:50:11 +03:00
kd-11 5260f4b47d rsx: Improvements to memory flush mechanism
- Batch dma transfers whenever possible and do them in one go
- vk: Always ensure that queued dma transfers are visible to the GPU before they are needed by the host
  Requires a little refactoring to allow proper communication of the commandbuffer state
- vk: Code cleanup, the simplified mechanism makes it so that its not necessary to pass tons of args to methods
- vk: Fixup - do not forcefully do dma transfers on sections in an invalidation zone! They may have been speculated correctly already
2019-03-17 21:50:11 +03:00
kd-11 385485204b vk/gl: Omit unlocked data when grabbing flip sources from texture cache 2019-03-17 21:50:11 +03:00
kd-11 74eeacd091 vk/gl: Improve memory tag sync and test
- Properly pass parameters such as rsx-pitch to the surface store
- Do not crash if a surface fails verification in flip, use fall-back instead
2019-03-17 21:50:11 +03:00
kd-11 a49a0f2a86 vk/gl: Synchronization improvements
- Properly wait for the buffer transfer operation to finish before map/readback!
- Change vkFence to vkEvent which works more like a GL fence which is what is needed.
- Implement supporting methods and functions
- Do not destroy fence by immediately waiting after copying to dma buffer
2019-03-17 21:50:11 +03:00
kd-11 04dda44225 rsx: Properly generate render target data with all parameters provided
- Build-up to variable-sized framebuffers and AA implementation
- Also allows accurate range calculation for our hit testing
2019-03-10 16:09:05 +03:00
kd-11 9d4d3d9443 rsx: Reimplement render target intersection tests when using hw accelerated blit engine
- Properly collapse memory tree when scanning in case of overlaps!
2019-03-10 16:09:05 +03:00
kd-11 7c379432dd rsx: Implement proper pitch compatibility lookup
- When a single row is required or is all that is available, pitch has
no meaning as the coordinate space changed to 1D
2019-03-10 16:09:05 +03:00
kd-11 10dc3dadee rsx/texture_cache: Improve framebuffer memory locking when WCB/WDB is not enabled
- Adds a new mode that removes non-framebuffer stuff inside framebuffer range
2019-03-10 16:09:05 +03:00
kd-11 3a071a9c07 rsx: Texture search rewrite
- Perform a full search across all resource types as needed without
taking too many shortcuts/hacks
2019-03-10 16:09:05 +03:00
elad bd259c8ae4 vulkan zcull: Fix deadlock in zcull flush waiting
Block adding additional flush requests until the first ones are treated (by adding missing lock)
2019-03-08 23:44:46 +03:00
kd-11 19ff95da70 vk: Fix usage of VK_IMAGE_LAYOUT_GENERAL
- Properly synchronize when transitioning to/from GENERAL layout.
- General layout requires full pipeline dependency since its used in a 'general' sense. As such, its use is to be largely avoided.
2019-02-07 11:40:17 +03:00
kd-11 a36d3af3b4 vk: Minor frame management improvements 2019-02-02 11:54:01 +03:00
kd-11 9ed9d7e947 overlays/osk: Implement native osk interface 2019-02-02 11:54:01 +03:00
kd-11 f47d3a761b vk: Hotfix for fullscreen not working on non-windows platforms 2019-02-01 00:22:11 +03:00
kd-11 3bfa564ef8 vk/windows: Try to keep msq thread from ever stopping
- NVIDIA drivers hook into the msq before our nativeEvent handler. This means NV is aware of events before rpcs3 is aware of them and sometimes stops until a new event is triggered.
  If rpcs3 is inside a driver call at this time, the system will deadlock since the driver waits for msq which waits for the renderer which waits for the driver.
- Use explicit hook management to control window events
- Add fence timeout to attempt detection of surface loss events
2019-01-31 21:53:02 +03:00
kd-11 fb778e4821 rsx: Reimplement attrib divisor 2019-01-25 14:34:22 +03:00
kd-11 6fdc0fd7f0 rsx: Reimplement MSAA transparency
- Apply dither to edges that almost fail the straight-up alpha test
- Significantly improves alpha tested geometry far from the camera
- Also removes blend factor overrides/hacks as they give incorrect results due to background bleeding
2019-01-25 14:34:22 +03:00
kd-11 8093c9b573 rsx: Disable rtt side-effects when async compilation is ongoing. Only real renders should promote buffer state from underined to drawn, otherwise keep previous contents intact. 2019-01-25 14:34:22 +03:00
kd-11 417a2e6731 rsx: Refactor index buffers
- Index offset is ignored anyway and only used to calculate vertex attribute divisor index
- Specialized optimization for untouched xfer without primitive restart
2019-01-25 14:34:22 +03:00
kd-11 52ac0a901a rsx: improve memory coherency
- Avoid tagging and rely on read/write barriers and the dirty flag mechanism. Testing is done with a weak 8-byte memory test
- Introducing new data when tagging breaks applications with race conditions where tags can overwrite flushed data
2019-01-06 10:44:40 +03:00
kd-11 95245bdd83 rsx: Improve ARGB8->D24S8 casting
- Set up partial transfers
- Force clear of target before starting the transfer
2019-01-06 10:44:40 +03:00
kd-11 475cc99117 rsx: Fix dirty flag reset after a partial attachment initialization
- D24S8 targets have 2 aspects that are dealt with separately; Forcefully initialize the remaining data if a partial init is done. Its 'free' anyway
- It seems that the stencil mask matters when clearing unlike the depth mask and color mask
2019-01-06 10:44:40 +03:00
kd-11 c80c7f06bb rsx: Typo fix
- This silly typo broke the flip improvements in the GT fixes PR
2019-01-06 10:44:40 +03:00
kd-11 2a62fa892b rsx: Texture cache refactor
- gl: Include an execution state wrapper to ensure state changes are consistent. Also removes a lot of required 'cleanup' for helper methods
- texture_cache: Make execition context a mandatory field as it is required for all operations. Also removes a lot of situations where duplicate argument is added in for both fixed and vararg fields
- Explicit read/write barrier for framebuffer resources depending on
  usage. Allows for operations like optional memory initialization before
  reading
2019-01-06 10:44:40 +03:00
kd-11 1ffadbe086 rsx: Reorganize write barrier implementation (either clear or memory barrier) 2019-01-06 10:44:40 +03:00
kd-11 474d0f61a2 minor typo fix 2019-01-06 10:44:40 +03:00
kd-11 15d5507154 rsx: Rewrite memory inheritance transfers
- Implicitly invoke a memory barrier if actively reading from an unsynchronized texture
- Simplify memory transfer operations
- Should allow more games to work without strict mode
2019-01-06 10:44:40 +03:00
kd-11 15488eb247 rsx: Avoid unnecessarily touching framebuffer memory
- Do not bind companion framebuffer when clearing single aspect; let the
  contest mechanism sort it out instead
- Do not prematurely tag framebuffers, instead only do so at
  write-confirmation time. Should avoid false tagging if setup does not
  allow a render to occur.
2019-01-06 10:44:40 +03:00
kd-11 a13986ec5c vk: Spec fixups
- Forgot to update descriptor pool init sizes over time
- Also clamp swapchain resources to allowable surface extents
2019-01-05 21:31:12 +03:00
kd-11 9c46386dd4 rsx: Check av configuration when selecting display buffers!
- Some applications have mismatch between video output configuration and display buffer sizes
2018-12-24 09:05:19 +03:00
kd-11 4b79ef1ad9 rsx: Implement stencil mirror views
- Implements a mirror view of D24S8 data that accesses the stencil components.
  Finishes the implementation of TEX2D_DEPTH_RGBA as the stencil component was previously missing from the reconstructed data
- Add a few missing destructors
  Image classes are inherited a lot and I forgot to make the dtors virtual
2018-12-24 09:05:19 +03:00
kd-11 c75749f8ce rsx: fix flip logic when grabbing output from the surface cache 2018-12-24 09:05:19 +03:00
Rui Pinheiro 9d1cdccb1a Implement dedicated texture cache predictor 2018-12-11 22:37:10 +03:00
Rui Pinheiro af360b78f2 Texture cache section management fixups
Fixes VRAM leaks and incorrect destruction of resources, which could
lead to drivers crashes.

Additionally, lock_memory_region is now able to flush superseded
sections. However, due to the potential performance impact of this
for little gain, a new debug setting ("Strict Flushing") has been
added to config.yaml
2018-12-11 22:37:10 +03:00
kd-11 a56ba737b5 vk: Silence log spam 2018-12-03 20:01:23 +03:00
kd-11 5b6e1420f3 rsx: Pipeline barriers fixed up
- Ensure barriers are invoked even if no draw occurs!
-- Ensures that deferred commands are executed eventually
2018-11-30 23:51:25 +03:00
kd-11 8a186bb97e rsx: Fix insertion of execution barriers
- Ignore barriers inserted after BEGIN but before any draw commands are emitted
- Properly process tail barriers inserted before END but after draw commands are submitted
- Ignore execution barriers with no effect (same register value written)
2018-11-30 23:51:25 +03:00
kd-11 5193c99973 rsx: Enable dynamic FIFO preprocessing
- Tries to detect when FIFO preprocessing is beneficial and only enables optimizations if the benefit outweighs the cost
- Current threshold is at least 500 draw calls saved at over 2000 draw calls to justify the overhead
- TODO: More tuning for other CPUs
2018-11-30 23:51:25 +03:00
kd-11 846daadd5d rsx: Fixups
- Improve vertex attribute layout format. Allows for full 16-bit attribute divisor
- Use actual pitch when declaring framebuffer rsx pitch instead of register value in case of swizzle? rendering
2018-11-30 23:51:25 +03:00
kd-11 26a56ef1f1 vk: Spec compliance.
- TODO: Implement push_constants path instead of copy + bind descriptor sets
2018-11-30 23:51:25 +03:00
kd-11 54ec363e88 rsx: Critical pipeline fixes
- Fix scissor and viewport binding behavior
- Fixes recovery if empty scissor is specified and then 'fixed' later
- Optimizes state binding a bit
2018-11-30 23:51:25 +03:00
kd-11 1ad76ad331 rsx: Restructure programs
- Also re-enable pipeline optimizations
2018-11-30 23:51:25 +03:00
kd-11 b0a6b72ce8 rsx: Optimizations
- Replace a few more vectors with simple_array<T>
- Avoid unnecessary string comparisons in backends. We already know referenced textures from the program analysers!
2018-11-30 23:51:25 +03:00
kd-11 677b16f5c6 rsx: Fixups
- Also fix visual corruption when using disjoint indexed draws

- Refactor draw call emit again (vk)

- Improve execution barrier resolve
  - Allow vertex/index rebase inside begin/end pair
  - Add ALPHA_TEST to list of excluded methods [TODO: defer raster state]

- gl bringup

- Simplify
  - using the simple_array gets back a few more fps :)
2018-11-30 23:51:25 +03:00