1757 commits

Author SHA1 Message Date
kd-11 0fc67aa2f6 gl: fix wcb regression
- Partial framebuffers and blit targets are possible!
2018-05-24 10:36:04 +03:00
kd-11 493d4e8613 fixup - Improve invalidated region checks for performance 2018-05-24 10:36:04 +03:00
kd-11 b030d1900c rsx: Fixup - fix broken memory protection fail caused by region respec
- Some applications will alternate memory between framebuffer and texture data
2018-05-24 10:36:04 +03:00
kd-11 f38f61d110 vk: Clean up memory allocation and fix GPUOpen VMA for Radeon 2018-05-23 19:07:08 +03:00
kd-11 92b5a705d8 fixup - locking 2018-05-23 19:07:08 +03:00
kd-11 b957eac6e8 rsx: Avoid calling any blocking callbacks from threads that are not rsx::thread
- Defers on_notity_memory_unmapped to only run from within rsx context
- Avoids passive_lock + writer_lock deadlock
2018-05-23 19:07:08 +03:00
kd-11 d2bf04796f Optimized cached write-through
- Allows grabbing an unsynchronized memory block if overwriting contents anyway
- Allows flushing only specified range of memory
2018-05-23 19:07:08 +03:00
kd-11 f8d999b384 fixup - range check 2018-05-23 19:07:08 +03:00
kd-11 fbf6581249 rsx: Fix segmented memory access for rsx::super_ptr 2018-05-23 19:07:08 +03:00
kd-11 d283200e13 vk: Do not do extension test if in a fast context (enum only) 2018-05-23 19:07:08 +03:00
kd-11 3f14bc6961 rsx: Silence some meaningless error 2018-05-23 19:07:08 +03:00
kd-11 f2a3167193 rsx: Lower format compatibility severity since it confuses some people 2018-05-23 19:07:08 +03:00
kd-11 8fcd5c1e5a rsx: Texture cache fixes
1. rsx: Rework section synchronization using the new memory mirrors
2. rsx: Tweaks
    - Simplify peeking into the current rsx::thread instance.
      Use a simple rsx::get_current_renderer instead of asking fxm for the same
    - Fix global rsx super memory shm block management
3. rsx: Improve memory validation. test_framebuffer() and tag_framebuffer() are simplified due to mirror support
4. rsx: Only write back confirmed memory range to avoid overapproximation errors in blit engine
5. rsx: Explicitly mark clobbered flushable sections as dirty to have them removed
6. rsx: Cumulative fixes
    - Reimplement rsx::buffered_section management routines
    - blit engine subsections are not hit-tested against confirmed/committed memory range
      Not all applications are 'honest' about region bounds, making the real cpu range useless for blit ops
2018-05-23 19:07:08 +03:00
pauls-gh f8a0be8c3e Performance enhancement - Vulkan memory allocator (#4635)
* Incorporates the vulkan memory allocator from the AMD GPUOpen project
2018-05-23 17:02:35 +03:00
scribam 2270b8d15c vulkan: link with vulkan-1.lib instead of VKstatic.1.lib 2018-05-23 13:54:27 +03:00
Nekotekina 72574b11ff SPU: use reservation spinlocks on writes (non-TSX)
This should decrease contention by avoiding global lock
2018-05-21 21:56:14 +03:00
kd-11 c9669818eb Facepalm
- overlays: Do not free self handle!!!!
2018-05-21 15:55:25 +03:00
kd-11 f6f45b8699 Native UI refactored (#4623)
Refactor and improve native overlays
2018-05-20 23:05:00 +03:00
Nekotekina 367f039523 Build transactions at runtime
Drop _xbegin family intrinsics due to bad codegen
Implemented `notifier` class, replacing vm::notify
Minor optimization: detach transactions from global mutex on TSX path
Minor optimization: don't acquire vm::passive_lock on PPU on TSX path
2018-05-16 17:31:58 +03:00
scribam 04ad49de4d typos 2018-05-14 21:14:39 +04:00
kd-11 4836a03a7d rsx: Fix build 2018-05-13 14:44:14 +03:00
kd-11 9d1f4a2538 vk: Include RADV POLARIS and RADV VEGA in the primitive restart blacklist
2018-05-13 14:44:14 +03:00
kd-11 bff6060bd6 rsx: Improve puller state management
- Properly identify puller spin primitives
- Add a small wake delay after exiting a spin delay. Fixes desynchronization
  It seems real hw has a small delay between cell edits to commandbuffer memory at the GET address and the changes becoming visible to the DMA puller
  Simulated with a short busy_wait, large values will improve sync but degrade performance
2018-05-13 14:44:14 +03:00
kd-11 1aa44ede31 gl: Improve AMD multidraw workaround
- Reimplements the AMD workaround using an identity buffer to avoid the performance hit of doing multiple glDrawArrays for every single compiled set
- Reimplements first/count allocation using a scratch buffer to reduce allocation overhead when large number of draw calls is used
2018-05-13 14:44:14 +03:00
kd-11 eccb57d4b8 vk: AMD primitive restart bug workaround
- Emulate primitive restart with degenerate triangles
2018-05-13 14:44:14 +03:00
kd-11 b7979d3f57 rsx/vk: Improvements and minor optimizations
- Improve dirty state tracking affecting program state
- vk: Refactor out transform constants upload into a separate channel to avoid if possible
  transform data uploads are quite expensive
2018-05-13 14:44:14 +03:00
kd-11 440a31ef18 rsx: Optimizations for program management 2018-05-13 14:44:14 +03:00
kd-11 a52ea7f870 rsx: Improve fragment and vertex program usage
- Introduces a gpu program analyser step to examine shader contents before attempting compilation or cache search
  - Avoids detecting shader as being different because of unused textures having state changes
  - Adds better program size detection for vertex programs
- Improved vertex program decompiler
  - Properly support CAL type instructions
  - Support jumping over instructions marked with a termination marker with BRA/CAL class opcodes
  - Fix SRC checks and abort
  - Fix CC register initialization
  - NOTE: Even unused SRC registers have to be valid (usually referencing in.POS)
2018-05-13 14:44:14 +03:00
Jake 75b40931fc rsx: initial capture/replay functionality (#4510)
* rsx: initial capture/replay functionality
2018-05-13 12:18:05 +03:00
Nekotekina 5d15d64ec8 Memory mirror support
Implemented utils::memory_release (not used)
Implemented utils::shm class (handler for shared memory)
Improved sys_mmapper syscalls
Rewritten ppu_patch function
Implemented vm::get_super_ptr (ignores memory protection)
Minimal allocation alignment increased to 0x10000
2018-05-09 23:35:34 +03:00
kd-11 98b715d8c8 gl: Workaround for AMD driver bug 2018-04-25 19:14:36 +03:00
kd-11 ffa62918aa gl: Improve pixel transfer code and notify on AMD driver bug
- Readback does not work at all with float textures on AMD OpenGL
  Driver throws a bogus OUT_OF_MEMORY error regardless of amount of VRAM and system RAM available
2018-04-25 19:14:36 +03:00
kd-11 f3210a9a33 rsx: Workaround for lost memory sections
- TODO: surface_cache and texture_cache need a better method of persisting partial framebuffer resources
2018-04-25 19:14:36 +03:00
kd-11 58035697d5 rsx: Restore component mapping override for depth textures 2018-04-25 19:14:36 +03:00
kd-11 7e32e7343a vk: Reorganize handling of formats support
- Formats support is linked to the physical device and by extension the logical device derived from it
  It therefore makes no sense to track this as a separate object.
  Simplifies parameter passing and template specialization.
  Also avoids corner cases with AMD hardware (where D24S8 is not supported)
2018-04-25 19:14:36 +03:00
kd-11 291a828217 fixups 2018-04-25 19:14:36 +03:00
kd-11 40ae5e605d vk: Fix border color selection 2018-04-25 19:14:36 +03:00
kd-11 c5d1f30e82 rsx: Fix performance counters
- Detect jump-to-self type idling
2018-04-25 19:14:36 +03:00
kd-11 91a6091d26 rsx: Minor fixes
- vk: Clear dirty textures before copying 'old contents' in case the old data does not fill the new region
- rsx: Properly decode border color - seems to be in BGRA format
- vk: better approximation of border color to better choose between the presets
- vk: Individually clear color images outside render pass and without scissor
- vk: Fix renderpass selection for clear overlay pass
- vk: Include scissor region when emulating clear mask

NOTES:
- vk: Completely avoid using vkClearXXXXimage - it's 'broken' on nvidia drivers
  Spec is vague about the function so it's not an actual bug
  ClearAttachment is clearly defined as bypassing bound state which works correctly
- TODO: Implement memory sampling to simulate loading precleared memory if cell used memset to preinitialize the framebuffer
  Autoclear depth to 1|255 and color to 0 is hacky!
2018-04-25 19:14:36 +03:00
kd-11 da99f3cb9a rsx: Critical fixes
- texture cache: Avoid leaking memory sections
  - Avoid double ref increment on flush-always reprotection
  - Detect invalidated_resources entries in surface cache when protecting fbo memory
- vk: Copypasta bugfix, properly initialize aspect mask
2018-04-25 19:14:36 +03:00
kd-11 a42b00488d rsx: Texture fixes
- gl/vk: Fix subresource copy/blit
- gl/vk: Fix default_component_map reading
- vk: Reimplement cell readback path and improve software channel decoder
- Properly name the subresource layout field - it's in blocks not bytes!
- Implement d24s8 upload from memory correctly
- Do not ignore DEPTH_FLOAT textures - they are depth textures and abide by the depth compare rules
- NOTE: Redirection of 16-bit textures is not implemented yet
2018-04-25 19:14:36 +03:00
kd-11 63d9cb37ec rsx: Framebuffer fixes
Primary:
- Fix SET_SURFACE_CLEAR channel mask - it has been wrong for all these years!
  Layout is RGBA not ARGB/BGRA like other registers

Other Fixes:
- vk: Implement subchannel clears using overlay pass
- vk: Simplify and clean up state management
- gl: Fix nullptr deref in case of failed subresource copy
- vk/gl: Ignore float buffer clears as hardware seems to do
2018-04-25 19:14:36 +03:00
kd-11 9abbbb79ae rsx: Blit engine fixes
- Ignore unlocked blit sections [TODO]
- Do not attempt blit on hw if bytesize is unsupported
- gl: Implement typeless memory transfers
  Uses pbo to handle type-agnostic memory transfer
2018-04-25 19:14:36 +03:00
kd-11 bb5622401c overlays/gl: minor fixes
- fix ogl color map for overlay resources
- fix label background for save dialog
2018-04-25 19:14:36 +03:00
kd-11 6d46ac1ad6 gl: Reimplement textures
- Separate texture data from texture views
2018-04-25 19:14:36 +03:00
kd-11 cf1b700ebc rsx: Improve format mismatch detection hack 2018-04-25 19:14:36 +03:00
kd-11 c5cd758700 rsx: Workaround for G8B8 render targets
- Mainly affected are colormasks and read swizzles

NOTES:
- Writes to G write to the second and fourth component (YW)
- Writes to B write to first and third component (XZ)
- This means the actual format layout is BGBG (RGBA) making RG mapping actually GR
- Clear does not seem to have any intended effect on this format (TLOU)
2018-04-25 19:14:36 +03:00
JohnHolmesII 7303f04bc5 Minor bugfix 2018-04-10 15:06:56 +03:00
Talkashie 64992f758d Fix typos (#4410)
* MASSIVE TYPO FIX part 1

* ANOTHER HUUUUGE TYPO FIX part 2

* thank you :hcorion: for all of your help. I could not have done this without you
2018-04-08 01:01:39 +01:00
kd-11 568118634e vk: Squash some spec violations that went unnoticed 2018-04-05 01:06:50 +03:00