Commit graph

492 commits

Author SHA1 Message Date
kd-11 2163a59649 rsx: Typo fix 2019-01-25 14:34:22 +03:00
kd-11 fb778e4821 rsx: Reimplement attrib divisor 2019-01-25 14:34:22 +03:00
kd-11 736415fcd9 rsx/fp: Detect broken/NOP shaders automatically
- Do not compile body if the shader is of no consequence, leave as a passthrough shader
2019-01-25 14:34:22 +03:00
kd-11 6fdc0fd7f0 rsx: Reimplement MSAA transparency
- Apply dither to edges that almost fail the straight-up alpha test
- Significantly improves alpha tested geometry far from the camera
- Also removes blend factor overrides/hacks as they give incorrect results due to background bleeding
2019-01-25 14:34:22 +03:00
kd-11 417a2e6731 rsx: Refactor index buffers
- Index offset is ignored anyway and only used to calculate vertex attribute divisor index
- Specialized optimization for untouched xfer without primitive restart
2019-01-25 14:34:22 +03:00
Nekotekina bd9131ae1c Implement fs::get_cache_dir
Win32: equal to config dir for now
Linux: respect XDG_CACHE_HOME if specified
OSX: possibly incomplete
2019-01-13 14:45:36 +03:00
kd-11 52ac0a901a rsx: improve memory coherency
- Avoid tagging and rely on read/write barriers and the dirty flag mechanism. Testing is done with a weak 8-byte memory test
- Introducing new data when tagging breaks applications with race conditions where tags can overwrite flushed data
2019-01-06 10:44:40 +03:00
kd-11 89c9c54743 rsx: Minor hot-fix
- Pitch 0 makes sense if width == 1 and height == 1
2019-01-06 10:44:40 +03:00
kd-11 2a62fa892b rsx: Texture cache refactor
- gl: Include an execution state wrapper to ensure state changes are consistent. Also removes a lot of required 'cleanup' for helper methods
- texture_cache: Make execition context a mandatory field as it is required for all operations. Also removes a lot of situations where duplicate argument is added in for both fixed and vararg fields
- Explicit read/write barrier for framebuffer resources depending on
  usage. Allows for operations like optional memory initialization before
  reading
2019-01-06 10:44:40 +03:00
kd-11 3be4b474d9 rsx: Handle rsx-self-tripping in draw call and triggering invalid invalidation
- If draw call resources consume memory that intersects with NA parts of the texture cache, we get a framebuffer test mismatch.
  This mismatch is false and happens because the thread has not yet reached the point of relocking the pages
2019-01-06 10:44:40 +03:00
kd-11 a95a44cf66 rsx: Strictness cleanups
- Also account for variable pitch textures (swizzled scan)
2019-01-06 10:44:40 +03:00
kd-11 362eea09a1 whitespace fix only 2019-01-06 10:44:40 +03:00
kd-11 15d5507154 rsx: Rewrite memory inheritance transfers
- Implicitly invoke a memory barrier if actively reading from an unsynchronized texture
- Simplify memory transfer operations
- Should allow more games to work without strict mode
2019-01-06 10:44:40 +03:00
kd-11 97704d1396 rsx: Fix texture size calculations 2019-01-06 10:44:40 +03:00
kd-11 50c07833e4 rsx: Do not force upload for missing data
- TODO: Finish implementing GPU RCB for mem-sync
- TODO: Refactor mem-sync
2019-01-06 10:44:40 +03:00
kd-11 15488eb247 rsx: Avoid unnecessarily touching framebuffer memory
- Do not bind companion framebuffer when clearing single aspect; let the
  contest mechanism sort it out instead
- Do not prematurely tag framebuffers, instead only do so at
  write-confirmation time. Should avoid false tagging if setup does not
  allow a render to occur.
2019-01-06 10:44:40 +03:00
Megamouse bb464b0b64 fix some warnings 2019-01-05 04:03:18 +01:00
kd-11 7555be232f rsx/vp: Fix double dst commands
- Test the vec_result mask before assigning to actual output
  Sometimes, VEC op is used to write to Rx, and SCA op is used to write to o[x]!
2018-12-24 09:05:19 +03:00
kd-11 4b79ef1ad9 rsx: Implement stencil mirror views
- Implements a mirror view of D24S8 data that accesses the stencil components.
  Finishes the implementation of TEX2D_DEPTH_RGBA as the stencil component was previously missing from the reconstructed data
- Add a few missing destructors
  Image classes are inherited a lot and I forgot to make the dtors virtual
2018-12-24 09:05:19 +03:00
kd-11 696b91cb9b rsx: Reimplement conditional execution in shaders
- Per-channel conditional execution introduces RAW hazards all over the place
- Its cheaper to process both branches and select between the two
- Also improves ShaderVariable functionality to allow functionality such as match_size and taking complex variables as inputs
2018-12-24 09:05:19 +03:00
Rui Pinheiro 54bfe2e102 Add log warning on slow flush path 2018-12-11 22:37:10 +03:00
Rui Pinheiro 18b9ee4541 Reimplement overlapping fbo "hack"
To avoid the need (and performance hit) of Read Color/Depth Buffers, we
may not invalidate overlapping fbos inside lock_memory_region unless
they are guaranteed to be superseded by the new one.

This avoids e.g. issues with overblooming, among others.
2018-12-11 22:37:10 +03:00
Rui Pinheiro 5ab7296665 Fix xcode build 2018-12-11 22:37:10 +03:00
Rui Pinheiro bcdf91edbb Misc. Texture Cache fixes 2018-12-11 22:37:10 +03:00
Rui Pinheiro 9d1cdccb1a Implement dedicated texture cache predictor 2018-12-11 22:37:10 +03:00
Rui Pinheiro af360b78f2 Texture cache section management fixups
Fixes VRAM leaks and incorrect destruction of resources, which could
lead to drivers crashes.

Additionally, lock_memory_region is now able to flush superseded
sections. However, due to the potential performance impact of this
for little gain, a new debug setting ("Strict Flushing") has been
added to config.yaml
2018-12-11 22:37:10 +03:00
eladash 4ddafc481e remove unreachable code 2018-12-04 13:01:29 +03:00
kd-11 504ab5a6d4 rsx: Minor cleanup to silence stupid compiler warnings 2018-12-03 20:01:23 +03:00
kd-11 7b065d7781 rsx: Fixup; input attributes blob decoding
- Use an unstructured blob and index into the vec4 structures to extract the real data
2018-11-30 23:51:25 +03:00
kd-11 846daadd5d rsx: Fixups
- Improve vertex attribute layout format. Allows for full 16-bit attribute divisor
- Use actual pitch when declaring framebuffer rsx pitch instead of register value in case of swizzle? rendering
2018-11-30 23:51:25 +03:00
kd-11 1ad76ad331 rsx: Restructure programs
- Also re-enable pipeline optimizations
2018-11-30 23:51:25 +03:00
kd-11 677b16f5c6 rsx: Fixups
- Also fix visual corruption when using disjoint indexed draws

- Refactor draw call emit again (vk)

- Improve execution barrier resolve
  - Allow vertex/index rebase inside begin/end pair
  - Add ALPHA_TEST to list of excluded methods [TODO: defer raster state]

- gl bringup

- Simplify
  - using the simple_array gets back a few more fps :)
2018-11-30 23:51:25 +03:00
kd-11 e01d2f08c9 rsx: Refactor FIFO
- Removes fifo structures from common RSXThread
- Sets up a dedicated FIFO controller
- Allows for configurable queue optimizations
2018-11-30 23:51:25 +03:00
eladash 83b6c98563 rsx: Fix u16 index arrays overflow
Force u32 index array destinations to avoid overflows when adding vertex base index.
2018-10-08 16:39:47 +03:00
eladash e361e0daa6 rsx: Fix restart index check for u16 index arrays
Dont ignore upper bits of the restart index with u16 types
2018-10-08 16:39:47 +03:00
eladash 348db050ae rsx: Fix texture height read 2018-10-03 20:57:46 +03:00
eladash fa723f6dc4 rsx: Fix texture depth read 2018-10-03 20:57:46 +03:00
eladash 6586090307 rsx: Remove texture size hack 2018-10-03 20:57:46 +03:00
eladash eacd1b8f13 rsx: Remove texture address hack 2018-10-03 20:57:46 +03:00
Nekotekina da6ce80f4f Make vm::get_super_ptr return contiguous memory
Cleanup RSX code complexity
2018-09-27 23:37:13 +03:00
kd-11 dab30c0051 rsx: Disable predictions if 50% of predictions are wrong
- This happens often in loading screens where the memory usage pattern is often randomized by loading in of assets
2018-09-24 21:19:38 +03:00
Rui Pinheiro 35139ebf5d Texture cache cleanup, refactoring and fixes 2018-09-24 15:26:40 +03:00
kd-11 dafc914bcc rsx: temporary hack
- Removes all use of valid_count as a metric until the new refactor is merged
2018-09-21 16:32:23 +03:00
kd-11 fc486a1bac rsx: Preserve memory order when doing flush
- Orders flushing to preserve memory at all cost
- Avoids false positive where flushing overlapping sections can falsely invalidate another with head/tail test
2018-09-21 16:32:23 +03:00
kd-11 a21bdb9f45 rsx; blit engine fixes
- Forcefully downloads and reuploads data from the CPU in case of unexpected overlaps
- Properly detect correct size of newly created blit targets
- Remember to clear any existing views when changing the default component map!
2018-09-21 16:32:23 +03:00
kd-11 16dcbe8c74 rsx/vp: Fix ARL opcode properly
- NOTE: The address swizzle index is only for use as src. The address registers are only used one channel at a time.
- When the destination of ARL, the encoding is the same as the other temp registers
2018-09-15 11:57:06 +03:00
kd-11 f413996362 rsx: Minor texture cache fixes
- Retag resources reprotected under flush_always rules
- Properly check for blit resource fitting taking into account format
mismatch, pitch mismatch and typeless transfers
2018-09-10 15:43:28 +03:00
Dzmitry Malyshau 27474316fd Add missing virtual desctructors (#5094) 2018-09-07 14:35:40 +03:00
kd-11 66610a28af rsx/common: Clean up shared glsl header to minimize string concat operations 2018-09-06 21:11:11 +03:00
kd-11 346b97f871 rsx: Preserve fog coordinate across shader stages
- The x value contains the VP output value interpolated across primitive surface
- The y coordinate contains the fog fraction according to the selected fog formula
2018-09-06 21:11:11 +03:00
scribam d7bb59cd99 c++17: use std::size 2018-09-06 13:15:59 +03:00
Nekotekina ca5158a03e Cleanup semaphore<> (sema.h) and mutex.h (shared_mutex)
Remove semaphore_lock and writer_lock classes, replace with std::lock_guard
Change semaphore<> interface to Lockable (+ exotic try_unlock method)
2018-09-03 23:00:36 +03:00
Nekotekina ce4c4696dd Try to get rid of SIZE_32 macro 2018-09-03 21:40:36 +03:00
kd-11 dea5193fd7 rsx: Fix FP temp register count 2018-09-03 21:39:18 +03:00
kd-11 6399833182 rsx: Fix endianness order when immediate mode register is updated, but used as register lookup
- Simplify the code by unifying all the register-backed memory
2018-09-03 18:24:20 +03:00
Nekotekina a93a40e8d9 Disable texture_cache::emit_once (MSVC crash) 2018-08-25 01:58:28 +03:00
Nekotekina 1c6c24f8ac Update GSL and yaml-cpp submodules 2018-08-25 01:15:47 +03:00
Nekotekina 923314aef5 Rewrite texture_cache::emit_once
Also trying to workaround MSVC bug
2018-08-25 01:15:47 +03:00
kd-11 c6e35706a3 vk: Support sw component swizzle decode because metal sucks 2018-08-23 22:54:56 +03:00
kd-11 f3d3a1a4a5 rsx: Rework section reuse logic 2018-08-22 17:22:54 +03:00
kd-11 937f1e8cd0 fix gcc build 2018-08-18 16:14:30 +03:00
kd-11 4b2b662c3a rsx: Followup to the memory inheritance hierachy patch
- Tags framebuffer resources on first use (when on_write is called to verify memory)
- Texture cache now selects the best match and even sorts atlas writes with memory write order to avoid older data showing over newer one
2018-08-18 16:14:30 +03:00
kd-11 cca488d0cf rsx: Enable swizzled decode for all formats unless proven otherwise
- Some formats are proven to ignore swizzle flag
  - DXT compressed textures
  - COMPRESSED_BG_GB class textures
- Some applications are using swizzled wide integer formats so those are confirmed to swizzle
2018-08-18 16:14:30 +03:00
kd-11 f8a9b1fa30 [WIP] rsx: Improve memory inheritance hierachy
- Cascade memory writes by invalidating 'downstream' subsurfaces
- Fixup; always resolve for overlapping surfaces before sampling (force
  atlas gather test)
2018-08-18 16:14:30 +03:00
kd-11 cc7848b3ef vulkan: Fix blit engine transfer to ARGB8 render target memory 2018-08-18 16:14:30 +03:00
kd-11 0267221586 Minor optimizations and fixes
- FIFO: avoid multiline spam
- VK: Fix program setup counter
- FS: Precalculate fragment constants buffer size during analysis step
2018-08-18 16:14:30 +03:00
Rui Pinheiro 23b52e1b1c Mark unsync textures dirty when deferred flushing
invalidate_range_impl_base does not mark all textures that will only be
unprotected as dirty when doing a deferred flush, since that is done by
flush_all.

However, if there are no sections to flush, the deferred flush will
use the same code path as non-deferred flushes for unprotecting textures
and forget to mark them as dirty.

This commit fixes this bug.
2018-08-16 15:38:36 +03:00
Rui Pinheiro fa6a5761b3 Refactor get_intersecting_set
The existing implementation restarts the loop immediately after
finding a range_data instance that updates the trampled_range.

This commit refactors this method to continue the loop with the updated
trampled_range, and then repeat only those range_data instances that
were iterated through before the trampled_range was last updated.
As a result, the number of total iterations required is reduced.
2018-08-16 15:38:36 +03:00
Rui Pinheiro b534d49e48 Fix off-by-one error in get_intersecting_set
When the trampled range changes, get_intersecting_set restarts the
outer loop. However, due to an off-by-one error, it skips the first
cache entry when doing so. This can cause a texture not to be
correctly unlocked, which could lead to issues or even deadlocks.

This commit fixes this off-by-one error.
2018-08-16 15:38:36 +03:00
eladash f349695a75 Rsx: rewrite address translation 2018-08-13 16:16:34 +03:00
kd-11 19d808d378 rsx/gl: Minor cleanup and optimization
- Track register change status
- Remove unused gl classes
2018-07-22 17:19:59 +03:00
kd-11 8695f95267 rsx: Reimplement cached textures and their views 2018-07-22 17:19:59 +03:00
scribam 65d270e5d8 clang-tidy: performance-faster-string-find
https://clang.llvm.org/extra/clang-tidy/checks/performance-faster-string-find.html
2018-07-15 12:51:09 +04:00
kd-11 e7f30640ef rsx: Async shader compilation
- Defer compilation process to worker threads
- vulkan: Fixup for graphics_pipeline_state.
  Never use struct assignment operator on vk** structs due to padding after sType member (4 bytes)
2018-07-14 15:19:56 +03:00
kd-11 fa55a8072c rsx: Improve vertex textures support
- Adds proper support for vertex textures, including dimensions other than 2D textures
- Minor analyser fixup, removes spurious 'analyser failed' errors
- Minor optimizations for program state tracking
2018-07-12 18:02:28 +03:00
kd-11 d78957d1cf rsx/vp: CodeGen improvements
- Fix double destination writes on conditional write masking
- Fix codegen to simplify simple scalar comparisons vs vector functions
2018-07-07 16:20:33 +03:00
kd-11 2c34195954 rsx/vp: Discard broken vertex programs with no writes to POS register 2018-07-07 16:20:33 +03:00
kd-11 2ca935a26b vp: Improve vertex program analyser
- Adds dead code elimination
- Fix absolute branch target addresses to take base address into account
- Patch branch targets relative to base address to improve hash matching
- Bumps shader cache version
- Enables shader logging option to write out vertex program binary,
  helpful when debugging problems.
2018-07-07 16:20:33 +03:00
kd-11 bd915bfebd rsx: vp decompiler fixes
- Fix program abort logic to never abort before resolving later label addresses
  Fixes jumping over broken code and jumping over END markers
- TEXTURE_CONTROL2 has indexing range of [0..15] without stride skipping!
  This register does not have interleaving with other texture registers
- Track shader address poke as it seems to invalidate programs as well
2018-07-07 16:20:33 +03:00
kd-11 24f4c92759 rsx: Improve texture cache read speculation 2018-06-26 20:07:20 +03:00
kd-11 1730708f47 rsx: Rework memory protection management for framebuffer access
- Avoid re-locking memory if there is no reason to do so (no draws issued)
- Actively bound regions should always get written to the backing cache
- Forcefully read memory during download if writes to the target have occured since last sync event
2018-06-26 20:07:20 +03:00
kd-11 d77e62c94e rsx: Improve GPU resource read prediction 2018-06-18 17:32:22 +03:00
kd-11 2afcf369ec vk: Add synchronous compute pipelines
- Compute is now used to assist in some parts of blit operations, since there are no format conversions with vulkan like OGL does
- TODO: Integrate this into all types of GPU memory conversion operations instead of downloading to CPU then converting
2018-06-18 17:32:22 +03:00
kd-11 dd4c13b625 rsx: Avoid race conditions in unsynchronized unprotect 2018-06-18 17:32:22 +03:00
kd-11 3150619320 rsx: Preserve read AA state separate from write AA state
- Some applications (e.g Backbreaker) use an evil hack to resolve MSAA.
  The application respecifies a formerly AA region as a region with no AA then performs a framebuffer feedback lookup.
  The old memory keeps AA during read, but writes back to itself with AA resolved.
  This is evil on several levels but it just happens to work on PS3
2018-06-08 22:17:50 +03:00
kd-11 0f24379c0e rsx: Obey MSAA resolve during memory persistence transfer
- Ugh. This is a bandaid on a festering wound, AA badly needs a rewrite

 Also silence some warnings
2018-06-08 22:17:50 +03:00
Dravonic 400079a006 Parallel shader cache loading (#4677)
* Parallel shader cache loading
2018-06-01 19:49:29 +03:00
kd-11 b030d1900c rsx: Fixup - fix broken memory protection fail caused by region respec
- Some applications will alternate memory between framebuffer and texture data
2018-05-24 10:36:04 +03:00
kd-11 f8d999b384 fixup - range check 2018-05-23 19:07:08 +03:00
kd-11 3f14bc6961 rsx: Silence some meaningless error 2018-05-23 19:07:08 +03:00
kd-11 8fcd5c1e5a rsx: Texture cache fixes
1. rsx: Rework section synchronization using the new memory mirrors
2. rsx: Tweaks
    - Simplify peeking into the current rsx::thread instance.
      Use a simple rsx::get_current_renderer instead of asking fxm for the same
    - Fix global rsx super memory shm block management
3. rsx: Improve memory validation. test_framebuffer() and
tag_framebuffer() are simplified due to mirror support
4. rsx: Only write back confirmed memory range to avoid overapproximation errors in blit engine
5. rsx: Explicitly mark clobbered flushable sections as dirty to have them
removed
6. rsx: Cumulative fixes
    - Reimplement rsx::buffered_section management routines
    - blit engine subsections are not hit-tested against confirmed/committed memory range
      Not all applications are 'honest' about region bounds, making the real cpu range useless for blit ops
2018-05-23 19:07:08 +03:00
scribam 04ad49de4d typos 2018-05-14 21:14:39 +04:00
kd-11 4836a03a7d rsx: Fix build 2018-05-13 14:44:14 +03:00
kd-11 b7979d3f57 rsx/vk: Improvements and minor optimizations
- Improve dirty state tracking affecting program state
- vk: Refactor out transform constants upload into a separate channel to avoid if possible
  transform data uploads are quite expensive
2018-05-13 14:44:14 +03:00
kd-11 a52ea7f870 rsx: Improve fragment and vertex program usage
- Introduces a gpu program analyser step to examine shader contents before attempting compilation or cache search
  - Avoids detecting shader as being different because of unused textures having state changes
  - Adds better program size detection for vertex programs
- Improved vertex program decompiler
  - Properly support CAL type instructions
  - Support jumping over instructions marked with a termination marker with BRA/CAL class opcodes
  - Fix SRC checks and abort
  - Fix CC register initialization
  - NOTE: Even unused SRC registers have to be valid (usually referencing in.POS)
2018-05-13 14:44:14 +03:00
kd-11 f3210a9a33 rsx: Workaround for lost memory sections
- TODO: surface_cache and texture_cache need a better method of persisting partial framebuffer resources
2018-04-25 19:14:36 +03:00
kd-11 291a828217 fixups 2018-04-25 19:14:36 +03:00
kd-11 da99f3cb9a rsx: Critical fixes
- texture cache: Avoid leaking memory sections
  - Avoid double ref increment on flush-always reprotection
  - Detect invalidated_resources entries in surface cache when protecting fbo memory
- vk: Copypasta bugfix, properly initialize aspect mask
2018-04-25 19:14:36 +03:00
kd-11 a42b00488d rsx: Texture fixes
- gl/vk: Fix subresource copy/blit
- gl/vk: Fix default_component_map reading
- vk: Reimplement cell readback path and improve software channel decoder
- Properly name the subresource layout field - its in blocks not bytes!
- Implement d24s8 upload from memory correctly
- Do not ignore DEPTH_FLOAT textures - they are depth textures and abide by the depth compare rules
- NOTE: Redirection of 16-bit textures is not implemented yet
2018-04-25 19:14:36 +03:00
kd-11 9abbbb79ae rsx: Blit engine fixes
- Ignore unlocked blit sections [TODO]
- Do not attempt blit on hw if bytesize is unsupported
- gl: Implement typeless memory transfers
  Uses pbo to handle type-agnostic memory transfer
2018-04-25 19:14:36 +03:00