Commit graph

534 commits

Author SHA1 Message Date
kd-11 e0a7912d7c rsx: Check for stencil writes when determining zeta_write flag 2019-08-30 21:45:41 +03:00
kd-11 04c808b8ab rsx: Fixup for MRT color write lookup and surface_target_a 2019-08-28 16:12:10 +03:00
kd-11 2962e05f26 rsx: Implement per-RTT color masks
- Also refactors and simplifies some common code in surface store and rsx core
2019-08-27 21:59:02 +03:00
Nekotekina d2eba2387b Use g_fxo for display_manager 2019-08-27 03:50:15 +03:00
Nekotekina 38a06c4b14 Use g_fxo for SysRsxConfig
Rename to lv2_rsx_config
2019-08-27 03:50:15 +03:00
kd-11 3e28e4b1e0 rsx/decompiler: Restructure program register behavior
- Fix reading of varying registers in FP
  Different registers have different behavior
- Always write to varying registers. If a register is not written to, it is initialized to (0, 0, 0, 1)
- Reimplements two-sided lighting correctly without hacks
- Also bumps shader cache version
2019-08-26 20:03:31 +03:00
kd-11 f9aea076ae rsx: Implement depth_buffer_float support.
- Since this is transparent to the application at all time, it only becomes a problem when doing memory transfer or DEPTH->RGBA conversion in shaders.
2019-08-26 20:03:31 +03:00
kd-11 9d981de96d rsx: Fix offloader deadlock
- Do not allow offloader to handle its own faults. Serialize them on RSX instead.
  This approach introduces a GPU race condition that should be avoided with improved synchronization.
- TODO: Use proper GPU-side synchronization to avoid this situation
2019-08-25 22:09:20 +03:00
kd-11 f0bd0b5a7c rsx: Conditional render sync optimization
- ZCULL queue was updated to one-per-cb but the conditional render sync hint was not updated.
- Do not unconditionally flush the queue unless the upcoming ref is contained in the active CB.
- This avoids spamming queue flush, which frees up resources and improves performance
2019-07-30 21:13:42 +03:00
Eladash 230c3d55b6 Fixup 2019-07-27 04:03:29 +01:00
Eladash fcc75c8b0f rsx: Write atomically semaphore updates and fix zcull timestamp 2019-07-26 21:27:55 +03:00
Eladash c53f0dd7b5 rsx: Fix gcm unmap events 2019-07-26 21:27:55 +03:00
Eladash 85b1152e29 Timers scaling and fixes 2019-07-23 00:09:01 +01:00
kd-11 9a7c2784f0 rsx: Do not clip scissor to viewport when doing buffer clear 2019-07-20 16:39:32 +03:00
kd-11 e2574ff100 rsx: Support CSAA transparency without multiple rasterization samples enabled 2019-07-19 15:49:08 +03:00
kd-11 b5a2f0df68 rsx: Implement separate viewport raster clipping
- Merge viewport raster window and scissor into one clipping region
- Viewport raster clip is different from viewport geometry clipping in
hardware as the latter is configurable separately
2019-07-19 14:21:19 +03:00
Eladash c4d8ef4340 rsx: Allow to configure vblank rate
Removed "HLE protection" hack from sys_rsx_context_attribute
2019-07-12 00:19:56 +03:00
kd-11 d8f753f1e8 rsx: Do not allow framebuffer surfaces that exceed their allocated pitch dimensions
- Truncate surfaces to forcefully fit inside the declared region
2019-07-11 13:22:13 +03:00
kd-11 219a5382f7 rsx: If no array streams are enabled, mark inline array as disabled (null render) 2019-07-09 16:27:59 +03:00
Eladash 43f919c04b Fixup after #6143 (#6146)
vm::spu max address was overflowing resulting in issues, so cast to u64 where needed. Fixes #6145.
    Use vm::get_addr instead of manually substructing vm::base(0) from pointer in texture cache code.
    Prefer std::atomic_thread_fence over _mm_?fence(), adjust usage to be more correct.
    Used sequantially consistent ordering in semaphore_release for TSX path as well.
    Improved memory ordering for sys_rsx_context_iounmap/map.
    Fixed sync bugs in HLE gcm because of not using atomic instructions.
    Use release memory barrier in lwsync for PPU LLVM, according to this xbox360 programming guide lwsync is a hw release memory barrier.
    Also use release barrier where lwsync was originally used in liblv2 sys_lwmutex and cellSync.
    Use acquire barrier for isync instruction, see https://devblogs.microsoft.com/oldnewthing/20180814-00/?p=99485
2019-06-29 18:48:42 +03:00
Eladash 1ee7b91646 Refactoring (#6143)
Prefer vm::ptr<>::ptr over vm::get_addr.
    Prefer vm::_ptr/base over vm::g_base_addr with offset.
    Added methods atomic_t<>::bts and atomic_t<>::btr .
    Removed obsolute rsx:🧵:Read/WriteIO32 methods.
    Removed wrong check in semaphore_release.
    Added handling for PUTRx commands for RawSPU MFC proxy.
    Prefer overloaded methods of v128 instead of _mm_... in VPKSHUS ppu interpreter precise.
    Fixed more potential overflows that may result in wrong behaviour.
    Added io/size alignment check for sys_rsx_context_iounmap.
    Added rsx::constants::local_mem_base which represents RSX local memory base address.
    Removed obsolute rsx:🧵:main_mem_addr/ioSize/ioAddress members.
2019-06-29 01:27:49 +03:00
JohnHolmesII 23094b48bb Fix warnings related to -Wswitch
Add default cases.
Move default breaks to newline
Add proper handling in some instances.
Add missing enums to switches
2019-06-28 01:40:52 +03:00
kd-11 4ff77a8555 rsx: Improve balancing of the offloader thread
- Use two counters to avoid atomic operations
- Yield instead of sleeping because some games are very sensitive to timing
2019-06-25 20:50:54 +03:00
kd-11 d26b25816d rsx: Improve profiling setup
- Avoid spamming QPC when not needed
- Free performance when debug overlay is not enabled
2019-06-25 20:50:54 +03:00
kd-11 b893a75002 rsx: Rework RSX offloading
- Use a lockless queue
- Do not enqueue small transfers
2019-06-25 20:50:54 +03:00
kd-11 0fa3bcc336 rsx: Asynchronous data transfer 2019-06-25 20:50:54 +03:00
Lassi Hämäläinen c963c51a60 Remove unnecessary header includes
- Manually removed lot of unneeded #includes to clean code and reduce
  compilation time
- Reordered some of the #includes to be in more logical order
2019-06-25 17:11:10 +03:00
Eladash cd0ef99df5 Fix BE endianess arch support in semaphore_406e (#6116)
Add raw() methods for endianness support types and make use of it.
2019-06-21 19:29:49 +03:00
kd-11 bca5f94b3f rsx: Add option to toggle MSAA 2019-06-14 16:19:52 +03:00
kd-11 4a5bbba277 rsx: Enable MSAA
- vk: Enable depth buffer resolve+unresolve
- vk: Add AMD stenciling extension support
- rsx: Temporarily disables MSAA-compatible hacks such as transparency AA
- TODO: Add paths to optionally disable MSAA
2019-06-14 16:19:52 +03:00
kd-11 0d906d6974 rsx: Remove surface aa_mode hacks 2019-06-14 16:19:52 +03:00
scribam 13671d9684 rsx: Apply Clang-Tidy fix "modernize-loop-convert" + const when relevant 2019-06-12 15:11:52 +03:00
scribam 635695ac78 rsx: Apply Clang-Tidy fix "modernize-use-emplace" 2019-06-12 15:11:52 +03:00
scribam a555504142 rsx: Apply Clang-Tidy fix "modernize-deprecated-headers" 2019-06-12 15:11:52 +03:00
scribam db926ee671 rsx: Apply Clang-Tidy fix "performance-unnecessary-value-param" 2019-06-12 15:11:52 +03:00
scribam 81a3b49c2f rsx: Apply Clang-Tidy fix "readability-container-size-empty" 2019-06-12 15:11:52 +03:00
Nekotekina dfd50d0185 Implement std::bit_cast<>
Partial implementation of std::bit_cast from C++20.
Also fix most strict-aliasing rule break warnings (gcc).
2019-06-02 23:22:16 +03:00
scribam 78c7ef3039 rsx: Use clear() instead of resize(0)
The result is the same but clear [1] has slightly less code than resize [2] and signals better the intent IMHO.

[1] fb7fb646fa/libstdc%2B%2B-v3/include/bits/stl_vector.h (L1495)
[2] fb7fb646fa/libstdc%2B%2B-v3/include/bits/stl_vector.h (L934)
2019-06-01 22:59:23 +03:00
kd-11 463b1b220d rsx: Improve accuracy of shadow compare Ops when non-integer depth formats are used
- The fixed-point D24S8 format does special Z clamping during compare which matches PS3 behaviour
- D32S8 is a floating point format and comparison with Dref > 1 always fails causing black edges/borders
2019-04-25 16:23:05 +03:00
eladash 6f76e34104 rsx: Fix race on clearing native_ui vs emu_requested flag 2019-04-20 01:04:41 +03:00
eladash 888cb9d673 Remove reader_lock executed in every instruction by RSX
Use optimistic double check instead, use one load instruction for the check to be atomic

+ Read emu status once every FIFO iteration
2019-04-20 01:04:41 +03:00
kd-11 0f7af391d7 vk: Implement copy-to-buffer and copy-from-buffer for depth_stencil
formats
- Allows D24S8 and D32S8 transport via typeless channels
- Allows uploading and downloading D24S8 data easily
- TODO: Implement optional byteswapping to fix flushed readbacks with
the same method
2019-04-09 13:40:54 +03:00
eladash 8185ef7610 rsx: Improve vblank accuracy 2019-03-31 14:57:21 +03:00
kd-11 f4ebcb0029 rsx: Properly decode packed renders from the type flag
- Seems to occupy bits [8-9]
2019-03-10 16:09:05 +03:00
eladash 6f770c8e35 Fix potential crash in begin_occlusion_query() while closing the Emu 2019-01-30 18:44:29 +03:00
kd-11 fb778e4821 rsx: Reimplement attrib divisor 2019-01-25 14:34:22 +03:00
kd-11 6fdc0fd7f0 rsx: Reimplement MSAA transparency
- Apply dither to edges that almost fail the straight-up alpha test
- Significantly improves alpha tested geometry far from the camera
- Also removes blend factor overrides/hacks as they give incorrect results due to background bleeding
2019-01-25 14:34:22 +03:00
kd-11 0f64583c7a rsx: Reimplement pitch lookup
- Remove the required_xxx_pitch constraint as it makes no sense. The pitch controls what can be written per line.
- It is possible to have a huge surface width but only render to a small region at the beginning and have a smaller pitch than can fit the surface (NFS carbon)
2019-01-06 10:44:40 +03:00
kd-11 64a8829614 rsx: Minor cleanup 2019-01-06 10:44:40 +03:00
kd-11 15488eb247 rsx: Avoid unnecessarily touching framebuffer memory
- Do not bind companion framebuffer when clearing single aspect; let the
  contest mechanism sort it out instead
- Do not prematurely tag framebuffers, instead only do so at
  write-confirmation time. Should avoid false tagging if setup does not
  allow a render to occur.
2019-01-06 10:44:40 +03:00
eladash db784556aa rsx: Evaluate cond render test at set_render_enabled 2018-12-30 15:04:59 +01:00
kd-11 f48abde14b rsx: Fixups for immediate rendering mode
- Immediate mode is isolated from the rest of the vertex configuration
- TODO: Verify register behaviour when immediate mode is used
  Check if per-primitive const register values are supported (likely are)
2018-12-24 09:05:19 +03:00
eladash 45ed58cdaf Fix rsx capture replay
Allow to capture non-increment cmd flag that was missing in command.reg
2018-12-15 19:40:18 +03:00
eladash 87988e9da8 rsx fifo: Stability improvements
* Restore stack in fifo error handling

* Update get register after the cmd execution

* Fix put pause in the middle of command

* Add restore points when branching to self

* Precise nopcmd detection

* Test all invalid cmds for early treatment of queue corruption
2018-12-15 19:40:18 +03:00
eladash 415b995a54 log rsx get ctrl 2018-12-15 19:40:18 +03:00
Nekotekina 476090a747 Detach VBlank and RSX Decompiler threads
Should fix exception handling in RSX Thread
2018-12-04 23:41:54 +03:00
kd-11 ec768afbd9 rsx: Flip workarounds for applications that flip via syscall
- Do not assume flip marks end-of-frame if executed via syscall
- Also disables skip_frame for these applications as there is no frame boundary
- NOTE: QUEUE_HEAD cannot be relied on as it is seemingly possible to flip the same head and not need to queue it
2018-11-30 23:51:25 +03:00
kd-11 2168159d03 gl: Fix flip regression
- Restore graphics state after flip (including active fbo) because flip can be made through a syscall
2018-11-30 23:51:25 +03:00
kd-11 5b6e1420f3 rsx: Pipeline barriers fixed up
- Ensure barriers are invoked even if no draw occurs!
-- Ensures that deferred commands are executed eventually
2018-11-30 23:51:25 +03:00
kd-11 1d19f71a46 rsx: Re-enable fifo error reset 2018-11-30 23:51:25 +03:00
kd-11 718a04c84f fixup: Clear disabled attrib entries 2018-11-30 23:51:25 +03:00
kd-11 833c25894f [WIP] rsx: Rebase cleanup 2018-11-30 23:51:25 +03:00
kd-11 5193c99973 rsx: Enable dynamic FIFO preprocessing
- Tries to detect when FIFO preprocessing is beneficial and only enables optimizations if the benefit outweighs the cost
- Current threshold is at least 500 draw calls saved at over 2000 draw calls to justify the overhead
- TODO: More tuning for other CPUs
2018-11-30 23:51:25 +03:00
kd-11 7b065d7781 rsx: Fixup; input attributes blob decoding
- Use an unstructured blob and index into the vec4 structures to extract the real data
2018-11-30 23:51:25 +03:00
kd-11 846daadd5d rsx: Fixups
- Improve vertex attribute layout format. Allows for full 16-bit attribute divisor
- Use actual pitch when declaring framebuffer rsx pitch instead of register value in case of swizzle? rendering
2018-11-30 23:51:25 +03:00
kd-11 2e32777375 rsx: Scrap the prebuffered queue approach
- Basically starting over
- The cost of making command copies into the queue has a measurable impact
2018-11-30 23:51:25 +03:00
kd-11 1ad76ad331 rsx: Restructure programs
- Also re-enable pipeline optimizations
2018-11-30 23:51:25 +03:00
kd-11 677b16f5c6 rsx: Fixups
- Also fix visual corruption when using disjoint indexed draws

- Refactor draw call emit again (vk)

- Improve execution barrier resolve
  - Allow vertex/index rebase inside begin/end pair
  - Add ALPHA_TEST to list of excluded methods [TODO: defer raster state]

- gl bringup

- Simplify
  - using the simple_array gets back a few more fps :)
2018-11-30 23:51:25 +03:00
kd-11 e01d2f08c9 rsx: Refactor FIFO
- Removes fifo structures from common RSXThread
- Sets up a dedicated FIFO controller
- Allows for configurable queue optimizations
2018-11-30 23:51:25 +03:00
eladash 37b6afaf2c rsx: inlined array stride fix 2018-11-11 23:17:07 +03:00
eladash 75221a6078 rsx: Fix inlined vertex array validation 2018-11-04 22:57:18 +03:00
Megamouse d56c85fe01 RSX/Capture: fix filePath and remove strict mode check (#5283)
- Fixes regression introduced by kd-11 when merging in jarves' flip rework.
2018-10-27 13:06:50 +03:00
elad 6829fa0286 rsx: Improve inlined arrays (#5248)
* rsx: Implement register reads in inlined arrays

* rsx: Check for disabled streams in inlined arrays
2018-10-20 16:00:53 +03:00
Nekotekina 1b37e775be Migration to named_thread<>
Add atomic_t<>::try_dec instead of fetch_dec_sat
Add atomic_t<>::try_inc
GDBDebugServer is broken (needs rewrite)
Removed old_thread class (former named_thread)
Removed storing/rethrowing exceptions from thread
Emu.Stop doesn't inject an exception anymore
task_stack helper class removed
thread_base simplified (no shared_from_this)
thread_ctrl::spawn simplified (creates detached thread)
Implemented overrideable thread detaching logic
Disabled cellAdec, cellDmux, cellFsAio
SPUThread renamed to spu_thread
RawSPUThread removed, spu_thread used instead
Disabled deriving from ppu_thread
Partial support for thread renaming
lv2_timer... simplified, screw it
idm/fxm: butchered support for on_stop/on_init
vm: improved allocation structure (added size)
2018-10-19 22:22:35 +03:00
Nekotekina da6ce80f4f Make vm::get_super_ptr return contiguous memory
Cleanup RSX code complexity
2018-09-27 23:37:13 +03:00
eladash 72ba062b1a rsx: Fix pfifo ret opcode 2018-09-27 17:47:32 +03:00
eladash a47ebad24c rsx: Fix pfifo nop cmd 2018-09-27 17:47:32 +03:00
Nekotekina 306f95a9ae New named_thread template (preview)
Old class named_thread renamed to old_thread
It's too hard to move in a single commit
2018-09-27 14:04:16 +03:00
kd-11 6a9f234dc7 rsx: Fixup flip behaviour
- handle_emu_flip is very heavy, only fire
2018-09-26 19:41:50 +03:00
kd-11 f72157bcec rsx: Fix vertex attrib parsing 2018-09-25 22:03:35 +03:00
kd-11 a3d44b5e1f rsx: Cleanup changes for the flip patch 2018-09-24 16:44:02 +03:00
Jake 699eadc84f rsx: Move render flip from rsx queue command to flip command 2018-09-24 16:44:02 +03:00
Rui Pinheiro 35139ebf5d Texture cache cleanup, refactoring and fixes 2018-09-24 15:26:40 +03:00
Rui Pinheiro f3029b2b42 Change Cell->RSX map/unmap notifications
This allows for further flexibility on the RSX side, allowing us to fix
some bugs and crashes in later commits.
2018-09-24 15:26:40 +03:00
eladash e0a676a3fe rsx: Fix vertex arrays fetch with inlined draws 2018-09-24 13:25:05 +03:00
eladash e6b68b260a rsx: Improve FIFO mem faults handling
increase the delay between faults, reduce log spam by allowing the messages to stack up
2018-09-20 01:05:40 +03:00
eladash a8ea576b22 rsx/cellgcm: Implemet initialization registers reset 2018-09-20 01:05:40 +03:00
elad d24f9194f7 typo fix
shader's location is decremented by one to match cellGcm's constants.
2018-09-13 16:49:58 +03:00
eladash b9ad578b00 rsx: Add a default shader address state 2018-09-13 16:49:58 +03:00
scribam 4cb98014a2 rsx: tiny zcull optimizations 2018-09-13 12:43:40 +03:00
eladash efbd77deb4 rsx: dont silently ignore null shader address 2018-09-12 00:40:20 +03:00
Nekotekina ca5158a03e Cleanup semaphore<> (sema.h) and mutex.h (shared_mutex)
Remove semaphore_lock and writer_lock classes, replace with std::lock_guard
Change semaphore<> interface to Lockable (+ exotic try_unlock method)
2018-09-03 23:00:36 +03:00
kd-11 2e0ecb556c rsx: Possible fix for UB data type consistency 2018-09-03 18:24:20 +03:00
kd-11 6399833182 rsx: Fix endianness order when immediate mode register is updated, but used as register lookup
- Simplify the code by unifying all the register-backed memory
2018-09-03 18:24:20 +03:00
kd-11 c6e35706a3 vk: Support sw component swizzle decode because metal sucks 2018-08-23 22:54:56 +03:00
eladash 56d553f10d Rsx: fix cmd jump over put register 2018-08-22 12:20:31 +03:00
kd-11 dd21e43ed5 rsx: Force disable draw reordering when capturing a frame 2018-08-18 16:14:30 +03:00
kd-11 0f36e87010 zcull: Improve the delay algorithm to be more consistent
- Use proper time checking; depending on what is being done one 'tick' can
  be almost a millisecond long or several nanoseconds
- Avoid spamming the system timer unless necessary
2018-08-18 16:14:30 +03:00
kd-11 38191c3013 rsx: Avoid acquiring the vm lock; deadlock evasion
- A possible deadlock is still present if rsx is trying to get a super_ptr whilst the vm lock holder is in an access violation
  This patch makes this scenario very unlikely since each block need only be touched once
2018-08-18 16:14:30 +03:00
kd-11 373e02e91c rsx: Timestamp accuracy workaround 2018-08-18 16:14:30 +03:00