- Allows sections reclaimed by the surface store due to overlap/inheritance to be identified and removed.
- Additionally, this potentially lowers the number of flushes required per block with multiple overlaps, improving efficiency and, in theory, performance.
- Reject writes to RTT if the source data is of unknown origin. Non-RTT data that is only 1 line in length is suspicious and is often GPU data such as programs or other rendering inputs (see the sketch after this list).
- Attempt to identify blit operations that will be flushed immediately
after and just do them on CPU instead if the transformation is trivial.
- If only a single blit section is contributing to an atlas merge op, the
threshold should be 100%. The only acceptable result here is a
truncation.
- Raise passing 'score' from 50% to 90% to filter out very incomplete
merge operations.
- Catch unfit sections passing the match test; possible for blit_dst data but will likely always be harmless. Disabled in release builds by default.
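A minimal sketch of the rejection heuristic described above; all names here are hypothetical and only illustrate the shape of the check:

```cpp
// Hypothetical sketch: a 1-line source of non-RTT origin writing into a
// render target is more likely to be program data or other rendering inputs
// than pixel data, so such writes are rejected from the RTT path.
static bool should_reject_rtt_write(bool source_is_rtt, unsigned source_height_in_lines)
{
    return !source_is_rtt && source_height_in_lines == 1;
}
```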
* Prefer default initialization over zeroing with std::memset when possible and more readable.
* Use std::format when obtaining trophy file names.
* Use vm::ptr<>::operator bool() instead of comparing vm::ptr to vm::null or using addr().
* Add a few std::memset calls in hle where it matters (or in some places just to document an actual firmware memcpy call).
Round-to-nearest integer division, optimized for unsigned integral types (see the sketch below).
Used in sceNpTrophyGetGameProgress.
Do not allow signed values for aligned_div(), align().
This implementation optimises correctly on all relevant compilers,
unlike GSL’s which gave extremely slow code on any compiler other than
MSVC.
Supersedes #6948.
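A minimal sketch of the round-to-nearest division described above; the helper name rounded_div is hypothetical:

```cpp
#include <type_traits>

// Hypothetical sketch: adding half the divisor before dividing rounds the
// quotient to the nearest integer. Restricted to unsigned types, in line
// with aligned_div()/align() rejecting signed values.
template <typename T>
constexpr std::enable_if_t<std::is_unsigned_v<T>, T> rounded_div(T num, T den)
{
    return static_cast<T>((num + den / 2) / den);
}

// e.g. a progress percentage: rounded_div(7u * 100, 9u) == 78 (77.7... rounds up)
```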
- When the point sprite flag is set, overrides the input similarly to the 2D mask. The returned X and Y values are always the gl_PointCoord values for the fragment.
- Stacks with the 2D mask to override the z and w coordinates.
* rsx: Optimise primitive_restart::upload_untouched() with SSE4.1
This optimisation is only applied when skip_restart is false.
I’ve only tested the u16 codepath, as it is the one used in NieR.
In some very unscientific profiling, this function used to take 2.76% of the total frame time at the save point of the port town; it now takes about 0.40%.
* rsx: Mark all SSE4.1 functions with attributes on gcc and clang
This assures the compiler that we will only call these functions after having checked that the CPU supports these instructions.
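For reference, a minimal sketch of how such marking works on gcc/clang; the macro name is hypothetical:

```cpp
// Compile the annotated function with SSE4.1 code generation enabled even if
// the translation unit is built without -msse4.1. The caller remains
// responsible for checking CPU support (e.g. via cpuid) before calling it.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSE4_1 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE4_1
#endif

TARGET_SSE4_1
void upload_untouched_sse4_1(const void* src, void* dst, unsigned count);
```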
* rsx: Add an AVX2 implementation of primitive restart ibo upload
* rsx: Remove redefinition of SSE4.1 instructions
Now that clang is aware that our functions are compiled with SSE4.1, it
lets us generate this code using its intrinsics.
* rsx: Optimise vector to scalar conversion
This is done using the minpos and srli intrinsics and generates less code than before (see the sketch below).
Thanks Nekotekina for the suggestion!
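A sketch combining the u16 upload scan with the minpos reduction; names are hypothetical and restart-index handling is omitted for brevity. The trick: _mm_minpos_epu16 finds the horizontal minimum, so applying it to the complemented lanes and complementing the result yields the horizontal maximum:

```cpp
#include <smmintrin.h> // SSE4.1
#include <cstdint>
#include <cstddef>

// Hypothetical sketch: copy u16 indices while tracking the maximum index,
// then reduce the vector maximum to a scalar via minpos instead of a chain
// of shuffles and scalar compares.
static uint16_t copy_indices_sse4_1(uint16_t* dst, const uint16_t* src, size_t count)
{
    __m128i max_v = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 8 <= count; i += 8)
    {
        const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), v);
        max_v = _mm_max_epu16(max_v, v); // SSE4.1: unsigned 16-bit max
    }

    // Horizontal max via minpos on the complement: min(~x) == ~max(x).
    const __m128i inv = _mm_xor_si128(max_v, _mm_set1_epi32(-1));
    uint16_t max_index = static_cast<uint16_t>(~_mm_extract_epi16(_mm_minpos_epu16(inv), 0));

    for (; i < count; ++i) // scalar tail
    {
        dst[i] = src[i];
        if (src[i] > max_index) max_index = src[i];
    }
    return max_index;
}
```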
subresource_layout::dim_in_texel
- These two are not always linked when working with compressed textures. The actual texels extend past the nominal size of the image if the size is not aligned. E.g. if height is 1, the real height is 4, but it's not possible to determine this from the aligned size alone; it could be 1, 2, 3 or 4.
- Fixes image out-of-bounds writes when uploading from CPU
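A sketch of the distinction for 4x4 block compression; the struct and field names are hypothetical, loosely mirroring dim_in_texel vs a dimension in blocks:

```cpp
// Hypothetical sketch: block-compressed (DXT/BCn) storage is rounded up to
// the 4x4 block grid, but the sampled image keeps its true texel dimensions.
// Only the texel dimensions can say that a 4-row allocation is a 1-texel-high
// image; the aligned size alone cannot.
struct compressed_dims
{
    unsigned width_in_texel, height_in_texel;  // e.g. 1 x 1
    unsigned width_in_block, height_in_block;  // e.g. 1 x 1 blocks = 4 x 4 texels of storage
};

static compressed_dims make_dims(unsigned width, unsigned height)
{
    return { width, height, (width + 3) / 4, (height + 3) / 4 };
}
```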
- Noticed a glitch on AMD hw and windows drivers where discard seems to affect entire 4x4 cells.
- Dead fragments (outside the primitive boundary) could have their discards trigger as they do not have proper access to variables.
- This introduces dead fragments along triangle edges, causing a diagonal line pattern across the screen that is very annoying.
- Renormalizes arbitrary N-bit values as 8-bit normalized.
- NV hardware performs integer normalization at 8 bits if the size is less than 8.
- This can cause significant arithmetic drift because the error is multiplied by a huge number when sampling.
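A sketch of the renormalization, assuming the hardware divides by 255 regardless of channel width (names hypothetical):

```cpp
#include <cstdint>

// Hypothetical sketch: expand an N-bit value to 8-bit normalized so hardware
// that always divides by 255 still yields v / (2^N - 1). Without this, a
// 5-bit channel read as x/255 instead of x/31 drifts badly once sampled
// values are multiplied by large factors.
static uint8_t renormalize_to_8bit(unsigned value, unsigned bit_depth)
{
    const unsigned n_max = (1u << bit_depth) - 1;                     // e.g. 31 for 5-bit
    return static_cast<uint8_t>((value * 255u + n_max / 2) / n_max);  // round to nearest
}

// e.g. renormalize_to_8bit(31, 5) == 255, so sampling yields 31/31 == 1.0 as expected
```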
- If either source data or dest is a render target, do image operations on the GPU same as before
- If swizzle is desired, use CPU fallback
- If no scaling and no format conversion is required, use CPU fallback
- If scaling is desired and the transfer target is in local memory, use the GPU
- When doing trivial copies, use the routine in rsx_methods instead of
duplicating code. Also has the benefit of better range checking.
- When committing a block as fbo, keep blit_dst data as well.
- Avoids removing (and losing data from) blit targets that just happen to share a page with a framebuffer.
- Uncacheable resources can be reused as soon as they're made visible to the draw call.
- Since they're likely to be reused every draw call until the shader changes, it is important to reuse as much as possible
- Calculate exact sizes when doing hit tests to avoid false negatives
- Defer page checking until actually required to do memory setup
- Introduce align2 helper to do non-pow2 alignments (see the sketch after this list)
- Allow use of intrinsics when SSSE3 and SSE4.1 are not available in the build target environment
- Properly separate SSE4.1 code from SSSE3 code for some older processors without SSE4.1
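A minimal sketch of the align2 helper mentioned above, assuming its job is rounding up to an arbitrary, not necessarily power-of-2, multiple:

```cpp
// Hypothetical sketch: round 'value' up to the next multiple of 'alignment'.
// The usual pow2 trick ((v + a - 1) & ~(a - 1)) requires a power-of-2
// alignment; plain division works for any alignment.
template <typename T>
constexpr T align2(T value, T alignment)
{
    return ((value + alignment - 1) / alignment) * alignment;
}

// e.g. align2(10u, 6u) == 12, where the pow2 trick would wrongly return 10
```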
- With harmonization between all texture types implemented, there is no difference between blit_engine_src and shader_read for supported formats
- Adds extra format filtering to ensure no conflicts when copying data
- While the mask for surface_a is at index 0, the surface cache expects the order to be maintained correctly!
Set the correct mask since surface store now checks each RTT individually
- Avoid silly broken tests due to queue_tag being called before pitch is initialized.
- Return actual memory range covered and exclude trailing padding.
- Coordinates in src are to be calculated with src_pitch, not required_pitch.
- This allows creating buffers with no MAP bits set which should ensure they are created for VRAM usage only
- TODO: Implement compute kernels to avoid software fallback mode for pack/unpack operations
- Fix 2D coordinate sampling of W coordinate.
W is actually HPOS.w and not 1. Z is however always 0.
- Optimize register usage a bit
Disassembling compiled SPV shows that global declaration results in fewer ops than using inout modifiers. Modifiers generate extra mov instructions.
- Fix reading of varying registers in FP
Different registers have different behavior
- Always write to varying registers. If a register is not written to, it is initialized to (0, 0, 0, 1)
- Reimplements two-sided lighting correctly without hacks
- Also bumps shader cache version
- Properly commit orphaned blocks without invalidating existing cache structures
- Do not ignore overwritten objects when committing as unprotected fbo. Avoids stale references to invalidated surface objects.
- Load into memory as straightforward BGRA
- Fixes a bug in vulkan caused by byte shuffling in blit engine vs shader access
- Removes the need for memory shuffling when transferring into a rendertarget
- Implements render target data load (aka Read Color Buffer/Read Depth Buffer)
- Refactors vulkan surface barrier to be much cleaner.
- Removes redundant surface barrier invocations after doing a merged load
from surface cache.
- Adds explicit access modes when gathering surfaces from cache.
- Further improve aliased data preservation by unconditionally scanning.
It is possible for cache aliasing to occur when doing a memory split.
- Also sets up for RCB/RDB implementation
- After splitting, the sections may not be referenced at all for anything other than just pixel storage
- In such cases, either merge down or sample from the upstream source instead
- Texel borders are no longer actually supported in modern APIs
- Removes the border texels and uses border color instead which is incorrect but should work fine
vm::spu max address was overflowing, resulting in issues, so cast to u64 where needed. Fixes #6145.
Use vm::get_addr instead of manually subtracting vm::base(0) from a pointer in texture cache code.
Prefer std::atomic_thread_fence over _mm_?fence(), adjust usage to be more correct.
Used sequentially consistent ordering in semaphore_release for the TSX path as well.
Improved memory ordering for sys_rsx_context_iounmap/map.
Fixed sync bugs in HLE gcm because of not using atomic instructions.
Use a release memory barrier in lwsync for PPU LLVM; according to the Xbox 360 programming guide, lwsync is a hw release memory barrier.
Also use release barrier where lwsync was originally used in liblv2 sys_lwmutex and cellSync.
Use acquire barrier for isync instruction, see https://devblogs.microsoft.com/oldnewthing/20180814-00/?p=99485
Prefer vm::ptr<>::ptr over vm::get_addr.
Prefer vm::_ptr/base over vm::g_base_addr with offset.
Added methods atomic_t<>::bts and atomic_t<>::btr (see the sketch after this list).
Removed obsolete rsx::thread::Read/WriteIO32 methods.
Removed wrong check in semaphore_release.
Added handling for PUTRx commands for RawSPU MFC proxy.
Prefer overloaded methods of v128 instead of _mm_... in VPKSHUS ppu interpreter precise.
Fixed more potential overflows that may result in wrong behaviour.
Added io/size alignment check for sys_rsx_context_iounmap.
Added rsx::constants::local_mem_base which represents RSX local memory base address.
Removed obsolete rsx::thread::main_mem_addr/ioSize/ioAddress members.
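A sketch of the bts/btr semantics over std::atomic; atomic_t<> is the project's own type, so the real signatures may differ:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch: atomically set (bts) or clear (btr) one bit and return
// its previous value, mirroring the x86 bts/btr instructions.
static bool bts(std::atomic<uint32_t>& value, uint32_t bit)
{
    const uint32_t mask = 1u << bit;
    return (value.fetch_or(mask, std::memory_order_acq_rel) & mask) != 0;
}

static bool btr(std::atomic<uint32_t>& value, uint32_t bit)
{
    const uint32_t mask = 1u << bit;
    return (value.fetch_and(~mask, std::memory_order_acq_rel) & mask) != 0;
}
```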
- Remove string comparisons from the hot-path!
- Use attribute streaming and push constants to avoid forcing a descriptor block copy every other draw call/pass.
While this isn't so bad on nvidia cards, it makes AMD cards a slideshow.
- Multiple header files were missing #includes for other headers used within them. The correct headers happened to be included in the correct order in source files, which is why everything compiled.
- Added missing #includes so header files correctly include all their dependencies; this fixes problems with IDEs being unable to parse headers correctly due to missing symbols
- TODO: Option to completely skip clamping in some architectures as it is not needed in most games
- Mostly affects older GPUs that do not have access to native fp16
- Ensures the current renderpass matches the image properties even when a cyclic reference is detected
- Solves SDK debug output error spam due to mismatching layouts and renderpasses
- Transition attachments to LAYOUT_GENERAL in case of a feedback loop
- Fixes appearance of garbage along polygon edges in some
post-processing passes.
- Also reverse this transition when rendering goes back to normal
- Allows render targets to behave like stacked 3D views, the same way shader inputs are resolved
- Basically implements most of the 'Read Color/Depth Buffers' option for 'free'.
- Allows splitting RTV/DSV resources if they are superseded by a partial surface
- Also allows intersecting new resources through the surface cache for proper inheritance from other scattered data
- TODO: Refactor bind_surface_as_rtt and bind_surface_as_ds to reduce asinine code duplication
TODO: Investigate the _s input modifier behaviour further, in case it can avoid generating zeroes from a MAD instruction.
x = MAD(+ve, -ve, -ve) with _s input modifier in BFBC expects result to be Non-zero
- Properly test for NaN and Inf when clamping down to fp16
- Optimize divsq a bit; mix(vec, vec, bvec) emits OpSelect which is what
we want here, instead of component-wise selection which is much slower.
- While mul(0, nan) = nan and 0 / 0 = nan, 0 / sqrt(0) = 0 because of hw
gremlins. normalize(0) is also nan so this behaviour does not work
around that particular case either which makes it even more baffling.
- The hw generates inaccurate values when doing perspective-correct
interpolation of vertex output attributes and makes the comparison (a ==
b) fail even when they are a fixed constant value.
- Increase equality tolerance when doing comparisons in fragment
shaders for NV cards only to work around this issue.
- Teepo fix
- The fixed-point D24S8 format does special Z clamping during compare which matches PS3 behaviour
- D32S8 is a floating point format and comparison with Dref > 1 always fails causing black edges/borders
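A sketch of the workaround implied above; the helper is hypothetical and the real change may differ in detail:

```cpp
#include <algorithm>

// Hypothetical sketch: fixed-point D24S8 clamps the compare reference to
// [0, 1] before the test (PS3 behaviour); float D32S8 does not, so a Dref
// above 1 always fails. Clamping the reference restores the D24S8 result.
static float clamp_dref_for_float_depth(float dref)
{
    return std::clamp(dref, 0.f, 1.f);
}
```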
- Improve support for float16_t by minimizing mixed inputs to functions
(ambiguous overloads)
- Minimize amount of downcasts in code by using opcode flags
- Re-enable float16_t support for vulkan
- Emulating f16 with f32 is not ideal and requires a lot of value clamping
- Using native data type can significantly improve performance and accuracy
- With openGL, check for the compatible extensions NV_gpu_shader5 and
AMD_gpu_shader_half_float
- With Vulkan, enable this functionality in the deviceFeatures if applicable (VK_KHR_shader_float16_int8 extension; see the sketch after this list)
- Temporarily disable hw fp16 for vulkan
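For reference, a minimal sketch of the Vulkan feature query using the VK_KHR_shader_float16_int8 structures (physical_device is assumed to be a valid handle):

```cpp
#include <vulkan/vulkan.h>

// Query native fp16 shader support; if present, the same struct can be
// chained into VkDeviceCreateInfo::pNext so vkCreateDevice enables it.
static bool supports_shader_float16(VkPhysicalDevice physical_device)
{
    VkPhysicalDeviceFloat16Int8FeaturesKHR f16_features = {};
    f16_features.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT16_INT8_FEATURES_KHR;

    VkPhysicalDeviceFeatures2 features2 = {};
    features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
    features2.pNext = &f16_features;
    vkGetPhysicalDeviceFeatures2(physical_device, &features2);

    return f16_features.shaderFloat16 == VK_TRUE;
}
```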
- When reverse scanning, offsets are inverted and offset value of 0 is logically equivalent to an offset of -1
- Add an explicit message if clipping happens to avoid silent errors/bugs
- Revert to using block metrics, but with optional per-channel decode
stage for the final transfer. Much cleaner than hacking in the width to
be in channels instead of blocks.
- Removes CPU-only transforms that broke GPU-side code.
-- Channels in GPU compute are laid out in cell-order, but CPU was uploading in favorable order and compensating with swizzles.
-- This leads to 2 different layouts depending on the location of the data (CPU vs GPU)
- Implement R8G8_R8B8 interleaved format decode
- General improvements
- Do not round up sub-pixel offsets, round down instead
- Do not allow incomplete sources for hw blit transfer
- Reimplement src clipping (slice_h)
- Check 'area' of incoming texels and correct for them before RTT lookup/transfer
- Filter out incomplete targets when performing RTT lookup (1 texel or less contribution)
- If a transfer writes to a RTT and depth mismatch happens, create a local target and the upload function will likely resolve between the two
- If a surface is rejected, reset the target region!
- Also refactors some bpp handling code
- Simplify texture intersection test to use a normalized/uniform coordinate space
- Fix broken bounds checking as well
- Batch dma transfers whenever possible and do them in one go
- vk: Always ensure that queued dma transfers are visible to the GPU before they are needed by the host
Requires a little refactoring to allow proper communication of the commandbuffer state
- vk: Code cleanup; the simplified mechanism makes it so that it's not necessary to pass tons of args to methods
- vk: Fixup - do not forcefully do dma transfers on sections in an invalidation zone! They may have been speculated correctly already
- Properly wait for the buffer transfer operation to finish before map/readback!
- Change VkFence to VkEvent, which works more like a GL fence, which is what is needed (see the sketch after this list).
- Implement supporting methods and functions
- Do not destroy fence by immediately waiting after copying to dma buffer
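A minimal sketch of the VkEvent pattern referenced above; device and cmd are assumed to be valid handles, and recording/polling are shown together only for brevity:

```cpp
#include <vulkan/vulkan.h>

// Hypothetical sketch: signal an event from the command buffer once the DMA
// copy completes, then poll it host-side before mapping for readback. Unlike
// a VkFence wait, this behaves like a GL sync object and needs no
// destroy/reset dance around every wait.
static void wait_for_dma(VkDevice device, VkCommandBuffer cmd)
{
    VkEvent dma_done = VK_NULL_HANDLE;
    VkEventCreateInfo info = { VK_STRUCTURE_TYPE_EVENT_CREATE_INFO, nullptr, 0 };
    vkCreateEvent(device, &info, nullptr, &dma_done);

    // Recorded after the copy commands in 'cmd':
    vkCmdSetEvent(cmd, dma_done, VK_PIPELINE_STAGE_TRANSFER_BIT);

    // Host side, after submitting 'cmd' and before mapping the dma buffer:
    while (vkGetEventStatus(device, dma_done) != VK_EVENT_SET)
    {
        // spin/yield until the GPU executes the set-event command
    }
    vkDestroyEvent(device, dma_done, nullptr);
}
```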
- Avoids blindly reusing blit dst sections as they may contain garbage.
If a section was unlocked for a flush, just discard it, as its reuse introduces potential data corruption.
Since the data needs to be reuploaded anyway (for now), it's better to start afresh
- In case of format mismatch, reset the calculated dst block
- Add a bounds check to determine if data contained in an atlas is good enough for sampling the cache.
If not enough data is provided, fall back to full upload
- Use a 5-point tap with an X pattern across the target's memory space to reduce chances of false positives
- TODO: Potential false positives identified, requires some minor
restructuring of surface_store
- Apply dither to edges that almost fail the straight-up alpha test (see the sketch after this list)
- Significantly improves alpha tested geometry far from the camera
- Also removes blend factor overrides/hacks as they give incorrect results due to background bleeding
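A rough scalar sketch of the dithering idea, assuming a Bayer-matrix threshold; the actual shader logic may differ:

```cpp
// Hypothetical sketch: rather than a hard (alpha >= ref) cut, fragments whose
// alpha lands near the reference pass in a screen-position dither pattern,
// which softens distant alpha-tested geometry without blend overrides.
static const float bayer4x4[4][4] = {
    {  0.f/16,  8.f/16,  2.f/16, 10.f/16 },
    { 12.f/16,  4.f/16, 14.f/16,  6.f/16 },
    {  3.f/16, 11.f/16,  1.f/16,  9.f/16 },
    { 15.f/16,  7.f/16, 13.f/16,  5.f/16 },
};

static bool alpha_test_dithered(float alpha, float ref, int x, int y)
{
    // Shift the effective threshold per pixel: clear passes/fails behave as
    // before, only the borderline band gets dithered.
    const float noise = bayer4x4[y & 3][x & 3] - 0.5f; // in [-0.5, +0.4375]
    return alpha >= ref + noise * 0.5f;                // band width of 0.5 assumed
}
```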
- Index offset is ignored anyway and only used to calculate vertex attribute divisor index
- Specialized optimization for untouched xfer without primitive restart
- Avoid tagging and rely on read/write barriers and the dirty flag mechanism. Testing is done with a weak 8-byte memory test
- Introducing new data when tagging breaks applications with race conditions where tags can overwrite flushed data
- gl: Include an execution state wrapper to ensure state changes are consistent. Also removes a lot of required 'cleanup' for helper methods
- texture_cache: Make execution context a mandatory field as it is required for all operations. Also removes a lot of situations where a duplicate argument is added in for both fixed and vararg fields
- Explicit read/write barrier for framebuffer resources depending on
usage. Allows for operations like optional memory initialization before
reading
- If draw call resources consume memory that intersects with NA parts of the texture cache, we get a framebuffer test mismatch.
This mismatch is false and happens because the thread has not yet reached the point of relocking the pages
- Implicitly invoke a memory barrier if actively reading from an unsynchronized texture
- Simplify memory transfer operations
- Should allow more games to work without strict mode
- Do not bind companion framebuffer when clearing single aspect; let the
contest mechanism sort it out instead
- Do not prematurely tag framebuffers, instead only do so at
write-confirmation time. Should avoid false tagging if setup does not
allow a render to occur.
- Implements a mirror view of D24S8 data that accesses the stencil components.
Finishes the implementation of TEX2D_DEPTH_RGBA as the stencil component was previously missing from the reconstructed data
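A minimal sketch of such a mirror view in Vulkan; device and depth_stencil_image are assumed to be valid:

```cpp
#include <vulkan/vulkan.h>

// Create a second view over the same D24S8 image exposing only the stencil
// aspect, so the reconstructed TEX2D_DEPTH_RGBA value can combine depth
// (from the depth view) with stencil (from this one).
static VkImageView make_stencil_view(VkDevice device, VkImage depth_stencil_image)
{
    VkImageViewCreateInfo info = { VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO };
    info.image = depth_stencil_image;
    info.viewType = VK_IMAGE_VIEW_TYPE_2D;
    info.format = VK_FORMAT_D24_UNORM_S8_UINT;
    info.subresourceRange = { VK_IMAGE_ASPECT_STENCIL_BIT, 0, 1, 0, 1 };

    VkImageView view = VK_NULL_HANDLE;
    vkCreateImageView(device, &info, nullptr, &view);
    return view;
}
```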
- Add a few missing destructors
Image classes are inherited a lot and I forgot to make the dtors virtual
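The issue in miniature, as a hedged example:

```cpp
// Without a virtual destructor in the base, deleting a derived image through
// a base pointer skips the derived cleanup (leaking views/memory).
struct image_base
{
    virtual ~image_base() = default; // the fix: make the dtor virtual
};

struct texture_image : image_base
{
    ~texture_image() override { /* release API handles here */ }
};
```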
- Per-channel conditional execution introduces RAW hazards all over the place
- It's cheaper to process both branches and select between the two
- Also improves ShaderVariable functionality to support features such as match_size and taking complex variables as inputs
To avoid the need (and performance hit) of Read Color/Depth Buffers, we
may not invalidate overlapping fbos inside lock_memory_region unless
they are guaranteed to be superseded by the new one.
This avoids e.g. issues with overblooming, among others.
Fixes VRAM leaks and incorrect destruction of resources, which could
lead to driver crashes.
Additionally, lock_memory_region is now able to flush superseded
sections. However, due to the potential performance impact of this
for little gain, a new debug setting ("Strict Flushing") has been
added to config.yaml
- Improve vertex attribute layout format. Allows for full 16-bit attribute divisor
- Use actual pitch when declaring framebuffer rsx pitch instead of register value in case of swizzle? rendering
- Also fix visual corruption when using disjoint indexed draws
- Refactor draw call emit again (vk)
- Improve execution barrier resolve
- Allow vertex/index rebase inside begin/end pair
- Add ALPHA_TEST to list of excluded methods [TODO: defer raster state]
- gl bringup
- Simplify
- using the simple_array gets back a few more fps :)
- Orders flushing to preserve memory at all cost
- Avoids false positive where flushing overlapping sections can falsely invalidate another with head/tail test
- Forcefully downloads and reuploads data from the CPU in case of unexpected overlaps
- Properly detect correct size of newly created blit targets
- Remember to clear any existing views when changing the default component map!
- NOTE: The address swizzle index is only for use as src. The address registers are only used one channel at a time.
- When used as the destination of ARL, the encoding is the same as for the other temp registers
- Retag resources reprotected under flush_always rules
- Properly check for blit resource fitting taking into account format
mismatch, pitch mismatch and typeless transfers
- The x value contains the VP output value interpolated across primitive surface
- The y coordinate contains the fog fraction according to the selected fog formula
- Tags framebuffer resources on first use (when on_write is called to verify memory)
- Texture cache now selects the best match and even sorts atlas writes by memory write order to avoid older data showing over newer data
- Some formats are proven to ignore swizzle flag
- DXT compressed textures
- COMPRESSED_BG_GB class textures
- Some applications are using swizzled wide integer formats so those are confirmed to swizzle
invalidate_range_impl_base does not mark all textures that will only be
unprotected as dirty when doing a deferred flush, since that is done by
flush_all.
However, if there are no sections to flush, the deferred flush will
use the same code path as non-deferred flushes for unprotecting textures
and forget to mark them as dirty.
This commit fixes this bug.
The existing implementation restarts the loop immediately after
finding a range_data instance that updates the trampled_range.
This commit refactors this method to continue the loop with the updated
trampled_range, and then repeat only those range_data instances that
were iterated through before the trampled_range was last updated.
As a result, the number of total iterations required is reduced.
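A sketch of the refactored scan, assuming ranges live in a flat container (all names hypothetical). Iterating circularly until a full lap passes without growth is equivalent to re-visiting only the entries seen before the last update:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct range_t { unsigned start, end; };
struct range_data { range_t range; };

static bool overlaps(const range_t& a, const range_t& b)
{
    return a.start < b.end && b.start < a.end;
}

// Hypothetical sketch: expand 'trampled' with every overlapping entry without
// restarting from the beginning after each update.
static void expand_trampled_range(const std::vector<range_data>& entries, range_t& trampled)
{
    const std::size_t count = entries.size();
    std::size_t i = 0, since_update = 0;
    while (since_update < count)
    {
        const range_t& r = entries[i].range;
        if (overlaps(r, trampled) && (r.start < trampled.start || r.end > trampled.end))
        {
            trampled = { std::min(r.start, trampled.start), std::max(r.end, trampled.end) };
            since_update = 0; // earlier entries get exactly one more look
        }
        i = (i + 1) % count;
        ++since_update;
    }
}
```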