- GL queries share the target binding (not asynchronous!)
- Discard active queries by closing them, leave closed queries alone (nothing to be done for discard op)
- Reimplements render target views used for sampling
- Optimizes access using an encoded control token
- Adds proper encoding for 24-bit textures (DRGB8 -> ORGB/OBGR)
- Adds proper encoding for ABGR textures (ABGR8 -> ARGB8)
- Silence some compiler warnings as well
- TODO: Real texture views for OGL current method is a hack
gl/vk: Bump shader cache version
gl/vk: Disable anisotropic override when strict mode enabled as it is proven to alter some games negatively
gl: Clamp buffer view range to not exceed the backing buffer size. Also add assert for the same condition
- Track asynchronous operations in RSX core
- Add read barriers to force pending writes to finish.
Fixes zcull delay flicker in all UE3 titles without forcing hard stall
- Increase zcull latency as all writes should be synchronized now
- ZCULL unit emulation rewritten
- ZCULL reports are now deferred avoiding pipeline stalls
- Minor optimizations; replaced std::mutex with shared_mutex where contention is rare
- Silence unnecessary error message
- Small improvement to out of memory handling for vulkan and slightly bump vertex buffer heap
- Forces Bitcast of texture data if input format cannot possibly be the
same as the existing texture format
- rsx: Other minor improvements to texture cache :-
- remove obsolete blit engine incompatibility warning. The texture will be re-uploaded if it is indeed incompatible
- Implement warn_once and err_once to avoid spamming the log with systemic errors
- Track mispredicted flushes
- Reswizzle bitcasted texture data to native layout
TODO: Also needs reshuffle according to input remap vector
- gl: Do not call makeCurrent every flip - it is already called in set_current()
- gl: Improve ring buffer behaviour; use sliding window to view buffers larger than maximum viewable hardware range
NV hardware can only view 128M at a time
- gl/vk: Bump transform constant heap size When lots of draw calls are issued, the heap is exhaused very fast (8k per draw)
- gl: Remove CLIENT_STORAGE_BIT from ring buffers. Performance is marginally better without this flag (at least on windows)
- Identify depth textures reaching the gpu via shader_read upload path
- Use correct timestamp counter for opengl
- inline draw_state::test_property because msvc doesnt do it for us
- Implement flush-always behaviour to partially fix readback from a currently bound fbo
- Without this, only the first read is correct, as more draws are added the results become 'wrong'
- Fixes WCB and cpublit behviour
- Synchronize blit_dst surfaces to avoid data loss when gpu texture scaling is used
- Its still faster in such cases to disable gpu texture scaling but some types cannot be disabled without force cpu blit (e.g framebuffer transfers)
- Memory management tuning
- rsx: on-demand texture cache rescanning for unprotected sections
- rsx: Only framebuffer resources are upscaled
- Do not resize regular blit engine resources
- Lazy initialize readback buffer when using opengl
-- These measures should help minimize vram usage
- Do not assume texture2D when creating new textures
- Flag invalid texture cache if readonly texture is trampled by fbo memory.
Avoids binding a stale handle to the pipeline and is rare enough that it should not hurt performance
- vulkan: Do not assume an aux frame context must exist in a well defined state as set in init_buffers() since the request might be external (via overlays path)
- gl: Do not bother waiting for idle before servicing external flip requests
- gl: Queue overlay cleanup requests to ensure only glthread attempts touching the context
- overlays: Do not compute size metrics for invalid/unsupported glyphs
- opengl driver optimization for nvidia. On nvidia glTextureBufferRange performance is horrendous
-- Initialize texture buffer to whole buffer at startup and use absolute offsets to read data instead
-- Over 2x performance in some cases (Resogun, TNT racers)
- gl/vk: Do not flip non-existent display buffers. Fixes spec violation at boot in TNT racers demo
- whitespace fixes for sys_rsx
- The scale offset matrix is fine but on real hardware the z results seem to be independent of near/far clipping distances
-- If depth falls within near/far, clamp depth value to [0,1]
- Force mipmap count to 1 if sampling from an RTV/DSV
- TODO: Better wcb flush detection, it should be better to re-upload the texture after it has been dwnloaded if expected mipmaps are > 1
- Handle aliased depth + color target by disabling depth writes. This looks to be the correct way
- Add support for generic passes that cannot be done using general imaging operations. Lays the framework for tons of features and effects
- Implement RGBA->D24D8 casting. Sometimes games will split depth texture into RGBA8 then use the new RGBA8 as a depth texture directly
-- This happens alot in ps3 games and I'm not sure why. Its likely the ps3 did not sample fp values with linear filtering so this is a workaround
-- Only implemented for openGL at the moment
-- Requires a workaround for an AMD driver bug
- In rare cases the driver derefs a nullptr and dies, taking the emulator with it
- From testing, it seems the vram is indeed freed when this happens so its "safe" to continue
- Remove generic throws from the rsx pipeline. Stops the rsx thread from silently dying leaving the emulator in a hung state
- Hackplement add_signed and reverse_subtract_signed blend modes
- Reimplement fragment program fetch and rewrite texture upload mechanism
-- All of these steps should only be done at most once per draw call
-- Eliminates continously checking the surface store for overlapping addresses as well
addenda - critical fixes
- gl: Bind TIU before starting texture operations as they will affect the currently bound texture
- vk: Reuse sampler objects if possible
- rsx: Support for depth resampling for depth textures obtained via blit engine
vk/rsx: Minor fixes
- Fix accidental imageview dereference when using WCB if texture memory occupies FB memory
- Invalidate dirty framebuffers (strict mode only)
- Normalize line endings because VS is dumb
to flush_all is up to date. Should prevent recursive exceptions
Partially revert Jarves' fix to invalidate cache on tile unbind. This will
need alot more work. Fixes hangs
- Identify minimize/restore events as separate from regular resize and do not react to them
- Enable message queue consumption after loading the shaders cache. Also hides the frame in this step
-- This fixes the 'start fullscreen' bug when running vulkan
- This allows for better handling of deferred flushes.
-- There's still no guarantee that cache contents will have changed between the set acquisition and following flush operation
-- Hopefully this is rare enough that it doesnt cause serious issues for now
- Refactor invalidate memory functions into one function
- Add cached object rebuilding functionality to avoid throwing away useful memory on an invalidate
- Added debug monitoring of texture unit VRAM usage
- Emulate primitive restart in software whenever we get the chance
- Ensure PRIMITIVE_RESTART is never active when LIST topologies are active
- Reimplement TRIANGLE_FAN, POLYGON and QUAD expansion
- This communication is important in communicating window events. Helps properly synchronize swapchain management on vulkan and stops nvidia crashing
- Do not block the message queue lest the driver detect window as not responding
rsx: Conditional lock hack removed
vulkan - Fixes
- Remove unused texture class
- Fix native pitch calculation (WCB)
rsx: Catch hanging begin/end pairs when flushing deferred draw calls
vulkan: Register DXT compressed formats
vulkan: Register depth formats
gl: Workaround for 'texture stitching' when gathering flip surface
- TODO: Add a proper flip hack option
rsx: Fix texture memory size calculation
- DXT textures dont have real pitch. Since pitch is used to calculate memory size, make sure it always evaluates to rsx_size
rsx: Fix cpu copy detection
rsx: Validate blit dst surface and dont make assumptions about region blit order
- Also relax restrictions on memory owned by the blit engine if strict rendering is not enabled
rsx: Fix depth texture detection
rsx: Do not manually offset into dst. The overlapped range check does so automatically
rsx: Minor optimizations
rsx: Minor fixes
- Fix to detect incompatible formats when using GPU texture scaling and show message
- Better 'is_depth_texture' algorithm to eliminate false positives
gl/vk/rsx: Refactoring; unify texture cache code
gl: Fixups
- Removes rsx::gl::texture class and leave gl::texture intact
- Simplify texture create and upload mechanisms
- Re-enable texture uploads with the new texture cache mechanism
rsx: texture cache - check if bit region fits into dst texture before attempting to copy
gl/vk: Cleanup
- Set initial texture layout to DST_OPTIMAL since it has no data in it anyway at the start
- Move structs outside of classes to avoid clutter
gl: Fix multidraw [WIP]
rsx: Ignore vertex base when data source is generated using arithmetic
vk: Check pending flag before doing fence poke
vk/gl: Fix for inlined array and immediate draws
rsx: Collapse joined draws when batching
rsx: Blit engine improvements
- Always handle blits to and from framebuffers through the GPU
- Handle depth surfaces properly when using GL
- Check for format mismatches when blitting to the surface store [WIP]
Fix rsx offscreen-render-to-display-buffer-blit surface reads
- Also, properly scale display output height if reading from compressed tile
gl: Fix broken dst height computation
- The extra padding is only there to force power-of-2 sizes and isnt used
gl: Ignore compression scaling if output is rendered to in a renderpass
rsx/gl/vk: Cleanup for GPU texture scaling. Initial impl [WIP]
- TODO: Refactor more shared code into RSX/common
rsx/vk/shaders_cache: Move vp control mask to dynamic state
rsx/vk/gl: adds a shader cache for GL. Also Separates pipeline storage for each backend
rsx: Add more texture state variables to the cache
- Significant gains from greatly reduced CPU work
- Also reorders command submission in end() to improve throughput
- Refactors most of the vertex buffer handling
- All vertex processing is moved GPU side
- Improvements to framebuffer usage; Avoid creating new resources every frame
- Handle null fragment program properly
- Collect vertex upload statistics
- vk: Pre-initialize 'unused' varying registers in the vertex shader in case it gets matched with a fs that consumes it
-- Fixes a crash about fog_c not being declared
gl/dx12/vk: Handle null fragment program
- cleanup - use yield semantic instead of sleep(0) as yield is more cross-platform
-- sleep(0) is a windows specific scheduler hint
- vk: Do not select first available format when choosing a swapchain format
- gl/vk: Ignore rendering zero sized framebuffers/scissors
- fp: Re-enable range clamp on fp16 registers; fix fx12 clamping [-2, 2]
Improve scaling and separate sampler state from texture state
gl: Unify all texture cache objects under one structure separate by use case
gl: Texture cache fixes
- Acquire lock when finding matching textures
- Account for swizzled surfaces when deciding whether to cpu memcpy
- Handle swizzled images on the GPU
* gl: Only bind attrib textures on thread startup
* gl: Persistent mapped buffers
* gl: Fix emulated primitives in an inlined array
* gl: Do not re-update program information every draw call
* gl/vk: s1 type is signed normalized not unsigned normalized
* gl/rsx: Allow disabling of persistent buffers for debugging
gl: Large heap size is more practical
gl: Fix a bug with legacy opengl buffers
* gl/rsx: Allow emulation of unsupported attribute formats
* gl: Fix typos and remove dprints
gl: cleanup debug prints
* ui: Move the GL legacy buffer toggle to the left pane
* vk/gl: Fix cmp type, its range is [-1,1] not [0,1] SNORM_INT
* gl/rsx: Implement platform-agnostic text overlays
gl: Restore performance metrics using new text out helper
gl/rsx: Refactor text generation class
* vk: Enable text overlay
gl/vk: Polish overlay counters implementation
gl: Better resource shutdown for text writer
* gl: Optimization, do not rebind TIUs every frame. Speedup
* gl: Optimizations and improvements to vertex upload code
* gl/vk: Texture format swizzles
vk: Texture format fix
vk: Fix YX format swizzles
* rsx: Decode vertex texture index
`unveil<>` renamed to `fmt_unveil<>`, now packs args to u64 imitating va_args
`bijective...` removed, `cfg::enum_entry` now uses formatting system
`fmt_class_string<>` added, providing type-specific "%s" handler function
Added `fmt::append`, removed `fmt::narrow` (too obscure)
Utilities/cfmt.h: C-style format template function (WIP)
Minor formatting fixes and cleanup
gl: stream uniform data using stream buffer
gl: vertex streaming improvements and bugfixes
gl: add basic timing info check for profiling
gl: ebo streaming fixes and enhancements
* Optimizations
1) Some headers simplified for better compilation time
2) Some templates simplified for smaller executable size
3) Eliminate std::future to fix compilation for mingw64
4) PKG installation can be cancelled now
5) cellGame fixes
6) XAudio2 fix for mingw64
7) PPUInterpreter bug fixed (Clang)
* any_pod<> implemented
Aliases: any16, any32, any64
rsx::make_command fixed