- Allows sections reclaimed by the surface store due to overlap/inheritance to be identified and removed.
- Additionally, potentially lowers the number of flushes required per block with multiple overlaps improving efficiency and theoretically performance.
Allow x30 times the speed of vblank rate + clocks scale of original PS3.
In theory a 60 fps limit game which scales frame limit perfectly with vblank rate can be played at up to 1800 fps with this change.
And:
* Fixed lv2 sleep with Clocks Scaling
* Make these settings dynamicaly adjustable.
* Avoid code duplication
- Reject writes to RTT if the source data is of unknown origin.
non-RTT data and only 1 line in length is suspicious and often GPU data like programs or other rendering inputs.
- Attempt to identify blit operations that will be flushed immediately
after and just do them on CPU instead if the transformation is trivial.
- If only a single blit section is contributing to an atlas merge op, the
threshold should be 100%. The only acceptable result here is a
truncation.
- Raise passing 'score' from 50% to 90% to filter out very incomplete
merge operations.
- Catch unfit sections passing the match test; possible for blit_dst
data but will likely be always harmless. Disabled in release builds by default.
- Some games use texture semaphore for zcull sync which is rather bizzare.
However, it works on realhw as the depth test happens before fragment shader completion
- Due to the high performance penalty incurred by this act, this
behavior is only enabled by the "strict rendering mode" option.
- Adds support for partial (letterboxed) source images by taking insets
into account.
- Bugfix for potential access violation when capturing screenshot on
vulkan
- Adds the same optimization/simplification steps to complex image
transfer routines. Whenever possible, multi-step transfers are collapsed
into a single operation.
- Avoid double transfers where a transfer to a temp image is done
without scaling and then a secondary transfer follows. Combines the two
steps into one whenever possible which can significantly alleviate
bandwidth problems at higher resolutions. Significant speedup, upto 90%
in some cases (PDF, PDF2)
- Interpolating floats is not the same as interpolating their bits!
Use integer format to interpolate linearly for D32F formats instead of using R32F as intermediary
- Queueing commands on the offloader is a good idea but unfortunately
page faults can still happen causing a cyclic dependency and eventual
deadlock. Characterized by a vk::wait_for_event timed out error
accompanied by severe hitching.
- Drain the fault-able commands before pushing a submit operation to the
queue. If a fault is in progress, bypass the queue system and submit
raw. Technically this is incorrect but there isn't much that can be
done about it right now.
- When using partial results on NVIDIA, a non-zero result is returned even when the draw is fully occluded.
This, I believe, violates spec which says the partial result shall be between 0 and the final result.
- Adds color interpolation and modulation pass and refactors the code a
bit. Elements with this pass applied have their color modulated by the
animated color from the pass. Modulation transform is multiplicative.
Instead of speed, direction and distance, the user now specifies
start/end offsets and how much time the transition should take.
Fixes:
- Stuttering caused from framerate estimation.
- An edge case where animations would go over their supposed limit.
Adds:
- The ability to specify arbitrary easing functions for the animations
- Implemented quadratic ease in and ease out and cubic ease in/out.
- Usage of cubic ease in/out in the trophy notification