Commit graph

7103 commits

Author SHA1 Message Date
Wunk 551fe52068
Merge a8b9cd8e65 into 01ae24e46e 2025-08-21 14:17:05 +10:00
guccigang420 01ae24e46e [Base/Memory] Fix VirtualQuery length parameter 2025-08-20 13:34:39 +03:00
Triang3l 0b2ffa3148 [GPU] Change texture load cbuffer to push constants
Simplify the code, eliminating the need for supporting requesting cbuffers
for anything other than guest draw command execution.
2025-08-20 12:46:26 +03:00
Triang3l 04d5c40d0d [GPU/UI] XeSL readability improvements + float suffix
Use the _xe suffix instead of the xesl_ prefix for quicker visual
recognition of identifiers, also switch to snake_case for consistency.

Also add the f suffix to float32 literals because the Metal Shading
Language is based on C++.
2025-08-19 21:36:06 +03:00
Triang3l 3b4b04c371 [Build] Locate FXC among Windows SDK architectures and versions 2025-08-19 20:48:26 +03:00
Triang3l 4234440681 [Vulkan] Fix VulkanInstance::Create return values 2025-08-15 17:19:23 +03:00
Triang3l b5432ab83f [Vulkan] Refactoring and fixes for VulkanProvider and related areas
Enable portability subset physical device enumeration.

Don't use Vulkan 1.1+ logical devices on Vulkan 1.0 instances due to the
VkApplicationInfo::apiVersion specification.

Make sure all extension dependencies are enabled when creating a device.

Prefer exposing feature support over extension support via the device
interface to avoid causing confusion with regard to promoted extensions
(especially those that required some features as extensions, but had those
features made optional when they were promoted).

Allow creating presentation-only devices, not demanding any optional
features beyond the basic Vulkan 1.0, for use cases such as internal tools
or CPU rendering.

Require the independentBlend feature for GPU emulation as working around is
complicated, while support is almost ubiquitous.

Move the graphics system initialization fatal error message to xenia_main
after attempting to initialize all implementations, for automatic fallback
to other implementations in the future.

Log Vulkan driver info.

Improve Vulkan debug message logging, enabled by default.

Refactor code, with simplified logic for enabling extensions and layers.
2025-08-14 23:44:21 +03:00
Triang3l a06be03f1b [GPU] Cleanup definitions of some registers
VS/PS_NUM_REG is 6-bit on Adreno 200, and games aren't seen using the
bit 7 to indicate that no GPRs are used. It's not clear why Freedreno
configures it this way.

Some texture fetch fields were deprecated or moved during the development
of the Xenos, reflect that in the comments.

Add definitions of the registers configuring the conversion of vertex
positions to fixed-point. Although there isn't much that can be done with
it when emulating using PC GPU APIs, there are some places in Xenia that
wrongly (though sometimes deliberately, for results closer to the behavior
of the host GPU) assume that the conversion works like in Direct3D 10+,
however the Xenos supports only up to 4 subpixel bits rather than 8. The
effects of this difference are largely negligible, though.

Also add more detailed info about register references and differences from
other ATI/AMD GPUs for potential future contributors.
2025-08-06 13:21:19 +03:00
guccigang420 9ae3a72500 [CPU/HIR] Fixed MulHi in value.cc for Linux systems 2025-07-30 23:47:17 +03:00
Wunkolo a8b9cd8e65 [a64] Implement support for large stack sizes
The `SUB` instruction can only encode immediates in the form of `0xFFF`
or `0xFFF000`. In the case that the stack size is greater than `0xFFF`,
then just align the stack-size by `0x1000` to keep the bottom 12 bits
clear.
2024-06-23 14:40:52 -07:00
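A minimal sketch of the rounding described in a8b9cd8e65, not the code from the commit itself. `IsSubImmediate` and `AlignStackSizeForSub` are hypothetical names, and the sketch assumes a frame no larger than `0xFFF000` bytes so that a single `SUB` still suffices.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when `value` fits one of the two immediate forms
// that `SUB` accepts, a 12-bit value optionally shifted left by 12 bits.
inline bool IsSubImmediate(std::size_t value) {
  const std::size_t low = 0xFFF;
  const std::size_t high = 0xFFF000;
  return (value & ~low) == 0 || (value & ~high) == 0;
}

// Hypothetical helper: rounds a stack size above 0xFFF up to the next multiple
// of 0x1000 so that the bottom 12 bits are clear.
inline std::size_t AlignStackSizeForSub(std::size_t stack_size) {
  // Assumption of this sketch: larger frames would need more than one SUB.
  assert(stack_size <= 0xFFF000);
  if (stack_size <= 0xFFF) {
    return stack_size;  // Already encodable without a shift.
  }
  return (stack_size + 0xFFF) & ~static_cast<std::size_t>(0xFFF);
}
```

For example, a `0x1234`-byte frame would be rounded to `0x2000` bytes, trading up to `0xFFF` bytes of stack for a single-instruction adjustment.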
Wunkolo 9c572c3937 [a64] Remove redundant OPCODE_DOT_PRODUCT_{3,4} lane-isolation
The last `FADDP` writes into an `S` register, which automatically masks all the other lanes to zero.
2024-06-23 14:00:27 -07:00
Wunkolo 9c8b0678a5 [a64] Optimize OPCODE_SPLAT with MOVI/FMOV
Moves the `FMOV` constant functions into `a64_util` so it is available to other translation units. Optimize constant-splats with conditional use of `MOVI` and `FMOV`.
2024-06-23 14:00:27 -07:00
Wunkolo 539a03d5f6 [a64] Optimize OPCODE_SPLAT byte-constants
Byte-sized constants can utilize the `MOVI` instructions. This makes
many cases such as zero-splats much faster since this encodes as just a
register-rename(similar to `xor` on x64).
2024-06-23 14:00:27 -07:00
Wunkolo 3acd0a3c37 [a64] Replace instances of MOV+DUP-splats with MOVI
These `MOV`->`DUP` splats can just be a singular `MOVI` instruction
2024-06-23 14:00:27 -07:00
Wunkolo 2953e2e6fc [a64] Use VectorCodeGenerator rather than CodeBlock+CodeGenerator
The emitter doesn't actually hold onto executable code, but just
generates the assembly-data into a buffer for the currently-resolving
function before placing it into a code-cache. When code gets pushed into
the code-cache, it can just be copied from an `std::vector` and reset.
The code-cache itself maintains the actual executable memory and
stack-unwinding code and such.

This also fixes a bunch of erroneous relative-addressing glitches where
relative addresses were calculated based on the address of the unused
CodeBlock rather than being position-independent. `MOVP2R` in particular
was generating different instructions depending on its distance from the
code block when it should always just use `MOV` and not do any
relative-address calculations since we can't predict where the actual
instruction's offset will be(we cannot predict what the program counter
will be). Oaknut probably needs a "position independent" policy or mode
or something so that it avoids PC-relative instructions.
2024-06-23 14:00:27 -07:00
Wunkolo 02edbd264d [a64] Fix out-of-bounds OPCODE_VECTOR_SHL(all-same) case
Out-of-bound shift-values are handled as modulo-element-size
2024-06-23 14:00:27 -07:00
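The modulo behavior mentioned in 02edbd264d can be modeled per lane in scalar code. This is only a sketch of the semantics, not the emitted A64 sequence; `ShiftLeftModulo` is a hypothetical name, and a lane is assumed to be 8, 16, or 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one lane of a vector shift-left in which the
// shift amount wraps modulo the element size in bits.
inline std::uint32_t ShiftLeftModulo(std::uint32_t value, unsigned shift,
                                     unsigned element_bits) {
  assert(element_bits == 8 || element_bits == 16 || element_bits == 32);
  // Element sizes are powers of two, so the modulo is a simple mask.
  const unsigned wrapped = shift & (element_bits - 1);
  const std::uint32_t lane_mask =
      element_bits == 32 ? 0xFFFFFFFFu : ((1u << element_bits) - 1u);
  // Mask before and after so bits never leak outside the lane.
  return ((value & lane_mask) << wrapped) & lane_mask;
}
```

With this model, shifting an 8-bit lane by 9 behaves like shifting it by 1.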
Wunkolo 1127fd9525 [a64] Implement OPCODE_CACHE_CONTROL
`dc civac` causes an illegal-instruction on Windows-ARM. This is likely
as a security measure against cache-attacks. On Linux this instruction
is trapped into an EL1 kernel function. Windows does not seem to have
any user-mode cache-maintenance instructions available for
data-cache(only instruction-cache via `FlushInstructionCache`).

The closest thing we can do for now is a full data memory-barrier with
`dsb ish`.

Prefetches are implemented using `prfm pldl1keep, ...`.
2024-06-23 14:00:27 -07:00
Wunkolo 164f1e4fcc [a64] Remove x64 reference implementations
Removes all comments relating to x64 implementation details
2024-06-23 14:00:27 -07:00
Wunkolo 151700d830 [a64] Implement armv8.0 atomic operations
Uses LSE when available, but provides an armv8.0 baseline implementation.
2024-06-23 14:00:27 -07:00
Wunkolo 4655bc1633 [a64] Optimize constant-loads with FMOV
`FMOV` encodes an 8-bit floating point immediate that can be used to
accelerate the loading of certain constant floating point values between
-31.0 and 31.0. A lot of immediates such as -1.0, 1.0, 0.5, etc. fall
within this range and this code gets lots of hits in my testing. This is
much more optimal than trying to load a 32/64-bit value in W0/X0 and
moving it into an FP register.
2024-06-23 14:00:26 -07:00
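As a rough illustration of the check behind 4655bc1633, the sketch below tests whether a single-precision value fits the 8-bit `FMOV` immediate and packs it. It follows the architectural bit layout (sign, a three-bit exponent pattern, four fraction bits) rather than the commit's actual code; `IsFmovImmediate` and `EncodeFmovImm8` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when `value` can be expressed as the 8-bit
// floating-point immediate of the single-precision `FMOV` form.
inline bool IsFmovImmediate(float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  // Only the top four fraction bits may be set.
  if ((bits & 0x7FFFFu) != 0) {
    return false;
  }
  // Bits [30:25] must be NOT(b) followed by five copies of b.
  const std::uint32_t exponent_high = (bits >> 25) & 0x3Fu;
  return exponent_high == 0x20u || exponent_high == 0x1Fu;
}

// Hypothetical helper: packs sign, exponent bits b:c:d and the four fraction
// bits into the imm8 field.
inline std::uint8_t EncodeFmovImm8(float value) {
  assert(IsFmovImmediate(value));
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return static_cast<std::uint8_t>(((bits >> 24) & 0x80u) |
                                   ((bits >> 19) & 0x7Fu));
}
```

Values such as 0.0 and 32.0 are rejected by this check, so a caller would still need a fallback path for them.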
Wunkolo 8f6c0ad985 [a64] Detect MOVI utilizations for vector-element splats(u8,u16,u32)
The 64-bit case uses a particular replicated 8-bit immediate, so
something else will have to handle that. This catches a lot of cases
without having to touch memory. Does not catch cases of
`1.0`(0x3f800000).
2024-06-23 14:00:26 -07:00
Wunkolo f830f790d1 [a64] Implement OPCODE_DID_SATURATE
This directly maps to the QC bit in the FPSR. Just have to make sure
that the saturated instruction is the very last instruction(which is
currently the case for stuff like VECTOR_ADD and such).
2024-06-23 14:00:26 -07:00
Wunkolo 818a77356e [a64] Optimize zero MovMem64
Read directly from the ZR in the case that we are just storing a 64 or 32 bit zero
2024-06-23 14:00:26 -07:00
Wunkolo 7b9f791cab [a64] Add arch-agnostic documentation configurations
Missed some during the first pass. Now the config files will mention a64 differences.
2024-06-23 14:00:26 -07:00
Wunkolo cba92a2e6e [a64] Remove VOne constant in favor of FMOV 2024-06-23 14:00:26 -07:00
Wunkolo 3b1a696dd6 [a64] Implement raw clock source
Uses `CNTFRQ` and `CNTVCT` system-registers as a raw clock source.

On my ThinkPad x13s, the raw clock source returns a tick-frequency of
19,200,000 while the platform clock source(QueryPerformanceFrequency)
returns 10,000,000. Almost double the accuracy over the platform-clock!
2024-06-23 14:00:26 -07:00
Wunkolo 63f31d5741 [a64] Fix OPCODE_SWIZZLE register-aliasing
Indices and non-const tables were using the same scratch-register
2024-06-23 14:00:26 -07:00
Wunkolo bf12583b9e [a64] Optimize constant vector byte-splats
Detect when all bytes are repeating and use `MOVI` when applicable
2024-06-23 14:00:26 -07:00
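The detection step in bf12583b9e might look roughly like the following. `Vec128Bytes` and `IsByteSplat` are hypothetical stand-ins for whatever constant type and helper the backend uses; only the "all bytes repeat" test is shown, not the `MOVI` emission.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector constant.
using Vec128Bytes = std::array<std::uint8_t, 16>;

// Returns true and writes the repeated byte to `splat_byte` when all 16 bytes
// of the constant are identical; leaves `splat_byte` untouched otherwise.
inline bool IsByteSplat(const Vec128Bytes& constant, std::uint8_t* splat_byte) {
  assert(splat_byte != nullptr);
  const std::uint8_t first = constant[0];
  const bool all_same =
      std::all_of(constant.begin(), constant.end(),
                  [first](std::uint8_t byte) { return byte == first; });
  if (all_same) {
    *splat_byte = first;
  }
  return all_same;
}
```

When the check succeeds, the repeated byte is the only value the code generator needs in order to pick an immediate form.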
Wunkolo fc1a13d3b2 [a64] Optimize bulk VConst access with relative addressing
Load the pointer to the VConst table once, and use offsets from this base address from the underlying enum value.
Reduces the amount of instructions for each VConst memory load.
2024-06-23 14:00:26 -07:00
Wunkolo 4ff43ae1a8 [a64] Fix OPCODE_PACK(short)
Narrow-saturation instructions cause off-by-one rounding errors.
Using the min+max+shuffle passes more unit tests
2024-06-23 14:00:26 -07:00
Wunkolo 2d72b40af2 [a64] Optimize OPCODE_{UN}PACK(float16) with F16C 2024-06-23 14:00:26 -07:00
Wunkolo 06daedf077 [a64] Implement LSE and FP16C detection
Adds two new flags for allowing the use of LSE and FP16C
2024-06-23 14:00:26 -07:00
Wunkolo 96d444da9c [a64] Implement OPCODE_UNPACK
This is a very literal translation from the x64 code into ARM and may not be very optimized. Passes unit tests save for a couple of off-by-one errors.
2024-06-23 14:00:26 -07:00
Wunkolo 6478623d47 [a64] Fix OPCODE_PACK saturation edge-cases
Passes cpu-ppc-tests
2024-06-23 14:00:26 -07:00
Wunkolo 40d908b596 [a64] Implement OPCODE_PACK(2101010, 4202020, 8-in-16, 16-in-32) 2024-06-23 14:00:26 -07:00
Wunkolo 7c094dc6cf [a64] Implement OPCODE_LOAD_CLOCK clock_source_raw
Uses the `CNTVCT_EL0`-register and applies frequency scaling
2024-06-23 14:00:26 -07:00
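The frequency scaling in 7c094dc6cf is not spelled out, so the following is only one plausible shape for it. `ScaleTicks` and its parameters are hypothetical; the source frequency would come from `CNTFRQ`, and the target frequency is whatever the guest clock expects.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a tick count taken at `from_hz` into the
// equivalent count at `to_hz`. Splitting into whole seconds and a remainder
// avoids overflowing 64 bits for large counter values.
inline std::uint64_t ScaleTicks(std::uint64_t ticks, std::uint64_t from_hz,
                                std::uint64_t to_hz) {
  assert(from_hz != 0);
  const std::uint64_t whole_seconds = ticks / from_hz;
  const std::uint64_t remainder = ticks % from_hz;
  // Assumption of this sketch: both frequencies are below 2^32, so the
  // product `remainder * to_hz` cannot overflow.
  return whole_seconds * to_hz + (remainder * to_hz) / from_hz;
}
```

For instance, 192 ticks of a 19,200,000 Hz counter correspond to 100 ticks at 10,000,000 Hz.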
Wunkolo 9b5a690706 [a64] Optimize OPCODE_MEMSET
Use pair-stores rather than singular-stores to write 32-bytes of data at a time.
2024-06-23 14:00:26 -07:00
Wunkolo 6e2910b25e [a64] Optimize memory-address calculation
The LSL can be embedded into the ADD to remove an additional instruction.
What was `cset`+`lsl`+`add` should now just be `cset`+`add ... LSL 12`
2024-06-23 14:00:26 -07:00
Wunkolo e2d1e5d7f8 [a64] Optimize vector-constant generation
Uses MOVI to optimize some cases of constants rather than EOR.
MOVI is a register-renaming idiom on many architectures.
2024-06-23 14:00:26 -07:00
Wunkolo a7ae117c90 [a64] Implement b bl br blr cbnz cbz instruction-stepping 2024-06-23 14:00:26 -07:00
Wunkolo c3efaaa286 [a64] Implement instruction stepping.
Uses `0x0000'dead` as an instruction-stepping sentinel value.
Support for basic jumping instructions like `b`, `bl`, `br`, and `blr`.
2024-06-23 14:00:26 -07:00
Wunkolo f7bd0c89a3 [a64] Implement guest-debugger stack-walks 2024-06-23 14:00:26 -07:00
Wunkolo eb0736eb25 [a64] Reduce function prolog/epilog to 16 bytes
Just need to store `fp` and `lr`
2024-06-23 14:00:26 -07:00
Wunkolo a54226578e [a64] Implement memory tracing 2024-06-23 14:00:26 -07:00
Wunkolo f1235be462 [a64] Fix ATOMIC_COMPARE_EXCHANGE_I32 comparison type
This fixes 32-bit atomic-compare-exchanges.
The upper-half of the input register _must_ be clipped off.

This fixes a deadlock in some games.
2024-06-23 14:00:25 -07:00
Wunkolo c33f543503 [a64] Implement kDebugInfoTraceFunctions and kDebugInfoTraceFunctionCoverage
Relies on armv8.1-a atomic features
2024-06-23 14:00:25 -07:00
Wunkolo bec248c2f8 [a64] Fix OPCODE_CNTLZ
8 and 16 bit CNTLZ needs its bit-count fixed to its original element-type
2024-06-23 14:00:25 -07:00
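One way to read the fix in bec248c2f8: a leading-zero count taken on a 32-bit register over-counts for narrower elements by the width difference. The sketch below models that correction in portable code; `CountLeadingZeros32` and `CountLeadingZerosNarrow` are hypothetical names, and the commit may correct the count differently.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit count-leading-zeros that returns 32 for an input of zero.
inline unsigned CountLeadingZeros32(std::uint32_t value) {
  unsigned count = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (value & mask) == 0;
       mask >>= 1) {
    ++count;
  }
  return count;
}

// Hypothetical helper: leading zeros of an 8, 16, or 32-bit element held
// zero-extended in a 32-bit register.
inline unsigned CountLeadingZerosNarrow(std::uint32_t value,
                                        unsigned element_bits) {
  assert(element_bits == 8 || element_bits == 16 || element_bits == 32);
  // The element must already be zero-extended.
  assert(element_bits == 32 || (value >> element_bits) == 0);
  // Drop the zeros contributed by the unused upper bits of the register.
  return CountLeadingZeros32(value) - (32 - element_bits);
}
```

An 8-bit value of 1 then reports 7 leading zeros instead of 31.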
Wunkolo b9d0752b40 [a64] Optimize OPCODE_MUL_ADD
Use `FMADD` and `FMLA`
Tests are the same, though now it should run a bit faster.
The tests that fail are primarily denormals and other subtle precision
issues it seems.

Ex:
```
i> 00002358   - vmaddfp_7298_GEN
!> 00002358 Register v4 assert failed:
!> 00002358   Expected: v4 == [00000000, 00000000, 00000000, 00000000]
!> 00002358     Actual: v4 == [000D000E, 00138014, 000E4CDC, 0018B34D]
!> 00002358     TEST FAILED
```

Host-To-Guest and Guest-To-Host thunks should probably restore/preserve
the FPCR to maintain these roundings.
2024-06-23 14:00:25 -07:00
Wunkolo 684904c487 [a64] Implement PERMUTE_V128(int16)
Passes `vmrghh` and `vmrglh` unit-tests
2024-06-23 14:00:25 -07:00
Wunkolo 7eca228027 [a64] Fix VECTOR_CONVERT_F2I rounding
```
4.2.2.4 Floating-Point Rounding and Conversion Instructions
...
Floating-point conversions to integers (vctuxs, vctsxs) use round-toward-zero (truncate).
...
```

This passes all of the `vctuxs` and `vctsxs` unit tests
2024-06-23 14:00:25 -07:00
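The round-toward-zero rule quoted in 7eca228027 can be shown in scalar form. This is a sketch of the signed case only, under two assumptions that the commit does not state: out-of-range inputs are clamped, and NaN inputs are left out of scope. `ConvertToInt32TowardZero` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of a float to signed 32-bit conversion that
// truncates toward zero and clamps values outside the int32 range.
inline std::int32_t ConvertToInt32TowardZero(float value) {
  // NaN handling is deliberately not modeled in this sketch.
  assert(!std::isnan(value));
  if (value >= 2147483648.0f) {
    return std::numeric_limits<std::int32_t>::max();
  }
  if (value <= -2147483648.0f) {
    return std::numeric_limits<std::int32_t>::min();
  }
  // In-range float to integer conversion in C++ discards the fraction.
  return static_cast<std::int32_t>(value);
}
```

Under truncation both 2.7 and -2.7 lose their fraction and become 2 and -2, whereas round-to-nearest would give 3 and -3.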