xenia

mirror of https://github.com/xenia-project/xenia.git synced 2025-12-06 07:12:03 +01:00

Author	SHA1	Message	Date
Wunk	551fe52068	Merge `a8b9cd8e65` into `01ae24e46e`	2025-08-21 14:17:05 +10:00
guccigang420	01ae24e46e	[Base/Memory] Fix VirtualQuery length parameter	2025-08-20 13:34:39 +03:00
Triang3l	0b2ffa3148	[GPU] Change texture load cbuffer to push constants Simplify the code, eliminating the need for supporting requesting cbuffers for anything other than guest draw command execution.	2025-08-20 12:46:26 +03:00
Triang3l	04d5c40d0d	[GPU/UI] XeSL readability improvements + float suffix Use the _xe suffix instead of the xesl_ prefix for quicker visual recognition of identifiers, also switch to snake_case for consistency. Also add the f suffix to float32 literals because the Metal Shading Language is based on C++.	2025-08-19 21:36:06 +03:00
Triang3l	3b4b04c371	[Build] Locate FXC among Windows SDK architectures and versions	2025-08-19 20:48:26 +03:00
Triang3l	4234440681	[Vulkan] Fix VulkanInstance::Create return values	2025-08-15 17:19:23 +03:00
Triang3l	b5432ab83f	[Vulkan] Refactoring and fixes for VulkanProvider and related areas Enable portability subset physical device enumeration. Don't use Vulkan 1.1+ logical devices on Vulkan 1.0 instances due to the VkApplicationInfo::apiVersion specification. Make sure all extension dependencies are enabled when creating a device. Prefer exposing feature support over extension support via the device interface to avoid causing confusion with regard to promoted extensions (especially those that required some features as extensions, but had those features made optional when they were promoted). Allow creating presentation-only devices, not demanding any optional features beyond the basic Vulkan 1.0, for use cases such as internal tools or CPU rendering. Require the independentBlend feature for GPU emulation as working around is complicated, while support is almost ubiquitous. Move the graphics system initialization fatal error message to xenia_main after attempting to initialize all implementations, for automatic fallback to other implementations in the future. Log Vulkan driver info. Improve Vulkan debug message logging, enabled by default. Refactor code, with simplified logic for enabling extensions and layers.	2025-08-14 23:44:21 +03:00
Triang3l	a06be03f1b	[GPU] Cleanup definitions of some registers VS/PS_NUM_REG is 6-bit on Adreno 200, and games aren't seen using the bit 7 to indicate that no GPRs are used. It's not clear why Freedreno configures it this way. Some texture fetch fields were deprecated or moved during the development of the Xenos, reflect that in the comments. Add definitions of the registers configuring the conversion of vertex positions to fixed-point. Although there isn't much that can be done with it when emulating using PC GPU APIs, there are some places in Xenia that wrongly (though sometimes deliberately, for results closer to the behavior of the host GPU) assume that the conversion works like in Direct3D 10+, however the Xenos supports only up to 4 subpixel bits rather than 8. The effects of this difference are largely negligible, though. Also add more detailed info about register references and differences from other ATI/AMD GPUs for potential future contributors.	2025-08-06 13:21:19 +03:00
guccigang420	9ae3a72500	[CPU/HIR] Fixed MulHi in value.cc for Linux systems	2025-07-30 23:47:17 +03:00
Wunkolo	a8b9cd8e65	[a64] Implement support for large stack sizes The `SUB` instruction can only encode immediates in the form of `0xFFF` or `0xFFF000`. In the case that the stack size is greater than `0xFFF`, then just align the stack-size by `0x1000` to keep the bottom 12 bits clear.	2024-06-23 14:40:52 -07:00
Wunkolo	9c572c3937	[a64] Remove redundant `OPCODE_DOT_PRODUCT_{3,4}` lane-isolation The last `FADDP` writes into an `S` register, which automatically masks all the other lanes to zero.	2024-06-23 14:00:27 -07:00
Wunkolo	9c8b0678a5	[a64] Optimize `OPCODE_SPLAT` with `MOVI`/`FMOV` Moves the `FMOV` constant functions into `a64_util` so it is available to other translation units. Optimize constant-splats with conditional use of `MOVI` and `FMOV`.	2024-06-23 14:00:27 -07:00
Wunkolo	539a03d5f6	[a64] Optimize `OPCODE_SPLAT` byte-constants Byte-sized constants can utilize the `MOVI` instructions. This makes many cases such as zero-splats much faster since this encodes as just a register-rename(similar to `xor` on x64).	2024-06-23 14:00:27 -07:00
Wunkolo	3acd0a3c37	[a64] Replace instances of `MOV`+`DUP` splats with `MOVI` These `MOV`->`DUP` splats can just be a single `MOVI` instruction	2024-06-23 14:00:27 -07:00
Wunkolo	2953e2e6fc	[a64] Use VectorCodeGenerator rather than CodeBlock+CodeGenerator The emitter doesn't actually hold onto executable code, but just generates the assembly-data into a buffer for the currently-resolving function before placing it into a code-cache. When code gets pushed into the code-cache, it can just be copied from an `std::vector` and reset. The code-cache itself maintains the actual executable memory and stack-unwinding code and such. This also fixes a bunch of erroneous relative-addressing glitches where relative addresses were calculated based on the address of the unused CodeBlock rather than being position-independent. `MOVP2R` in particular was generating different instructions depending on its distance from the code block when it should always just use `MOV` and not do any relative-address calculations since we can't predict where the actual instruction's offset will be (we cannot predict what the program counter will be). Oaknut probably needs a "position independent" policy or mode or something so that it avoids PC-relative instructions.	2024-06-23 14:00:27 -07:00
Wunkolo	02edbd264d	[a64] Fix out-of-bounds `OPCODE_VECTOR_SHL`(all-same) case Out-of-bound shift-values are handled as modulo-element-size	2024-06-23 14:00:27 -07:00
Wunkolo	1127fd9525	[a64] Implement `OPCODE_CACHE_CONTROL` `dc civac` causes an illegal-instruction exception on Windows-ARM. This is likely a security measure against cache-attacks. On Linux this instruction is trapped into an EL1 kernel function. Windows does not seem to have any user-mode cache-maintenance instructions available for the data-cache (only the instruction-cache via `FlushInstructionCache`). The closest thing we can do for now is a full data memory-barrier with `dsb ish`. Prefetches are implemented using `prfm pldl1keep, ...`.	2024-06-23 14:00:27 -07:00
Wunkolo	164f1e4fcc	[a64] Remove x64 reference implementations Removes all comments relating to x64 implementation details	2024-06-23 14:00:27 -07:00
Wunkolo	151700d830	[a64] Implement armv8.0 atomic operations Uses LSE when available, but provides an armv8.0 baseline implementation.	2024-06-23 14:00:27 -07:00
Wunkolo	4655bc1633	[a64] Optimize constant-loads with `FMOV` `FMOV` encodes an 8-bit floating point immediate that can be used to accelerate the loading of certain constant floating point values with magnitudes between 0.125 and 31.0. A lot of immediates such as -1.0, 1.0, 0.5, etc fall within this range and this code gets lots of hits in my testing. This is much more optimal than trying to load a 32/64-bit value in W0/X0 and moving it into an FP register.	2024-06-23 14:00:26 -07:00
Wunkolo	8f6c0ad985	[a64] Detect `MOVI` utilizations for vector-element splats(u8,u16,u32) The 64-bit case uses a particular replicated 8-bit immediate, so something else will have to handle that. This catches a lot of cases without having to touch memory. Does not catch cases of `1.0`(0x3f800000).	2024-06-23 14:00:26 -07:00
Wunkolo	f830f790d1	[a64] Implement `OPCODE_DID_SATURATE` This directly maps to the QC bit in the FPSR. Just have to make sure that the saturated instruction is the very last instruction(which is currently the case for stuff like VECTOR_ADD and such).	2024-06-23 14:00:26 -07:00
Wunkolo	818a77356e	[a64] Optimize zero MovMem64 Read direction from the ZR in the case that we are just storing a 64 or 32 bit zero	2024-06-23 14:00:26 -07:00
Wunkolo	7b9f791cab	[a64] Add arch-agnostic documentation configurations Missed some during the first pass. Now the config files will mention a64 differences.	2024-06-23 14:00:26 -07:00
Wunkolo	cba92a2e6e	[a64] Remove `VOne` constant in favor of `FMOV`	2024-06-23 14:00:26 -07:00
Wunkolo	3b1a696dd6	[a64] Implement raw clock source Uses `CNTFRQ` and `CNTVCT` system-registers as a raw clock source. On my ThinkPad x13s, the raw clock source returns a tick-frequency of 19,200,000 while the platform clock source(QueryPerformanceFrequency) returns 10,000,000. Almost double the accuracy over the platform-clock!	2024-06-23 14:00:26 -07:00
Wunkolo	63f31d5741	[a64] Fix `OPCODE_SWIZZLE` register-aliasing Indices and non-const tables were using the same scratch-register	2024-06-23 14:00:26 -07:00
Wunkolo	bf12583b9e	[a64] Optimize constant vector byte-splats Detect when all bytes are repeating and use `MOVI` when applicable	2024-06-23 14:00:26 -07:00
Wunkolo	fc1a13d3b2	[a64] Optimize bulk VConst access with relative addressing Load the pointer to the VConst table once, and use offsets from this base address from the underlying enum value. Reduces the amount of instructions for each VConst memory load.	2024-06-23 14:00:26 -07:00
Wunkolo	4ff43ae1a8	[a64] Fix `OPCODE_PACK`(short) Narrow-saturation instructions cause off-by-one rounding errors. Using min+max+shuffle instead passes more unit tests	2024-06-23 14:00:26 -07:00
Wunkolo	2d72b40af2	[a64] Optimize `OPCODE_{UN}PACK`(float16) with `F16C`	2024-06-23 14:00:26 -07:00
Wunkolo	06daedf077	[a64] Implement `LSE` and `FP16C` detection Adds two new flags for allowing the use of LSE and FP16C	2024-06-23 14:00:26 -07:00
Wunkolo	96d444da9c	[a64] Implement `OPCODE_UNPACK` This is a very literal translation from the x64 code into ARM and may not be very optimized. Passes unit tests save for a couple of off-by-one errors.	2024-06-23 14:00:26 -07:00
Wunkolo	6478623d47	[a64] Fix `OPCODE_PACK` saturation edge-cases Passes cpu-ppc-tests	2024-06-23 14:00:26 -07:00
Wunkolo	40d908b596	[a64] Implement `OPCODE_PACK`(2101010, 4202020, 8-in-16, 16-in-32)	2024-06-23 14:00:26 -07:00
Wunkolo	7c094dc6cf	[a64] Implement `OPCODE_LOAD_CLOCK` `clock_source_raw` Uses the `CNTVCT_EL0`-register and applies frequency scaling	2024-06-23 14:00:26 -07:00
Wunkolo	9b5a690706	[a64] Optimize `OPCODE_MEMSET` Use pair-stores rather than singular-stores to write 32-bytes of data at a time.	2024-06-23 14:00:26 -07:00
Wunkolo	6e2910b25e	[a64] Optimize memory-address calculation The LSL can be embedded into the ADD to remove an additional instruction. What was `cset`+`lsl`+`add` should now just be `cset`+`add ... LSL 12`	2024-06-23 14:00:26 -07:00
Wunkolo	e2d1e5d7f8	[a64] Optimize vector-constant generation Uses MOVI to optimize some cases of constants rather than EOR. MOVI is a register-renaming idiom on many architectures.	2024-06-23 14:00:26 -07:00
Wunkolo	a7ae117c90	[a64] Implement `b` `bl` `br` `blr` `cbnz` `cbz` instruction-stepping	2024-06-23 14:00:26 -07:00
Wunkolo	c3efaaa286	[a64] Implement instruction stepping. Uses `0x0000'dead` as an instruction-stepping sentinel value. Support for basic jumping instructions like `b`, `bl`, `br`, and `blr`.	2024-06-23 14:00:26 -07:00
Wunkolo	f7bd0c89a3	[a64] Implement guest-debugger stack-walks	2024-06-23 14:00:26 -07:00
Wunkolo	eb0736eb25	[a64] Reduce function prolog/epilog to 16 bytes Just need to store `fp` and `lr`	2024-06-23 14:00:26 -07:00
Wunkolo	a54226578e	[a64] Implement memory tracing	2024-06-23 14:00:26 -07:00
Wunkolo	f1235be462	[a64] Fix `ATOMIC_COMPARE_EXCHANGE_I32` comparison type This fixes 32-bit atomic-compare-exchanges. The upper-half of the input register _must_ be clipped off. This fixes a deadlock in some games.	2024-06-23 14:00:25 -07:00
Wunkolo	c33f543503	[a64] Implement `kDebugInfoTraceFunctions` and `kDebugInfoTraceFunctionCoverage` Relies on armv8.1-a atomic features	2024-06-23 14:00:25 -07:00
Wunkolo	bec248c2f8	[a64] Fix `OPCODE_CNTLZ` 8 and 16 bit CNTLZ need their bit-count fixed to the original element-type	2024-06-23 14:00:25 -07:00
Wunkolo	b9d0752b40	[a64] Optimize `OPCODE_MUL_ADD` Use `FMADD` and `FMLA` Tests are the same, though now it should run a bit faster. The tests that fail are primarily denormals and other subtle precision issues it seems. Ex: ``` i> 00002358 - vmaddfp_7298_GEN !> 00002358 Register v4 assert failed: !> 00002358 Expected: v4 == [00000000, 00000000, 00000000, 00000000] !> 00002358 Actual: v4 == [000D000E, 00138014, 000E4CDC, 0018B34D] !> 00002358 TEST FAILED ``` Host-To-Guest and Guest-To-Host thunks should probably restore/preserve the FPCR to maintain these roundings.	2024-06-23 14:00:25 -07:00
Wunkolo	684904c487	[a64] Implement `PERMUTE_V128`(int16) Passes 'vmrghh' and `vmrglh` unit-tests	2024-06-23 14:00:25 -07:00
Wunkolo	7eca228027	[a64] Fix `VECTOR_CONVERT_F2I` rounding ``` 4.2.2.4 Floating-Point Rounding and Conversion Instructions ... Floating-point conversions to integers (vctuxs, vctsxs) use round-toward-zero (truncate). ... ``` This passes all of the `vctuxs` and `vctsxs` unit tests	2024-06-23 14:00:25 -07:00
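
The large-stack-size commit (`a8b9cd8e65`) in the list above relies on the AArch64 `ADD`/`SUB` (immediate) encoding: a 12-bit unsigned value, optionally shifted left by 12. The following is a minimal sketch of the rounding it describes, not the code from that commit. The helper names `IsAddSubImmediate` and `EncodableStackSize` are hypothetical, and sizes above `0xFFF000` are assumed not to occur.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `value` fits the AArch64 ADD/SUB (immediate)
// encoding, i.e. a 12-bit unsigned immediate optionally shifted left by 12.
constexpr bool IsAddSubImmediate(std::uint64_t value) {
  return (value & ~std::uint64_t{0xFFF}) == 0 ||
         (value & ~std::uint64_t{0xFFF000}) == 0;
}

// Hypothetical helper: sizes up to 0xFFF are kept as-is; larger sizes are
// rounded up to a multiple of 0x1000 so that the bottom 12 bits are clear
// and the shifted-immediate form can be used.
// Assumption: stack_size never exceeds 0xFFF000.
constexpr std::uint64_t EncodableStackSize(std::uint64_t stack_size) {
  if (stack_size <= 0xFFF) {
    return stack_size;
  }
  return (stack_size + 0xFFF) & ~std::uint64_t{0xFFF};
}

static_assert(IsAddSubImmediate(EncodableStackSize(0x1234)),
              "rounded size must be encodable");
```

Rounding up costs at most `0xFFF` bytes of extra stack per frame, in exchange for a single `SUB` instead of materializing the size in a scratch register.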

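The `FMOV` constant-load commit (`4655bc1633`) depends on whether a float fits the 8-bit floating-point immediate. As a rough sketch, a float32 is encodable when its low 19 mantissa bits are zero and exponent bits 30..25 are either `100000` or `011111`, which gives magnitudes from 0.125 to 31.0. The function name `IsFmovImmediate32` is hypothetical and this is not the check used in the xenia source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes 32-bit IEEE-754 floats");

// Hypothetical helper: true if `value` can be loaded with a single
// scalar `FMOV Sd, #imm8` instruction.
bool IsFmovImmediate32(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // bit-cast without UB

  // imm8 only provides 4 fraction bits, so the low 19 bits must be zero.
  if ((bits & 0x7FFFFu) != 0) {
    return false;
  }

  // Bits 30..25 must be NOT(b) followed by five copies of b,
  // where b is a single bit taken from imm8.
  const std::uint32_t exponent_high = (bits >> 25) & 0x3Fu;
  return exponent_high == 0x20u || exponent_high == 0x1Fu;
}
```

Note that `0.0` is not encodable this way, so a zero constant needs a different path such as `MOVI` or a move from the zero register.
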

7103 commits