Add 3-bit VC field to VX128_2 format structure to support
instructions like vperm128 that use 4 distinct vector operands.
The VC field occupies bits 6-8 of the instruction encoding.
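
As a rough illustration of the field position, a minimal sketch of reading a 3-bit value from bits 6-8. It assumes bits are numbered from the least significant bit (bit 0 = LSB) and that the instruction word is already in host byte order. `ExtractVC` is a hypothetical helper, not a name from the codebase.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read the 3-bit VC field at bits 6-8.
// Assumes LSB-0 bit numbering on a host-order instruction word.
constexpr uint32_t ExtractVC(uint32_t code) {
  return (code >> 6) & 0x7u;
}
```
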
mcrfs: Implement Move to Condition Register from FPSCR instruction
Copies a 4-bit FPSCR field to CR and clears the FPSCR exception bits.
mffsx: Fix Rc bit handling
Properly update CR1 from FPSCR when Rc=1 instead of returning an error.
Previously treated Rc=1 as unimplemented.
fcfidx: Fix Rc field access
Use i.X.Rc instead of i.A.Rc for correct instruction format.
fcfid uses X format, not A format.
Implement mcrxr (Move to Condition Register from XER).
Copies XER condition bits (SO, OV, CA) to a CR field and
clears those bits in XER. This was previously unimplemented.
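
The behavior described above can be sketched as follows. This is an illustration only: the `Regs` struct and `Mcrxr` function are hypothetical, XER is modeled as a single 32-bit word with SO/OV/CA in its top bits, and CR is modeled as a 32-bit word with field 0 in the top nibble. The emulator's actual register layout may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register state used only for this sketch.
struct Regs {
  uint32_t xer;  // SO, OV, CA in the top three bits
  uint32_t cr;   // CR field 0 in the top nibble, field 7 in the bottom
};

// Sketch of mcrxr: copy the top XER nibble (SO, OV, CA and the reserved
// bit below them) into CR field `bf`, then clear that nibble in XER.
inline void Mcrxr(Regs& r, unsigned bf) {
  assert(bf < 8);
  const uint32_t nibble = r.xer >> 28;
  const unsigned shift = (7u - bf) * 4u;
  r.cr = (r.cr & ~(0xFu << shift)) | (nibble << shift);
  r.xer &= 0x0FFFFFFFu;
}
```
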
Correct mfvscr and mtvscr to use VX format instead of VX128_1.
These instructions operate on the standard Altivec VSCR register,
not VMX128 extended registers. The previous VX128_1 format was
incorrectly accessing the RB field instead of VD/VB.
Add round-to-nearest-even to the fast float16 pack path by folding a
0xFFF rounding bias into XMMF16PackLCPI0 and extracting bit 13 of the
source as the tie-breaker, matching the software fallback behavior.
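
The rounding trick can be sketched on a plain integer. This is a scalar illustration under assumptions, not the vectorized pack path: `RoundShift13` is a hypothetical name, and the exponent rebias and range clamping of the real conversion are left out. Adding 0xFFF alone rounds everything above the halfway point up; adding bit 13 on top pushes an exact tie up only when the kept value is odd.

```cpp
#include <cassert>
#include <cstdint>

// Drop the low 13 bits with round-to-nearest-even.
// `bits` is assumed small enough that the addition cannot overflow.
constexpr uint32_t RoundShift13(uint32_t bits) {
  const uint32_t tie = (bits >> 13) & 1u;  // lowest bit that survives
  return (bits + 0xFFFu + tie) >> 13;
}
```
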
Fix PACK_SHORT_2 test to use pre-biased float input (0x40400000 = 3.0)
as the hardware expects, rather than raw 0.0f which is out of range.
The loop conditions `n < 8 - n` and `n < 4 - n` terminated early,
only checking the first half of elements. This caused EmitInt16 and
EmitInt32 to incorrectly take the uniform shift path when trailing
elements had different shift amounts.
Resolves potential issues in SHL, SHR, and SHA.
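
A minimal sketch of why the old bound stops early. The function names and the comparison against element 0 are assumptions made for illustration; only the loop condition `n < 8 - n` is taken from the description above. Because the right-hand side shrinks as n grows, the loop exits once n reaches 4.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Buggy bound: only n = 0..3 are examined.
inline bool AllSameBuggy(const std::array<uint16_t, 8>& shamt) {
  for (std::size_t n = 0; n < 8 - n; ++n) {
    if (shamt[n] != shamt[0]) return false;
  }
  return true;
}

// Fixed bound: every element is examined.
inline bool AllSameFixed(const std::array<uint16_t, 8>& shamt) {
  for (std::size_t n = 0; n < 8; ++n) {
    if (shamt[n] != shamt[0]) return false;
  }
  return true;
}
```

With shift amounts {1, 1, 1, 1, 1, 1, 1, 2} the buggy version reports a uniform shift while the fixed one does not.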
When performing unsigned multiplication on Linux/GCC,
the code incorrectly cast constant.i64 (which is a signed int64_t)
to unsigned __int128. This caused sign extension when
the value should be treated as unsigned.
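
The effect is easiest to see in the high half of a 64x64-bit product. The sketch below is an illustration with hypothetical function names, and it assumes a compiler that provides `unsigned __int128` (GCC or Clang). Converting through `uint64_t` first makes the widening zero-extend instead of sign-extend.

```cpp
#include <cassert>
#include <cstdint>

// Buggy: int64_t -> unsigned __int128 sign-extends negative values.
inline uint64_t MulHiU64Buggy(int64_t a, int64_t b) {
  const unsigned __int128 wa = static_cast<unsigned __int128>(a);
  const unsigned __int128 wb = static_cast<unsigned __int128>(b);
  return static_cast<uint64_t>((wa * wb) >> 64);
}

// Fixed: go through uint64_t so the widening zero-extends.
inline uint64_t MulHiU64Fixed(int64_t a, int64_t b) {
  const unsigned __int128 wa = static_cast<uint64_t>(a);
  const unsigned __int128 wb = static_cast<uint64_t>(b);
  return static_cast<uint64_t>((wa * wb) >> 64);
}
```
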
The current implementation has an off-by-one error in rounding: it
should round up to even but does not. A proper fix still needs to be
worked out, so the software version is left here for later
verification.
Add dcbz128 instruction with opcode X(31,1014) | (1<<21)
dcbz and dcbz128 share extended opcode 1014, distinguished by
bit 21 (RT field): dcbz has RT=0, dcbz128 has RT=1.
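
The encoding above can be sketched as follows. The `X` helper is assumed to follow the usual convention (primary opcode in the top 6 bits, 10-bit extended opcode starting at bit 1, LSB-0 numbering). `CacheOp` and `Classify` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed meaning of X(op, xop): primary opcode in bits 26-31,
// extended opcode in bits 1-10 (LSB-0 numbering).
constexpr uint32_t X(uint32_t op, uint32_t xop) {
  return ((op & 0x3Fu) << 26) | ((xop & 0x3FFu) << 1);
}

constexpr uint32_t kDcbzCode = X(31, 1014);
constexpr uint32_t kDcbz128Code = X(31, 1014) | (1u << 21);

enum class CacheOp { kOther, kDcbz, kDcbz128 };

// Hypothetical classifier: compare primary opcode, RT field (bits 21-25)
// and extended opcode, ignoring the RA/RB operand bits.
constexpr CacheOp Classify(uint32_t code) {
  constexpr uint32_t kMask = X(0x3F, 0x3FF) | (0x1Fu << 21);
  if ((code & kMask) == kDcbz128Code) return CacheOp::kDcbz128;
  if ((code & kMask) == kDcbzCode) return CacheOp::kDcbz;
  return CacheOp::kOther;
}
```
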
Fix vspltisw128 operand list to {VD128, SIMM}
Removed incorrect VB128 operand; instruction only takes
destination and immediate value.
Fix VPERM128 field mask from 3-bit to 8-bit (0xff)
The vperm128 permute control is an 8-bit value (0-255),
not a 3-bit value (0-7).
Fix VC128 field flags from PPC_OPERAND_VR to 0
VC128 is a 3-bit immediate field, not a vector register operand.
- EmitInt8 was missing the & 7 on the shift amount, which caused graphical glitches in movies
- There is probably a similar bug in the 16/32-bit versions, but that is left for another commit
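
For reference, a scalar sketch of a per-byte left shift in which only the low 3 bits of each shift amount are used, as in the AltiVec byte shifts. It assumes the missing `& 7` was this mask on the shift amount. `ShlInt8` is a hypothetical name and does not reproduce the emitter code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Per-byte left shift; only the low 3 bits of each shift amount count.
inline std::array<uint8_t, 16> ShlInt8(const std::array<uint8_t, 16>& src,
                                       const std::array<uint8_t, 16>& shamt) {
  std::array<uint8_t, 16> out{};
  for (std::size_t n = 0; n < 16; ++n) {
    out[n] = static_cast<uint8_t>(src[n] << (shamt[n] & 7));
  }
  return out;
}
```
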