Commit graph

1585 commits

Author SHA1 Message Date
Lioncash e8b0f25dff A64: Implement SQSHL's vector register variant 2020-04-22 20:55:06 +01:00
Lioncash b14eaaec46 ir: Add opcodes for left signed saturated shifts 2020-04-22 20:55:06 +01:00
Lioncash da55ed7b31 branch: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash 867b666285 move_wide: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash 78024a9dc4 load_store_register_unprivileged: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash e45e5da610 load_store_register_immediate: Place conditional bodies on their own line
Makes the conditionals visually consistent with the rest of the
codebase.
2020-04-22 20:55:06 +01:00
Lioncash b586cf3f56 load_store_load_literal: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash c3a3b9687e data_processing_logical: Move datasize declarations after early-exit conditionals
While we're at it, make variables const where applicable.
2020-04-22 20:55:06 +01:00
Lioncash ed797e6540 data_processing_conditional_select: Make variables const where applicable
Makes CSEL's function consistent with all of the others.
2020-04-22 20:55:06 +01:00
Lioncash c82fa5ec5a data_processing_addsub: Move datasize declarations after early-exit conditionals
While we're at it, also make relevant variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash f4a66d2477 data_processing_bitfield: Move datasize variables after early-exit conditionals
Moves the declaration of datasize to the scope that it's used within.
This also takes the opportunity to apply const where applicable, and
make early-exits all vertically consistent with one another.
2020-04-22 20:55:06 +01:00
Lioncash 2e0fcd6161 A64: Implement CLS's vector variant
Leverages CLZ like the integral variant does.
2020-04-22 20:55:06 +01:00
Lioncash a2cd643525 emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked
Given this is just an internal helper function, it can be marked static.
2020-04-22 20:55:06 +01:00
Lioncash c39ea2e3c9 perf_map: Use std::string_view instead of std::string for PerfMapRegister()
We can just use a non-owning view into a string in this case instead of
potentially allocating a std::string instance.
2020-04-22 20:55:06 +01:00
MerryMage 12243692f5 A64: Implement SQRDMULH (vector), vector variant 2020-04-22 20:55:06 +01:00
MerryMage a9ffcf08b1 A64: Implement SQDMULL (vector), vector variant 2020-04-22 20:55:06 +01:00
MerryMage 3e447614c6 IR: Add VectorSignedSaturatedDoublingMultiplyLong 2020-04-22 20:55:06 +01:00
MerryMage 06b31448aa emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2020-04-22 20:55:06 +01:00
MerryMage 08c0e017a5 IR: Implement Vector{Signed,Unsigned}Multiply{16,32} 2020-04-22 20:55:06 +01:00
Lioncash b6df34cdde backend_x64/a64_interface: Re-enable the constant folding pass
This was disabled for debugging, but never re-enabled. Just to be sure,
testing was done downstream in yuzu to make sure this didn't happen to
break anything (which seems to be the case).
2020-04-22 20:55:06 +01:00
MerryMage 06ba397af2 emit_x64_vector_floating_point: Hardware FMA implementation for RSqrtStepFused 2020-04-22 20:55:06 +01:00
MerryMage e553c4fe8d emit_x64_vector_floating_point: Hardware FMA implementation of FPVectorRecipStepFused 2020-04-22 20:55:06 +01:00
MerryMage 3caeb62ef1 emit_x64_floating_point: Hardware FMA implementation of FPRSqrtStepFused 2020-04-22 20:55:06 +01:00
MerryMage 344ee76aba emit_x64_floating_point: Hardware FMA implementation of FPRecipStepFused{32,64} 2020-04-22 20:55:06 +01:00
MerryMage 1492573267 emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32} 2020-04-22 20:55:06 +01:00
Lioncash 26df6e5e7b emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks
I had initially meant to use BitSize() here, not sizeof()
2020-04-22 20:55:06 +01:00
MerryMage a4a26ac226 emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation 2020-04-22 20:55:06 +01:00
MerryMage a7c66d2d28 emit_x64_vector: Simplify fpsr_qc related code
Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously
pay the cost of conversion for every saturation instruction.
2020-04-22 20:55:06 +01:00
Lioncash 112cff9ab9 A64: Implement CLZ's vector variant 2020-04-22 20:55:06 +01:00
Lioncash e739624296 ir: Add opcodes for vector CLZ operations
We can optimize these cases further for with the use of a fair bit of
shuffling via pshufb and the use of masks, but given the uncommon use of
this instruction, I wouldn't consider it to be beneficial in terms of
amount of code to be worth it over a simple manageable naive solution
like this.

If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
MerryMage d4c37a68a8 A64/translate: VectorZeroUpper for V(64) stores
Ensures correctness.
2020-04-22 20:55:05 +01:00
MerryMage b8daa4feac simd_two_register_misc: FNEG (vector) with Q == 0 had dirty upper 2020-04-22 20:55:05 +01:00
Lioncash 5653e7637e emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes
These were unintentionally left in when introducing SUQADD and USQADD
2020-04-22 20:55:05 +01:00
Lioncash 14e026a7f0 A64: Implement USQADD's scalar and vector variants 2020-04-22 20:55:05 +01:00
Lioncash d4a76aaa04 ir: Add opcodes form unsigned saturated accumulations of signed values 2020-04-22 20:55:05 +01:00
Lioncash 18ad7f237d A64: Implement SUQADD's scalar and vector variants 2020-04-22 20:55:05 +01:00
Lioncash 6f911a26da ir: Add opcodes for signed saturated accumulations of unsigned values 2020-04-22 20:55:05 +01:00
Lioncash 9a3d38d2ee A64: Implement SMLAL{2}, SMLSL{2}, UMLAL{2}, and UMLSL{2}'s vector by-element variants
We can simply modify the general function made for SMULL{2} and
UMULL{2}'s by-element variants to also handle the other multiply-based
by-element variants.
2020-04-22 20:55:05 +01:00
Lioncash 6ccfbc9b39 A64: Implement UMULL{2}'s vector by-element variant 2020-04-22 20:55:05 +01:00
Lioncash 58e21f175c A64: Implement SMULL{2}'s vector by-element variant 2020-04-22 20:55:05 +01:00
Lioncash 134bb02e19 ir/value: Replace includes with forward declarations
enum classes are still considered complete types when forward declared
(as the compiler knows the exact size of the type from the declaration
alone). The only difference in this case being that the members of the
enum class aren't visible. Given we don't use the members within this
header in any way, we can simply forward declare them here and remove
the inclusions.
2020-04-22 20:55:05 +01:00
Lioncash 2c8e07e7d0 ir/cond: Migrate to C++17 nested namespace specifiers 2020-04-22 20:55:05 +01:00
Lioncash c3b7819a55 CMakeLists: Add missing cond.h header to file listing
Allows the file to show up within IDEs more easily.
2020-04-22 20:55:05 +01:00
Lioncash 0a3976059f A64: Implement URSQRTE 2020-04-22 20:55:05 +01:00
Lioncash b6e74fd17d ir: Add opcodes for performing unsigned reciprocal square root estimates 2020-04-22 20:55:05 +01:00
Lioncash bd3582e811 A64: Implement URECPE 2020-04-22 20:55:05 +01:00
Lioncash af83360f89 ir: Add opcodes for unsigned reciprocal estimate 2020-04-22 20:55:05 +01:00
Lioncash d46dea136f travis: Make macOS build with Xcode 9.4.1
Builds against the latest release version of the Xcode toolchain
2020-04-22 20:53:46 +01:00
Lioncash 740ffa52ae A64: Implement SQNEG's scalar and vector variant 2020-04-22 20:53:46 +01:00
Lioncash fca7eddb9e A64: Add opcodes for signed saturating negations 2020-04-22 20:53:46 +01:00