Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Short (8/16 bit) atomic RMW operations on RISCV #297

Merged
merged 1 commit into from
Nov 8, 2024

Conversation

matetokodi
Copy link
Contributor

@matetokodi matetokodi commented Oct 21, 2024

Add support for short atomic RMW operations for RISCV

Since RISCV does not support short atomic RMW operations, only 32 and 64 bit ones, we must modify the correct values inside the 32 bit values ourselves.

@matetokodi matetokodi force-pushed the jit_threads_riscv_manual branch from 8253ce1 to 4dd002b Compare October 22, 2024 00:02
if (!(operationSize & SLJIT_32) && operationSize != SLJIT_MOV32) {
compareTopFalse = sljit_emit_cmp(compiler, SLJIT_NOT_EQUAL, SLJIT_IMM, 0, srcExpectedPair.arg2, srcExpectedPair.arg2w);
}
if (noShortAtomic && size <= 2) {
sljit_emit_op2(compiler, SLJIT_AND, maskReg, 0, baseReg, 0, SLJIT_IMM, 0x3);
sljit_emit_op2(compiler, SLJIT_SHL, maskReg, 0, maskReg, 0, SLJIT_IMM, 3); // multiply by 8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

32 bit atomic should be present, so mutiply by 4 should be enough

sljit_s32 operationSize = SLJIT_MOV;
sljit_s32 size = 0;
sljit_s32 offset = 0;
sljit_s32 operation;
uint32_t options = MemAddress::CheckNaturalAlignment | MemAddress::AbsoluteAddress;
sljit_sw stackTmpStart = CompileContext::get(compiler)->stackTmpStart;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need this newline.

case ByteCode::I32AtomicRmwAndOpcode:
case ByteCode::I32AtomicRmwOrOpcode:
case ByteCode::I32AtomicRmwXorOpcode:
case ByteCode::I32AtomicRmwXchgOpcode: {
info = Instruction::kIs32Bit;
requiredInit = OTAtomicRmwI32;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be an OTAtomicRmwShort which adds an extra tmp register when short atomic is not present. Could use the compiler option bits to check this.

@matetokodi matetokodi force-pushed the jit_threads_riscv_manual branch from 4dd002b to 28b7aeb Compare October 29, 2024 12:58
@matetokodi matetokodi changed the title WIP: Support Short (8/16 bit) atomic RMW operations on RISCV Support Short (8/16 bit) atomic RMW operations on RISCV Oct 29, 2024
maskReg = instr->requiredReg(3);
}

if (SLJIT_IS_IMM(memValueReg)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of unshared memory, the arguments are immediate zeroes

;; unshared memory is OK

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it always 0? Cannot use other immediates?

Copy link
Contributor Author

@matetokodi matetokodi Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon further investigation memValueReg is only an immediate in the RMW cases (not RMW Cmpxchg), but in my observation only in the unshared memory test cases. The only value I have seen it take has been 0, but that does not guarantee it will always be 0, but its value is not used in any case; its register is needed as a working TMP register.

But the deciding factor is not the memory being unshared, as the code works fine without the check for immediates even if I remove the shared flag from the very first module at the top of the test file for example.

However just returning in these cases is most likely incorrect; I have updated it so a TMP register is assigned in these cases instead.

JITArg memValue(operands + 0);
sljit_s32 memValueReg = SLJIT_EXTRACT_REG(memValue.arg);
sljit_s32 maskReg;
sljit_s32 shiftReg;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually these should be initialized to 0.

#if (defined SLJIT_32BIT_ARCHITECTURE && SLJIT_32BIT_ARCHITECTURE)
sljit_emit_op2(compiler, SLJIT_AND32, baseReg, 0, baseReg, 0, SLJIT_IMM, ~0x3);
#else /* !SLJIT_32BIT_ARCHITECTURE */
sljit_emit_op2(compiler, SLJIT_AND, baseReg, 0, baseReg, 0, SLJIT_IMM, ~0x3);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AND and AND32 is the same on 32 bit systems.

}

if (noShortAtomic && size <= 2) {
sljit_emit_op2(compiler, SLJIT_AND32, shiftReg, 0, baseReg, 0, SLJIT_IMM, 0x3);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't baseReg is a word sized reg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is; would the more proper way to do this be using SLJIT_AND here, and then using an SLJIT_MOV32 from and to shiftReg?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, use an and(shiftreg, basereg, 0x3), then move32(shiftreg,shiftreg). The latter is optimized out if not necessary.


if (noShortAtomic && size <= 2) {
sljit_emit_op2(compiler, SLJIT_AND32, tmpReg, 0, SLJIT_TMP_DEST_REG, 0, maskReg, 0);
sljit_emit_op2(compiler, SLJIT_LSHR32, SLJIT_TMP_DEST_REG, 0, tmpReg, 0, shiftReg, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these operations are reserved, and used and immedate, then maskReg can be reused, if that helps in some code above.

@zherczeg
Copy link
Collaborator

Is the code tested on RISCV?

@matetokodi matetokodi force-pushed the jit_threads_riscv_manual branch from 28b7aeb to 498e142 Compare November 4, 2024 14:23
@matetokodi
Copy link
Contributor Author

Is the code tested on RISCV?

I have now tested it on both 32 and 64 bit RISCV. (using vorosl's WIP RISCV pathes so the project compiles, because currently the base project does not compile on 32 or 64 bit RISCV.)

@matetokodi matetokodi force-pushed the jit_threads_riscv_manual branch 2 times, most recently from c7e61d1 to c16ccd8 Compare November 5, 2024 17:49
Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the status from draft to ready for review

if (noShortAtomic && size <= 2) {
sljit_emit_op2(compiler, SLJIT_AND, shiftReg, 0, baseReg, 0, SLJIT_IMM, 0x3);
sljit_emit_op1(compiler, SLJIT_MOV32, shiftReg, 0, shiftReg, 0);
sljit_emit_op1(compiler, SLJIT_MOV32, shiftReg, 0, shiftReg, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it once is enough.

Since RISCV does not support short atomic RMW operations, only 32 and 64
bit ones, we must modify the correct values inside the 32 bit values
ourselves.

Signed-off-by: Máté Tokodi [email protected]
@matetokodi matetokodi force-pushed the jit_threads_riscv_manual branch from c16ccd8 to bd9a5fa Compare November 6, 2024 12:20
@matetokodi matetokodi marked this pull request as ready for review November 6, 2024 12:20
Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@clover2123 clover2123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@clover2123 clover2123 merged commit 3c714e5 into Samsung:main Nov 8, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants