Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: openthread: platform: Removed double-buffering before UART transmission. #4

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kkasperczyk-no
Copy link
Owner

…mission.

Putting data to local buffer before transmission was removed to optimize operation.

Signed-off-by: Kamil Kasperczyk [email protected]

Copy link
Collaborator

@rlubos rlubos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make it clear in the commit message that it's safe thing to do.

.rx_ringbuf = &_name##_rx_ringbuf, \
}

OT_UART_DEFINE(ot_uart, CONFIG_OPENTHREAD_NCP_UART_RING_BUFFER_SIZE);

#define RX_FIFO_SIZE 128

static bool is_panic_mode;
static const uint8_t *sWriteBuffer = NULL;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write_buffer

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.rx_ringbuf = &_name##_rx_ringbuf, \
}

OT_UART_DEFINE(ot_uart, CONFIG_OPENTHREAD_NCP_UART_RING_BUFFER_SIZE);

#define RX_FIFO_SIZE 128

static bool is_panic_mode;
static const uint8_t *sWriteBuffer = NULL;
static uint16_t sWriteLength = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write_length

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return OT_ERROR_FAILED;
}

sWriteBuffer = aBuf;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transmission is asynchronous, does OT guarantee that the buffer won't be violated in any way before transmission finishes (I assume so given we have otPlatUartSendDone)? If so, I'd emphasize this in the commit message.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, buffer cannot be reused until send done notification method is called. I mentioned it in commit message.


if (sWriteLength) {
for (size_t i = 0; i < len; i++) {
uart_poll_out(ot_uart.dev, sWriteBuffer+i);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Spaces around +.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

LOG_WRN("%s not implemented.", __func__);
return OT_ERROR_NOT_IMPLEMENTED;
#ifdef CONFIG_OPENTHREAD_NCP_SPINEL_ON_UART_ACM
int ret = usb_disable();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line needed after variable definition.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Putting data to local buffer before transmission was removed to optimize operation.
Local buffering was not needed, as passed buffer cannot be modified until sending is finished.

Signed-off-by: Kamil Kasperczyk [email protected]
kkasperczyk-no pushed a commit that referenced this pull request Jul 28, 2020
This makes the gatt metrics also available for
gatt write-without-rsp-cb so it now prints the rate of each write:

uart:~$ gatt write-without-response-cb 1e ff 10 10
Write #1: 16 bytes (0 bps)
Write #2: 32 bytes (3445948416 bps)
Write #3: 48 bytes (2596929536 bps)
Write #4: 64 bytes (6400 bps)
Write #5: 80 bytes (8533 bps)
Write #6: 96 bytes (10666 bps)
Write #7: 112 bytes (8533 bps)
Write #8: 128 bytes (9955 bps)
Write zephyrproject-rtos#9: 144 bytes (11377 bps)
Write zephyrproject-rtos#10: 160 bytes (7680 bps)
Write zephyrproject-rtos#11: 176 bytes (8533 bps)
Write zephyrproject-rtos#12: 192 bytes (9386 bps)
Write Complete (err 0)
Write zephyrproject-rtos#13: 208 bytes (8533 bps)
Write zephyrproject-rtos#14: 224 bytes (9244 bps)
Write zephyrproject-rtos#15: 240 bytes (9955 bps)
Write zephyrproject-rtos#16: 256 bytes (8000 bps)

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
kkasperczyk-no pushed a commit that referenced this pull request Nov 12, 2020
The _ldiv5() is an optimized divide-by-5 function that is smaller and
faster than the generic libgcc implementation.

Yet it can be made even smaller and faster with this replacement
implementation based on a reciprocal multiplication plus some tricks.

For example, here's the assembly from the original code on ARM:

_ldiv5:
        ldr     r3, [r0]
        movw    ip, zephyrproject-rtos#52429
        ldr     r1, [r0, #4]
        movt    ip, 52428
        adds    r3, r3, #2
        push    {r4, r5, r6, r7, lr}
        mov     lr, #0
        adc     r1, r1, lr
        adds    r2, lr, lr
        umull   r7, r6, ip, r1
        lsr     r6, r6, #2
        adc     r7, r6, r6
        adds    r2, r2, r2
        adc     r7, r7, r7
        adds    r2, r2, lr
        adc     r7, r7, r6
        subs    r3, r3, r2
        sbc     r7, r1, r7
        lsr     r2, r3, #3
        orr     r2, r2, r7, lsl zephyrproject-rtos#29
        umull   r2, r1, ip, r2
        lsr     r2, r1, #2
        lsr     r7, r1, zephyrproject-rtos#31
        lsl     r1, r2, #3
        adds    r4, lr, r1
        adc     r5, r6, r7
        adds    r2, r1, r1
        adds    r2, r2, r2
        adds    r2, r2, r1
        subs    r2, r3, r2
        umull   r3, r2, ip, r2
        lsr     r2, r2, #2
        adds    r4, r4, r2
        adc     r5, r5, #0
        strd    r4, [r0]
        pop     {r4, r5, r6, r7, pc}

And here's the resulting assembly with this commit applied:

_ldiv5:
        push    {r4, r5, r6, r7}
        movw    r4, zephyrproject-rtos#13107
        ldr     r6, [r0]
        movt    r4, 13107
        ldr     r1, [r0, #4]
        mov     r3, #0
        umull   r6, r7, r6, r4
        add     r2, r4, r4, lsl #1
        umull   r4, r5, r1, r4
        adds    r1, r6, r2
        adc     r2, r7, r2
        adds    ip, r6, r4
        adc     r1, r7, r5
        adds    r2, ip, r2
        adc     r2, r1, r3
        adds    r2, r4, r2
        adc     r3, r5, r3
        strd    r2, [r0]
        pop     {r4, r5, r6, r7}
        bx      lr

So we're down to 20 instructions from 36 initially, with only 2 umull
instructions instead of 3, and slightly smaller stack footprint.

Signed-off-by: Nicolas Pitre <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants