Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Offload TLS negotiation to I/O threads #1338

Merged
merged 7 commits into from
Dec 18, 2024

Conversation

uriyage
Copy link
Contributor

@uriyage uriyage commented Nov 21, 2024

TLS Negotiation Offloading to I/O Threads

Overview

This PR introduces the ability to offload TLS handshake negotiations to I/O threads, significantly improving performance under high TLS connection loads.

Key Changes

  • Added infrastructure to offload TLS negotiations to I/O threads
  • Refactored SSL event handling to allow I/O threads modify conn flags.
  • Introduced new connection flag to identify client connections

Performance Impact

Testing with 650 clients with SET commands and 160 new TLS connections per second in the background:

Throughput Impact of new TLS connections

  • With Offloading: Minimal impact (1050K → 990K ops/sec)
  • Without Offloading: Significant drop (1050K → 670K ops/sec)

New Connection Rate

  • With Offloading:
    • 1,757 conn/sec
  • Without Offloading:
    • 477 conn/sec

Implementation Details

  1. Main Thread:

    • Initiates negotiation-offload jobs to I/O threads
    • Adds connections to pending-read clients list (using existing read offload mechanism)
    • Post-negotiation handling:
      • Creates read/write events if needed for incomplete negotiations
      • Calls accept handler for completed negotiations
  2. I/O Thread:

    • Performs TLS negotiation
    • Updates connection flags based on negotiation result

Related issue:#761

Copy link

codecov bot commented Nov 21, 2024

Codecov Report

Attention: Patch coverage is 12.50000% with 21 lines in your changes missing coverage. Please review.

Project coverage is 70.78%. Comparing base (3d0c834) to head (cea9d4a).
Report is 81 commits behind head on unstable.

Files with missing lines Patch % Lines
src/io_threads.c 0.00% 21 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1338      +/-   ##
============================================
+ Coverage     70.71%   70.78%   +0.06%     
============================================
  Files           115      119       +4     
  Lines         63159    64715    +1556     
============================================
+ Hits          44666    45806    +1140     
- Misses        18493    18909     +416     
Files with missing lines Coverage Δ
src/connection.h 87.50% <ø> (+0.29%) ⬆️
src/networking.c 88.23% <100.00%> (-0.45%) ⬇️
src/server.c 87.46% <100.00%> (-0.17%) ⬇️
src/server.h 100.00% <ø> (ø)
src/tls.c 100.00% <ø> (ø)
src/io_threads.c 6.96% <0.00%> (-0.64%) ⬇️

... and 63 files with indirect coverage changes

Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation basically breaks the connection abstraction, since it has the TLS implementation calling functions related to IO threading (which is supposed to be agnostic to the connection type). I was sort of expecting we would offload the event handler.

src/io_threads.c Outdated Show resolved Hide resolved
src/connection.h Outdated Show resolved Hide resolved
src/io_threads.c Outdated
Comment on lines 577 to 592
if (server.io_threads_num <= 1) {
return C_ERR;
}

if (!(conn->flags & CONN_FLAG_CLIENT)) {
return C_ERR;
}

client *c = connGetPrivateData(conn);
if (c->io_read_state != CLIENT_IDLE) {
return C_OK;
}

if (server.active_io_threads_num <= 1) {
return C_ERR;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (server.io_threads_num <= 1) {
return C_ERR;
}
if (!(conn->flags & CONN_FLAG_CLIENT)) {
return C_ERR;
}
client *c = connGetPrivateData(conn);
if (c->io_read_state != CLIENT_IDLE) {
return C_OK;
}
if (server.active_io_threads_num <= 1) {
return C_ERR;
}
if (server.active_io_threads_num <= 1) {
return C_ERR;
}
if (!(conn->flags & CONN_FLAG_CLIENT)) {
return C_ERR;
}
client *c = connGetPrivateData(conn);
if (c->io_read_state != CLIENT_IDLE) {
return C_OK;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems everywhere else we just check for active threads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider a scenario where the main thread sends a job to the IO thread. After the IO thread completes the job, the main thread deactivates all threads. In this case, we want to ensure the main thread processes the returned job from the IO thread before performing an accept operation even if the io threads are not active.

To achieve this:

First, check if IO threads are enabled at all (less expensive check)
Then, verify the read_state is not idle
Finally, check the active state

Unlike read/write where we anyway check for the read/write states, the accept flow operates at the connection layer rather than the client layer, so we can't check the read state there.

@madolson madolson requested a review from ranshid November 21, 2024 22:01
Signed-off-by: Uri Yagelnik <[email protected]>
@uriyage
Copy link
Contributor Author

uriyage commented Nov 22, 2024

This implementation basically breaks the connection abstraction, since it has the TLS implementation calling functions related to IO threading (which is supposed to be agnostic to the connection type). I was sort of expecting we would offload the event handler.

We still don't check in any point for the connection type, since the TLS code calls the IO threads to offload the negotiation with a supplied callback, not the other way around. Maybe we can rename it to 'accept' instead of 'tls_negotiate' to be more abstract.

I was sort of expecting we would offload the event handler.

Not sure I get this, would you please elaborate.

Copy link
Member

@ranshid ranshid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uri. We discussed offline about several architectual changes to reduce the code changes. Also placed some comments I think we can improve.

src/tls.c Outdated Show resolved Hide resolved
src/io_threads.c Outdated Show resolved Hide resolved
src/server.h Outdated Show resolved Hide resolved
src/networking.c Outdated Show resolved Hide resolved
@uriyage uriyage force-pushed the tls_negotiation_offload branch from cd86e38 to c0cf818 Compare November 27, 2024 15:22
src/tls.c Outdated Show resolved Hide resolved
src/tls.c Outdated Show resolved Hide resolved
src/tls.c Show resolved Hide resolved
@ranshid
Copy link
Member

ranshid commented Nov 28, 2024

@uriyage it looks much better now IMO. I still want @madolson to also take a better look since these parts are very fragile and hard to miss some cases. Let's also wait for her to provide feedback

src/server.c Outdated Show resolved Hide resolved
Co-authored-by: Madelyn Olson <[email protected]>
Signed-off-by: ranshid <[email protected]>
@ranshid ranshid merged commit 8060c86 into valkey-io:unstable Dec 18, 2024
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants