-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cassandra snapshot store #532
base: master
Are you sure you want to change the base?
Conversation
def listOf(size: Int): List[BufferNr] = | ||
(0 until size).toList.map(fromIntUnsafe) | ||
|
||
def fromIntUnsafe(value: Int): BufferNr = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- some validation?
- non negative?
- where is safe version? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -34,10 +34,17 @@ object CassandraHelper { | |||
} | |||
|
|||
|
|||
implicit class RowOps(val self: Row) extends AnyVal { | |||
|
|||
def wasApplied: Boolean = self.getBool("[applied]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isApplied
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I named it wasApplied
, because it is called like that in Cassandra driver, so I thought it might ease discoverability a bit. But isApplied
sounds fine too.
https://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/ResultSet.html#wasApplied--
@@ -12,6 +12,7 @@ final case class SchemaConfig( | |||
metaJournalTable: String = "metajournal", | |||
pointerTable: String = "pointer", | |||
pointer2Table: String = "pointer2", | |||
snapshotTable: String = "snapshot", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a note, akka currently uses snapshots
table. We need to be 100% to not conflict with it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not think about it, to be honest. Other name is required here, indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "snapshot_buffer"?
.../src/main/scala/com/evolutiongaming/kafka/journal/eventual/cassandra/SnapshotCassandra.scala
Outdated
Show resolved
Hide resolved
|
||
import java.time.Instant | ||
|
||
final case class SnapshotSelectionCriteria( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It provides a bit of additional typesafety on top of the original structure.
I can move these parameters to the method arguments though.
Here is the only place it is used: https://github.com/evolution-gaming/kafka-journal/pull/532/files#diff-9ccd7e724e3622d863ea66a0e6c83a3d5bf96a6d0d089ad1235680ca6266556aR114
...src/main/scala/com/evolutiongaming/kafka/journal/eventual/cassandra/SnapshotStatements.scala
Outdated
Show resolved
Hide resolved
val (bufferNr, (deleteSnapshot, _)) = oldestSnapshot | ||
val wasApplied = statements.updateRecord(key, segmentNr, bufferNr, insertSnapshot, deleteSnapshot) | ||
wasApplied.flatMap { wasApplied => | ||
// TODO: consider adding circuit breaker here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please explain this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a bug or high contention (with the writer itself?), this call may lead to an infinite loop, which feels like a thing to consider protecting against.
val segmentNr = SegmentNr.min | ||
|
||
def save(key: Key, snapshot: SnapshotRecord[EventualPayloadAndType]): F[Unit] = { | ||
statements.selectMetadata(key, segmentNr).flatMap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the hot path given that buffer is full, do you need to query ALL snapshots?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need this information to find the oldest snapshot to be replaced. There are might be the ways to optimize it, though. 🤔
(bufferNr, (seqNr, timestamp)) | ||
} | ||
|
||
rows.toList.map(_.toMap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor, not sure toMap
should be part of statement
:)
savedSnapshots: Map[BufferNr, (SeqNr, Instant)], | ||
insertSnapshot: SnapshotRecord[EventualPayloadAndType] | ||
): F[Unit] = { | ||
val sortedSnapshots = savedSnapshots.toList.sortBy { case (_, (seqNr, timestamp)) => (seqNr, timestamp) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to sort collection, just minBy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it correct to include timestamp
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, it should never happen, i.e. there should not be two snapshots with the same seqNr
. 🤔
7af15e7
to
6a6da8e
Compare
611ff72
to
05e761a
Compare
40f0178
to
e46f1ed
Compare
64b3b10
to
70adbd5
Compare
snapshot/src/main/scala/com/evolutiongaming/kafka/journal/BufferNr.scala
Outdated
Show resolved
Hide resolved
.../src/main/scala/com/evolutiongaming/kafka/journal/snapshot/cassandra/SnapshotCassandra.scala
Show resolved
Hide resolved
.../src/main/scala/com/evolutiongaming/kafka/journal/snapshot/cassandra/SnapshotCassandra.scala
Show resolved
Hide resolved
savedSnapshots: Map[BufferNr, (SeqNr, Instant)], | ||
insertSnapshot: SnapshotRecord[EventualPayloadAndType] | ||
): F[Unit] = { | ||
val oldestSnapshot = savedSnapshots.toList.minByOption { case (_, (seqNr, timestamp)) => (seqNr, timestamp) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you expecting to have few snapshots with same seqNr
? I do not understand use-case for it, could you please share more details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should never have several snapshots with the same seqNr
, but it seemed like a good idea to be a little bit defensive here.
I.e., if there is a bug or something, it will still filter by timestamp
and delete an oldest one, not a random one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to log (or report any other way) the case of duplicated seqNr
? Please resolve the conversation if its not easily possible
.../src/main/scala/com/evolutiongaming/kafka/journal/snapshot/cassandra/SnapshotCassandra.scala
Show resolved
Hide resolved
f14d9ac
to
09c9161
Compare
81de6f3
to
3dcd233
Compare
ae02bdc
to
4337c0a
Compare
d9579f6
to
6c4f376
Compare
...src/main/scala/com/evolutiongaming/kafka/journal/snapshot/cassandra/SnapshotStatements.scala
Outdated
Show resolved
Hide resolved
This reverts commit 9aea71a.
# Conflicts: # eventual-cassandra/src/test/scala/com/evolutiongaming/kafka/journal/eventual/cassandra/CreateSchemaSpec.scala
# Conflicts: # eventual-cassandra/src/main/scala/com/evolutiongaming/kafka/journal/eventual/cassandra/CreateSchema.scala # eventual-cassandra/src/test/scala/com/evolutiongaming/kafka/journal/eventual/cassandra/CreateSchemaSpec.scala # Conflicts: # build.sbt # Conflicts: # build.sbt
It was inherited from original `Event` code, but it seems that it was only needed there to ensure backwards compatibility when introducing JSON payloads and no longer necessary. See c6d0543 for more details.
The reason behind this is that there won't be a flat counterpart of the store. It will be the only one.
It is kind of a draft for now, but any feedback is welcome.
The Idea
The big idea is to get rid of
sequence_nr
column from a primary key. The presence of this column in Akka Persistence Cassandra causes creation of a tombstone each time the snapshot is deleted and the tombstone stays there untilgc_grace_seconds
is reached. This causes slower reads and may limit the ability to scan the table whenever the incident happen.The new implementation has a fixed maximum number of rows per one
pesistenceId
and updates them in-place using a ring buffer approach.Why part of Kafka Journal, not a separate library?
The goal is to reuse Kafka Journal infrastructure:
persistenceId
to(id, topic)
pair and back,eventual-cassandra
.How to review?
In this specific PR there are only two useful / added-value classes:
SnapshotStatements
- defines the new table used to store snapshots, and CQL statements to get/update/delete them,SnapshotCassandra
- defines the actual ring buffer logic to insert/update/delete snapshots from the new table.You might also find
SnapshotCassandraTest
useful as it defines uses cases I was thinking about when writingSnapshotCassandra
, andSnapshotPerfSpec
which demonstrates that, indeed, a new plugin is faster when reading, and slower when writing snapshots.Everything else are just wrappers around Akka SnapshotStore API without much added-value code.
Why it is a draft and not ready for merge?
I do not want to break the existing
kafka-journal
, obviously, so I tried to change as little as possible in a journal plugin, but I am not sure I fully succeeded. I.e. I had to squeeze in the code intoCreateSchema
andSchemaConfig
classes, which creates a new snapshot table if it does not exist.I am not sure it is a right thing to do, because most people won't use a new snapshotter for a long time.
Besides that, I was thinking that to sell something like this to the users, I should not increase complexity of
kafka-journal
, hence separating the code fromkafka-journal
into separate subproject / documenting everything well could be essential.And the last, but not the least, I kind of hacked serialization part in, and it still needs to be checked / worked on.
The biggest obstacle is how to achieve these goals without making the changelist huge and unreviewable. I am playing with separating the code here (it is PR to PR), but I am not completely sure I am going to a right direction: https://github.com/evolution-gaming/kafka-journal/pull/537/files