Table model data deletion #13878

Open: wants to merge 65 commits into master


jt2594838 (Contributor) commented Oct 23, 2024

  1. Upgrade the format of the Modification File to V2 (with the suffix ".mods2") to support table model data deletion.
  2. Refactor the in-memory structures to support table model data deletion.
  3. During start-up, V1 Modification Files are upgraded to V2 asynchronously.
  4. Deletions to different Modification Files are performed in parallel.
  5. Further abstractions are added to mod file management to make future extensions easier.
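A minimal sketch of item 4, assuming each Modification File can be appended to independently: the same deletion is submitted to a thread pool once per affected file, and the call returns only after every write has finished. `ModFileSink`, `write`, and `ParallelDeletionSketch` are hypothetical names for illustration, not the PR's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a Modification File; not the IoTDB class.
interface ModFileSink {
  void write(String deletion) throws Exception;
}

final class ParallelDeletionSketch {
  // Writes the same deletion to every affected mod file in parallel and
  // waits for all writes; the first failure is rethrown as ExecutionException.
  static void deleteInParallel(List<ModFileSink> files, String deletion, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (ModFileSink f : files) {
        futures.add(pool.submit(() -> {
          f.write(deletion);
          return null; // Callable form so checked exceptions propagate
        }));
      }
      for (Future<?> future : futures) {
        future.get(); // blocks until this file's write completes
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

Because files are independent, the total latency is bounded by the slowest batch of writes rather than their sum, which is the intuition behind the speed-up reported below.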

Grammar definition:

[Image: grammar definition]

Examples:

Table schema assumed in the examples:
vehicle1(deviceType STRING ID, deviceId STRING ID, s0 INT32 MEASUREMENT, s1 INT64 MEASUREMENT)

Supported examples:

[Image: supported deletion examples]

Unsupported examples:

[Image: unsupported deletion examples]

Evaluation

Note: compaction is disabled in all tests so that files are not merged and removed during the experiments.

Write performance

We evaluate the time taken by a single deletion as the number of involved files varies.
The experiment writes x TsFiles on a 1C1D instance (one ConfigNode and one DataNode),
then performs a deletion covering all of them 5 times
and records the average time consumption.
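The measurement procedure above can be sketched as a small timing harness, assuming the deletion is issued as a blocking call. `deleteAllFiles` is a hypothetical placeholder for executing one DELETE that covers all x TsFiles; the class name is also illustrative.

```java
final class DeletionBenchmarkSketch {
  // Runs the given deletion `runs` times and returns the mean wall-clock
  // time in milliseconds.
  static double averageMillis(Runnable deleteAllFiles, int runs) {
    if (runs <= 0) {
      throw new IllegalArgumentException("runs must be positive");
    }
    long totalNanos = 0;
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime(); // monotonic clock, suitable for intervals
      deleteAllFiles.run();
      totalNanos += System.nanoTime() - start;
    }
    return totalNanos / (double) runs / 1_000_000.0;
  }
}
```

In the experiment described here, `runs` would be 5 and the harness would be repeated for each value of x.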

Below are the results:
[Image: deletion time consumption vs. number of TsFiles]
"Before" refers to the master branch (using the tree model),
while "after" is this branch (using the table model).

The results show that the time consumption grows linearly with the number of files, which is expected.
The parallel deletion optimization reduces the time consumption to around 1/3 of "before".
Both lines rise sharply after a threshold,
which is about 2400 files for "after" and 3800 files for "before".
Most likely, the CPU or disk cache overflows at that point, which reduces IO efficiency.

Read performance

We also measure the effect of deletions on queries.
In this test, we write 100 TsFiles, each containing only 100 points of a single time series.
Then, after writing x deletions, each like
DELETE FROM test.table1 WHERE deviceId = 'd0',
we perform 10 queries like
SELECT * FROM test.table1
and record the average query latency.

The following figure shows the results:
[Image: average query latency vs. number of deletions]
Similarly, "Before" refers to the master branch (using the tree model),
while "after" is this branch (using the table model).

As expected, the more deletions are written, the slower queries become,
and query latency grows roughly linearly with the number of deletions.
Compared with "Before", "After" reduces query latency by about 40%,
due to the more compact binary format.
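One plausible contributor to the linear growth can be illustrated with a naive filter: without an index over deletions, deciding that a point is *not* deleted requires scanning every applicable deletion. The sketch assumes a hypothetical `DeletionEntry` that matches one device ID and an inclusive time range; IoTDB's actual read path may organize deletions differently.

```java
import java.util.ArrayList;
import java.util.List;

final class DeletionFilterSketch {
  // Hypothetical deletion entry: one device ID plus an inclusive time range.
  static final class DeletionEntry {
    final String deviceId;
    final long startTime;
    final long endTime;

    DeletionEntry(String deviceId, long startTime, long endTime) {
      this.deviceId = deviceId;
      this.startTime = startTime;
      this.endTime = endTime;
    }

    boolean covers(String device, long time) {
      return deviceId.equals(device) && startTime <= time && time <= endTime;
    }
  }

  // Returns the timestamps of `device` that survive all deletions.
  // A point that no deletion covers is compared with every deletion, so the
  // worst-case cost per point is linear in the number of deletions.
  static List<Long> filter(String device, long[] times, List<DeletionEntry> deletions) {
    List<Long> kept = new ArrayList<>();
    for (long t : times) {
      boolean deleted = false;
      for (DeletionEntry d : deletions) {
        if (d.covers(device, t)) {
          deleted = true;
          break; // one covering deletion is enough
        }
      }
      if (!deleted) {
        kept.add(t);
      }
    }
    return kept;
  }
}
```

Under this model, a smaller serialized entry mainly cuts the cost of loading deletions, while the per-point scan stays linear, which is consistent with both lines growing linearly but with different slopes.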

jt2594838 and others added 30 commits October 12, 2024 11:27
# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/analyze/LoadTsFileAnalyzer.java
…ble_data_deletion

# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/modification/IDPredicate.java
# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/load/LoadTsFileManager.java
# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/task/InsertionCrossSpaceCompactionTask.java
# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/CompactionUtils.java
# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/schemaengine/schemaregion/mtree/impl/mem/MTreeBelowSGMemoryImpl.java
* Update IoTDBTableIT.java

* Update DataNodeInternalRPCServiceImpl.java
jt2594838 marked this pull request as ready for review November 11, 2024 02:04
Comment on lines +174 to +175
// TODO: implement
return null;
Contributor

Would throwing an exception be better than returning null? @Caideyipi @SteveYurongSu can plan when to support this.
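A sketch of the reviewer's suggestion: failing fast with `UnsupportedOperationException` surfaces the missing feature at the call site instead of propagating a `null` that fails later and further away. The class and method names are hypothetical; the actual method around lines +174 to +175 is not shown in this thread.

```java
final class UnimplementedSketch {
  // Hypothetical placeholder for the method that currently returns null.
  static Object notYetSupported() {
    // TODO: implement; fail fast rather than returning null
    throw new UnsupportedOperationException("Not implemented yet");
  }
}
```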

if (tsFileResource.getStartTime(device) < timeLowerBoundForCurrentDevice) {
ttlDeletion =
new Deletion(
new TreeDeletionEntry(
Collaborator

This should distinguish between the tree model and the table model for the current device.

Contributor Author

Done
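A sketch of the distinction requested above, assuming the caller already knows whether the current device belongs to the tree model or the table model (passed in as a boolean here). The TTL deletion is then built as a path-based entry or as a table-name-plus-ID-values entry. `Entry`, `TreeEntry`, `TableEntry`, and `ttlDeletion` are hypothetical simplifications, not the signatures of IoTDB's `TreeDeletionEntry` or `TableDeletionEntry`.

```java
import java.util.List;

final class TtlDeletionSketch {
  // Hypothetical, simplified entry types; not the IoTDB classes.
  interface Entry {}

  // Tree model: the device is identified by its path.
  static final class TreeEntry implements Entry {
    final String devicePath;
    final long deleteBefore; // exclusive upper bound of deleted timestamps

    TreeEntry(String devicePath, long deleteBefore) {
      this.devicePath = devicePath;
      this.deleteBefore = deleteBefore;
    }
  }

  // Table model: the device is identified by table name and ID column values.
  static final class TableEntry implements Entry {
    final String tableName;
    final List<String> idValues;
    final long deleteBefore;

    TableEntry(String tableName, List<String> idValues, long deleteBefore) {
      this.tableName = tableName;
      this.idValues = idValues;
      this.deleteBefore = deleteBefore;
    }
  }

  // Builds a TTL deletion removing data older than timeLowerBound,
  // choosing the entry type by the device's data model.
  static Entry ttlDeletion(boolean isTableModel, String devicePath, String tableName,
      List<String> idValues, long timeLowerBound) {
    if (isTableModel) {
      return new TableEntry(tableName, idValues, timeLowerBound);
    }
    return new TreeEntry(devicePath, timeLowerBound);
  }
}
```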

@@ -194,8 +195,9 @@ public File getFileFromDataDirsIfAnyAdjuvantFileExists() {
File file = FSFactoryProducer.getFSFactory().getFile(dataDir, partialFileString);
if (file.exists()
|| new File(file.getAbsolutePath() + TsFileResource.RESOURCE_SUFFIX).exists()
|| new File(file.getAbsolutePath() + ModificationFile.FILE_SUFFIX).exists()
|| new File(file.getAbsolutePath() + ModificationFile.COMPACTION_FILE_SUFFIX).exists()) {
|| ModificationFileV1.getNormalMods(file).exists()
Collaborator

Should the V1 compaction mods file also be checked?

Contributor Author

Done
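A sketch of the completed existence check, assuming every adjuvant file is named by appending a suffix to the TsFile path. Only ".mods2" comes from the PR description; ".resource", ".mods" (V1 normal), and ".compaction.mods" (V1 compaction) are assumed values for illustration, and `AdjuvantFileSketch` is a hypothetical name.

```java
import java.io.File;
import java.util.List;

final class AdjuvantFileSketch {
  // Assumed suffixes; only ".mods2" is stated in the PR description.
  static final List<String> ADJUVANT_SUFFIXES =
      List.of(".resource", ".mods", ".compaction.mods", ".mods2");

  // True if the TsFile itself or any of its adjuvant files exists,
  // covering both V1 mods files (normal and compaction) and the V2 file.
  static boolean anyFileExists(File tsFile) {
    if (tsFile.exists()) {
      return true;
    }
    String base = tsFile.getAbsolutePath();
    for (String suffix : ADJUVANT_SUFFIXES) {
      if (new File(base + suffix).exists()) {
        return true;
      }
    }
    return false;
  }
}
```

Keeping the suffixes in one list means a future format only needs a new entry rather than another chained `||` condition.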

if (!checkAndDeleteFile(file)) {
File resourceFile =
getFileFromDataDirs(tsFileIdentifier.getFilePath() + TsFileResource.RESOURCE_SUFFIX);
if (!checkAndDeleteFile(resourceFile)) {
success = false;
}

Collaborator

Should the old-version mods files and the compaction mods files also be checked?

Contributor Author

Done
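Along the same lines, cleanup can attempt every known adjuvant file rather than only the resource file. A sketch under the same assumed suffixes, using `java.nio.file.Files.deleteIfExists` so that a missing file is not treated as a failure; `AdjuvantCleanupSketch` and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class AdjuvantCleanupSketch {
  // Assumed suffixes (resource, V1 normal/compaction mods, V2 mods).
  static final List<String> SUFFIXES =
      List.of(".resource", ".mods", ".compaction.mods", ".mods2");

  // Deletes the TsFile and every existing adjuvant file.
  // Returns false if any deletion failed, but still attempts all of them.
  static boolean deleteWithAdjuvantFiles(String tsFilePath) {
    boolean success = deleteQuietly(Paths.get(tsFilePath));
    for (String suffix : SUFFIXES) {
      success &= deleteQuietly(Paths.get(tsFilePath + suffix)); // non-short-circuit
    }
    return success;
  }

  private static boolean deleteQuietly(Path path) {
    try {
      Files.deleteIfExists(path); // a missing file is not an error
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```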

@@ -263,9 +263,9 @@ public void testCompactionWithAllDeletion() throws IOException, IllegalPathExcep
writer.endFile();
}
resource1
.getModFile()
.write(new Deletion(new MeasurementPath(deviceID, ""), Long.MAX_VALUE, Long.MAX_VALUE));
Collaborator

Consider using TableDeletionEntry here, because the device belongs to the table model.

Contributor Author

Done

Caideyipi and others added 8 commits November 13, 2024 14:55
* drop column

* Update IoTDBTableIT.java

* Update IoTDBTableIT.java
# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/memtable/TsFileProcessor.java
* adaptation

* Fix

* Update DeleteDevice.java

* Update AnalyzeUtils.java

* Update IoTDBDeviceIT.java
@@ -129,10 +129,9 @@ private static void moveSeqResourceToUnsequenceDir(TsFileResource resource) thro
moveFile(
new File(tsfile.getAbsolutePath() + TsFileResource.RESOURCE_SUFFIX),
new File(targetFile.getAbsolutePath() + TsFileResource.RESOURCE_SUFFIX));
if (resource.modFileExists()) {
if (resource.anyModFileExists()) {
Collaborator

If only the shared mods file exists, execution will still enter the following move method.

Contributor Author

Done
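A sketch of the behavior the reviewer is pointing at, assuming a resource may own an exclusive mods file and also reference a shared one: only the exclusive file belongs to this TsFile and should move with it, while a shared mods file stays in place because other TsFiles still reference it. The method name and parameters are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ModFileMoveSketch {
  // Moves the exclusive mods file next to the target TsFile, if it exists,
  // and returns true if a file was moved. A shared mods file is deliberately
  // not handled here: other TsFiles reference it, so it must stay put.
  static boolean moveExclusiveModFile(Path exclusiveModFile, Path targetModFile)
      throws IOException {
    if (exclusiveModFile == null || !Files.exists(exclusiveModFile)) {
      return false; // e.g. only a shared mods file exists: nothing to move
    }
    Files.move(exclusiveModFile, targetModFile, StandardCopyOption.REPLACE_EXISTING);
    return true;
  }
}
```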

@@ -143,18 +143,17 @@ public void perform() throws Exception {
x -> x.definitelyNotContains(device) || !x.isDeviceAlive(device, ttl));
sortedSourceFiles.sort(Comparator.comparingLong(x -> x.getStartTime(device)));
if (ttl != Long.MAX_VALUE) {
Deletion ttlDeletion =
new Deletion(
TreeDeletionEntry ttlDeletion =
Collaborator

This should distinguish between the tree model and the table model for the current device.

sonarcloud bot commented Nov 15, 2024

Quality Gate failed

Failed conditions
D Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

@@ -189,20 +200,17 @@ public static void addFilesToFileMetrics(TsFileResource resource) {
resource.getTsFile().length(),
resource.isSeq(),
resource.getTsFile().getName());
if (resource.modFileExists()) {
if (resource.anyModFileExists()) {
Collaborator

For the shared mods file, is this addition redundant? The same shared file may be added once for every resource that references it.

Contributor Author

Done
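A sketch of avoiding the redundant addition, assuming the metric should reflect each physical mods file exactly once: shared files already counted are tracked in a concurrent set keyed by path, and their size is added only the first time they are seen. `ModFileMetricsSketch` and its methods are hypothetical, not IoTDB's metrics API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class ModFileMetricsSketch {
  private final Set<String> countedSharedFiles = ConcurrentHashMap.newKeySet();
  private final AtomicLong modFileCount = new AtomicLong();
  private final AtomicLong totalModFileSize = new AtomicLong();

  // Exclusive mods files belong to exactly one TsFile, so always count them.
  void addExclusive(long size) {
    modFileCount.incrementAndGet();
    totalModFileSize.addAndGet(size);
  }

  // A shared mods file may be referenced by many TsFiles; count it once.
  // Set.add returns false if the path was already present.
  void addShared(String path, long size) {
    if (countedSharedFiles.add(path)) {
      modFileCount.incrementAndGet();
      totalModFileSize.addAndGet(size);
    }
  }

  long count() {
    return modFileCount.get();
  }

  long totalSize() {
    return totalModFileSize.get();
  }
}
```

The removal path would need the mirror-image rule, decrementing a shared file only when its last referencing TsFile goes away.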

TsFileResource firstSource = filesView.sourceFilesInLog.get(0);
TsFileResource firstTarget = filesView.targetFilesInPerformer.get(0);
ModFileManagement modFileManagement = firstSource.getModFileManagement();
ModificationFile modificationFile = modFileManagement.allocateFor(firstTarget);
Collaborator

If a shared mods file has been created here and the resource file has not yet been serialized, can it be found during restart recovery?

Contributor Author

I think that if the resource file is not serialized, the compaction has not completed, and the target file will eventually be removed.
So it does not matter whether the shared mod file can be found during recovery.

Collaborator

shuwenwei commented Nov 15, 2024

If the process is killed (kill -9) before the target resource is serialized, the compaction task will be rolled back during system recovery. At that point, the compaction recovery task may not be able to locate this shared mods file.
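One way to address this concern, sketched as a hypothetical approach rather than what the PR does: record the path of the allocated shared mods file in the compaction log before creating the file, so that recovery after a kill -9 can read the log and find it even though the target resource was never serialized. The one-line `SHARED_MODS <path>` log format and `CompactionLogSketch` are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

final class CompactionLogSketch {
  private static final String PREFIX = "SHARED_MODS ";

  // Appends the shared mods file path to the log, written with SYNC so the
  // record reaches storage before the caller creates the file
  // (fsync of the parent directory is omitted for brevity).
  static void logSharedModFile(Path log, Path sharedModFile) throws IOException {
    Files.write(log, List.of(PREFIX + sharedModFile), StandardCharsets.UTF_8,
        StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
  }

  // During recovery, returns every shared mods file recorded in the log.
  static List<Path> recoverSharedModFiles(Path log) throws IOException {
    List<Path> result = new ArrayList<>();
    if (!Files.exists(log)) {
      return result;
    }
    for (String line : Files.readAllLines(log, StandardCharsets.UTF_8)) {
      if (line.startsWith(PREFIX)) {
        result.add(Paths.get(line.substring(PREFIX.length())));
      }
    }
    return result;
  }
}
```

The ordering is the essential design choice: log first, create second, so any file that exists on disk is guaranteed to have a log record pointing at it.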

private volatile ModificationFile exclusiveModFile;

private volatile ModificationFile sharedModFile;
private long shardModFileOffset;
Collaborator

shard -> shared

@@ -206,10 +206,10 @@ private boolean createSnapshot(List<TsFileResource> resources, String snapshotId
createHardLink(
new File(snapshotTsFile.getAbsolutePath() + TsFileResource.RESOURCE_SUFFIX),
new File(tsFile.getAbsolutePath() + TsFileResource.RESOURCE_SUFFIX));
if (resource.getModFile().exists()) {
if (resource.exclusiveModFileExists()) {
Collaborator

Also copy the shared mods file if it does not already exist in the snapshot directory.
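A sketch of the requested handling, assuming shared mods files can be hard-linked into the snapshot directory like the other files in `createSnapshot`: each shared file is linked once, and a later resource that references the same file skips it. It uses `java.nio.file.Files.createLink(link, existing)`; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SnapshotSketch {
  // Hard-links the shared mods file into the snapshot directory unless a
  // link with the same name already exists (several TsFiles may share it).
  // Returns true if a new link was created.
  static boolean linkSharedModFileIfAbsent(Path sharedModFile, Path snapshotDir)
      throws IOException {
    if (!Files.exists(sharedModFile)) {
      return false;
    }
    Path target = snapshotDir.resolve(sharedModFile.getFileName());
    try {
      Files.createLink(target, sharedModFile); // (new link, existing file)
      return true;
    } catch (FileAlreadyExistsException alreadyLinked) {
      return false; // another resource already linked this shared file
    }
  }
}
```

Catching `FileAlreadyExistsException` instead of checking `Files.exists` first avoids a race when several snapshot threads reach the same shared file.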

Labels: None yet
Projects: None yet

4 participants