Table model data deletion #13878
# Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/analyze/LoadTsFileAnalyzer.java
# Conflicts: # pom.xml
…ble_data_deletion
…ble_data_deletion # Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/modification/IDPredicate.java
# Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/load/LoadTsFileManager.java
# Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/task/InsertionCrossSpaceCompactionTask.java
# Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/CompactionUtils.java
# Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/schemaengine/schemaregion/mtree/impl/mem/MTreeBelowSGMemoryImpl.java
* Update IoTDBTableIT.java * Update DataNodeInternalRPCServiceImpl.java
// TODO: implement
return null;
Throwing an exception may be better? @Caideyipi @SteveYurongSu can plan when to support this.
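A minimal sketch of the suggestion above, assuming a hypothetical method name (`visitUnsupported` is not part of the PR): failing loudly makes the unsupported path visible to callers, whereas `return null` can surface later as an unrelated `NullPointerException`.

```java
// Hypothetical holder class, used only to illustrate the review suggestion.
class UnimplementedOperationSketch {

    // Instead of "return null", signal clearly that the operation is not supported yet.
    static Object visitUnsupported(String operationName) {
        throw new UnsupportedOperationException(operationName + " is not supported yet");
    }
}
```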
...re/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/wal/node/WALNode.java
if (tsFileResource.getStartTime(device) < timeLowerBoundForCurrentDevice) {
  ttlDeletion =
      new Deletion(
          new TreeDeletionEntry(
Should distinguish between tree models and table models for current device.
Done
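The distinction requested in this thread could look roughly like the sketch below. Every type and method here is a hypothetical stand-in (the real entry classes in the PR take different arguments), and the assumption that a TTL deletion covers timestamps below the lower bound is made only for illustration.

```java
// Hypothetical stand-ins for the deletion entry types; not the PR's actual classes.
class TtlDeletionSketch {

    interface Entry {
        String describe();
    }

    static final class TreeEntry implements Entry {
        private final String device;
        private final long timeLowerBound;

        TreeEntry(String device, long timeLowerBound) {
            this.device = device;
            this.timeLowerBound = timeLowerBound;
        }

        @Override
        public String describe() {
            return "tree:" + device + "<" + timeLowerBound;
        }
    }

    static final class TableEntry implements Entry {
        private final String device;
        private final long timeLowerBound;

        TableEntry(String device, long timeLowerBound) {
            this.device = device;
            this.timeLowerBound = timeLowerBound;
        }

        @Override
        public String describe() {
            return "table:" + device + "<" + timeLowerBound;
        }
    }

    // Pick the entry type from the model of the current device (assumed to be known as a flag).
    static Entry ttlDeletionFor(String device, boolean isTableModel, long timeLowerBound) {
        return isTableModel
            ? new TableEntry(device, timeLowerBound)
            : new TreeEntry(device, timeLowerBound);
    }
}
```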
@@ -194,8 +195,9 @@ public File getFileFromDataDirsIfAnyAdjuvantFileExists() {
File file = FSFactoryProducer.getFSFactory().getFile(dataDir, partialFileString);
if (file.exists()
    || new File(file.getAbsolutePath() + TsFileResource.RESOURCE_SUFFIX).exists()
    || new File(file.getAbsolutePath() + ModificationFile.FILE_SUFFIX).exists()
    || new File(file.getAbsolutePath() + ModificationFile.COMPACTION_FILE_SUFFIX).exists()) {
    || ModificationFileV1.getNormalMods(file).exists()
Also check the compaction mods file v1?
Done
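As a rough illustration of the check discussed in this thread, the sketch below treats a TsFile as present if the file itself or any companion file exists, including the old-format mods and compaction mods files. The suffix strings are assumptions made for this example; the real values come from the constants in the IoTDB code base.

```java
import java.io.File;

// Hypothetical helper; the suffix values below are assumed, not taken from the PR.
class AdjuvantFileCheck {

    static final String[] ADJUVANT_SUFFIXES = {
        ".resource",        // resource file (assumed suffix)
        ".mods2",           // new-format mods file (assumed suffix)
        ".mods",            // old-format (v1) mods file (assumed suffix)
        ".compaction.mods"  // old-format (v1) compaction mods file (assumed suffix)
    };

    // True if the TsFile or any of its companion files exists on disk.
    static boolean anyAdjuvantFileExists(File tsFile) {
        if (tsFile.exists()) {
            return true;
        }
        for (String suffix : ADJUVANT_SUFFIXES) {
            if (new File(tsFile.getAbsolutePath() + suffix).exists()) {
                return true;
            }
        }
        return false;
    }
}
```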
if (!checkAndDeleteFile(file)) {
  File resourceFile =
      getFileFromDataDirs(tsFileIdentifier.getFilePath() + TsFileResource.RESOURCE_SUFFIX);
  if (!checkAndDeleteFile(resourceFile)) {
    success = false;
  }
Also check old version mods files and compaction mods files?
Done
@@ -263,9 +263,9 @@ public void testCompactionWithAllDeletion() throws IOException, IllegalPathExcep
writer.endFile();
}
resource1
    .getModFile()
    .write(new Deletion(new MeasurementPath(deviceID, ""), Long.MAX_VALUE, Long.MAX_VALUE));
Consider using TableDeletionEntry here, because the device belongs to the table model.
Done
* drop column * Update IoTDBTableIT.java * Update IoTDBTableIT.java
…ble_data_deletion
# Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/memtable/TsFileProcessor.java
* adaptation * Fix * Update DeleteDevice.java * Update AnalyzeUtils.java * Update IoTDBDeviceIT.java
@@ -129,10 +129,9 @@ private static void moveSeqResourceToUnsequenceDir(TsFileResource resource) thro
moveFile(
    new File(tsfile.getAbsolutePath() + TsFileResource.RESOURCE_SUFFIX),
    new File(targetFile.getAbsolutePath() + TsFileResource.RESOURCE_SUFFIX));
if (resource.modFileExists()) {
if (resource.anyModFileExists()) {
If only the shared mods file exists, this condition will also enter the following move method.
Done
@@ -143,18 +143,17 @@ public void perform() throws Exception {
    x -> x.definitelyNotContains(device) || !x.isDeviceAlive(device, ttl));
sortedSourceFiles.sort(Comparator.comparingLong(x -> x.getStartTime(device)));
if (ttl != Long.MAX_VALUE) {
  Deletion ttlDeletion =
      new Deletion(
  TreeDeletionEntry ttlDeletion =
Should distinguish between tree models and table models for current device.
@@ -189,20 +200,17 @@ public static void addFilesToFileMetrics(TsFileResource resource) {
    resource.getTsFile().length(),
    resource.isSeq(),
    resource.getTsFile().getName());
if (resource.modFileExists()) {
if (resource.anyModFileExists()) {
For the shared mods file, is this addition redundant here?
Done
TsFileResource firstSource = filesView.sourceFilesInLog.get(0);
TsFileResource firstTarget = filesView.targetFilesInPerformer.get(0);
ModFileManagement modFileManagement = firstSource.getModFileManagement();
ModificationFile modificationFile = modFileManagement.allocateFor(firstTarget);
If a shared mods file has been created here and the resource file has not yet been serialized, can it be found during restart recovery?
I think that if the resource file is not serialized, the compaction has not completed, and the target file will eventually be removed.
So it does not matter whether the shared mods file can be found during recovery.
If the shared mods file cannot be found during recovery and the compaction task has to roll back at system recovery (for example, after a kill -9 before the target resource has been serialized), the compaction recovery task may not be able to locate this shared mods file.
private volatile ModificationFile exclusiveModFile;

private volatile ModificationFile sharedModFile;
private long shardModFileOffset;
shard -> shared
@@ -206,10 +206,10 @@ private boolean createSnapshot(List<TsFileResource> resources, String snapshotId
createHardLink(
    new File(snapshotTsFile.getAbsolutePath() + TsFileResource.RESOURCE_SUFFIX),
    new File(tsFile.getAbsolutePath() + TsFileResource.RESOURCE_SUFFIX));
if (resource.getModFile().exists()) {
if (resource.exclusiveModFileExists()) {
Also copy the shared mods file if it does not exist in the snapshot dir.
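One possible shape for this, sketched with the standard `java.nio.file` API rather than the PR's own `createHardLink` helper: link the shared mods file into the snapshot directory only when no file of that name is there yet, so that several TsFiles sharing one mods file do not link it repeatedly. The class and method names are hypothetical, and a flat snapshot directory layout is assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating "link the shared mods file once per snapshot".
class SharedModsSnapshotLinker {

    // Returns true if a new hard link was created, false if nothing had to be done.
    static boolean linkIfAbsent(Path sharedModsFile, Path snapshotDir) throws IOException {
        if (!Files.exists(sharedModsFile)) {
            return false; // no shared mods file to snapshot
        }
        Path target = snapshotDir.resolve(sharedModsFile.getFileName());
        if (Files.exists(target)) {
            return false; // already linked by another TsFile that shares it
        }
        Files.createDirectories(snapshotDir);
        // createLink(link, existing): creates a hard link named "target" to the existing file.
        Files.createLink(target, sharedModsFile);
        return true;
    }
}
```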
Grammar definition:
Examples:
Assumed table schema in examples:
vehicle1(deviceType STRING ID, deviceId STRING ID, s0 INT32 MEASUREMENT, s1 INT64 MEASUREMENT)
Supported examples:
Unsupported examples:
Evaluation
Notice: compaction is disabled in all tests to avoid files being removed.
Write performance
We evaluate the time consumption of a single deletion over different numbers of files involved.
The experiment is performed by writing x TsFiles on a 1C1D instance
and then performing a deletion on all of them 5 times,
recording the average time consumption.
Below are the results
"Before" refers to the master branch (using the tree model),
while "after" is this branch (using the table model).
The results show that the time consumption is linear in the number of files, which is expected.
The parallel deletion optimization reduces the time consumption to around 1/3.
Both lines rise sharply after a threshold,
which is about 2400 files for "after" and 3800 for "before".
Most likely, the CPU or disk cache overflows at that point and the IO efficiency drops.
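The measurement procedure described above (run the same deletion several times and record the average) can be sketched as follows. This is only an illustrative harness, not the script used for the evaluation; `averageMillis` and the `Runnable` standing in for a deletion are hypothetical.

```java
// Hypothetical timing harness: averages the wall-clock time of a repeated task.
class DeletionBenchmarkSketch {

    static double averageMillis(Runnable task, int repetitions) {
        if (repetitions <= 0) {
            throw new IllegalArgumentException("repetitions must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < repetitions; i++) {
            long start = System.nanoTime();
            task.run(); // e.g. one deletion spanning all x TsFiles
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / repetitions;
    }
}
```

With the setup in this section, `repetitions` would be 5 and the task would issue one deletion against all written files.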
Read performance
We also experiment on the effects of deletions on queries.
In this test, we write 100 TsFiles, each with only 100 points of a single time series.
Then, after writing x deletions, each like
DELETE FROM test.table1 WHERE deviceId = 'd0',
we perform 10 queries like
SELECT * FROM test.table1
and record the average query latency.
The following picture gives the results:
Similarly, "Before" refers to the master branch (using the tree model),
while "after" is this branch (using the table model).
As expected, the more deletions are written, the slower queries become,
and the query latency is roughly linear in the number of deletions.
Compared with "Before", the "After" line may reduce the query latency by about 40%,
due to the more compact binary format.
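The roughly linear growth is what one would expect if every point read has to be checked against the recorded deletions. The sketch below is a deliberately simplified model of that cost, not IoTDB's actual read path; all names are hypothetical, and deletions are reduced to closed time ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: filtering n points against m deletions costs O(n * m) checks in the worst case.
class DeletionFilterSketch {

    static final class TimeRangeDeletion {
        final long start;
        final long end;

        TimeRangeDeletion(long start, long end) {
            this.start = start;
            this.end = end;
        }

        boolean covers(long timestamp) {
            return start <= timestamp && timestamp <= end;
        }
    }

    // Returns the timestamps that survive all deletions.
    static List<Long> visibleTimestamps(long[] timestamps, List<TimeRangeDeletion> deletions) {
        List<Long> visible = new ArrayList<>();
        for (long t : timestamps) {
            boolean deleted = false;
            for (TimeRangeDeletion d : deletions) { // each extra deletion adds one more check per surviving point
                if (d.covers(t)) {
                    deleted = true;
                    break;
                }
            }
            if (!deleted) {
                visible.add(t);
            }
        }
        return visible;
    }
}
```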