segment gc when ts_cnt > 1 #3935

Open
vagetablechicken opened this issue May 22, 2024 · 2 comments
Assignees
Labels
bug Something isn't working storage-engine openmldb storage engine. nameserver & tablet

Comments

@vagetablechicken
Collaborator

void Segment::ExecuteGc(const std::map<uint32_t, TTLSt>& ttl_st_map, StatisticsInfo* statistics_info) {
    if (ttl_st_map.empty()) {
        return;
    }
    if (ts_cnt_ <= 1) {
        ExecuteGc(ttl_st_map.begin()->second, statistics_info);
        return;
    }
    bool need_gc = false;
    for (const auto& kv : ttl_st_map) {
        if (ts_idx_map_.find(kv.first) == ts_idx_map_.end()) {
            return;
        }
        if (kv.second.NeedGc()) {
            need_gc = true;
        }
    }
    if (!need_gc) {
        return;
    }
    GcAllType(ttl_st_map, statistics_info);
}

Ref: https://utqcxc5xn1.feishu.cn/docx/FTbtdV25eoZDkjxODpCc44qhnlc. Suppose a table has several indexes on the same key but with different ts columns, e.g.

CREATE TABLE talkingdata(
    ip int,app int,device int,os int,channel int,click_time timestamp,attributed_time timestamp,is_attributed int,
    index(key=(ip), ts=click_time, ttl=1s, ttl_type=absolute),
    index(key=(ip), ts=attributed_time),
    index(key=(app,os), ts=click_time)
);

index0 and index1 share the key (ip), so they are stored in the same segment and ts_cnt_ == 2. Segment gc therefore calls GcAllType, which uses the wrong expire time.
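To make the grouping concrete, here is a minimal sketch that counts distinct ts columns per index key for the table above. It assumes, based only on the description in this issue, that indexes with identical key columns share a segment and that ts_cnt_ is the number of distinct ts columns among them. `IndexDef` and `TsCountPerKey` are hypothetical names for illustration, not OpenMLDB APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one index(key=..., ts=...) clause.
struct IndexDef {
    std::string key;  // key columns, e.g. "ip" or "app,os"
    std::string ts;   // ts column, e.g. "click_time"
};

// Counts the distinct ts columns used by indexes that share the same key.
// Assumption: this count plays the role of ts_cnt_ for the shared segment.
std::map<std::string, std::size_t> TsCountPerKey(const std::vector<IndexDef>& indexes) {
    std::map<std::string, std::set<std::string>> ts_by_key;
    for (const auto& idx : indexes) {
        ts_by_key[idx.key].insert(idx.ts);
    }
    std::map<std::string, std::size_t> result;
    for (const auto& kv : ts_by_key) {
        result[kv.first] = kv.second.size();
    }
    return result;
}
```

Fed the three indexes of talkingdata, the key "ip" maps to a count of 2 and "app,os" to a count of 1, so under this assumption only the (ip) segment takes the GcAllType path.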

When ts_cnt_ <= 1, ExecuteGc computes the expire time:

void Segment::ExecuteGc(const TTLSt& ttl_st, StatisticsInfo* statistics_info) {
    uint64_t cur_time = ::baidu::common::timer::get_micros() / 1000;
    switch (ttl_st.ttl_type) {
        case ::openmldb::storage::TTLType::kAbsoluteTime: {
            if (ttl_st.abs_ttl == 0) {
                return;
            }
            uint64_t expire_time = cur_time - ttl_offset_ - ttl_st.abs_ttl;
            Gc4TTL(expire_time, statistics_info);
            break;
        }

GcAllType does not. It does gc against the raw ttl value instead of the computed expire time, so the cutoff is a very small timestamp (e.g. with ttl=1m the cutoff falls on 1970-01-01). Normally every row's ts is greater than that cutoff, so no row is collected and the data never expires. You can check this with show table status.
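The difference between the two cutoffs can be shown with a small, self-contained sketch. `AbsExpireTimeMs` mirrors the expire_time arithmetic from the single-ts ExecuteGc excerpt; `IsExpired` is a hypothetical predicate, and the `<=` comparison is an assumption, since the actual comparison inside Gc4TTL and GcAllType is not shown in this issue. All values are in milliseconds.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors `cur_time - ttl_offset_ - ttl_st.abs_ttl` from the single-ts path.
// Hypothetical helper name; times are milliseconds since the epoch.
constexpr uint64_t AbsExpireTimeMs(uint64_t cur_time_ms, uint64_t ttl_offset_ms,
                                   uint64_t abs_ttl_ms) {
    return cur_time_ms - ttl_offset_ms - abs_ttl_ms;
}

// Hypothetical predicate: a row is collectable when its ts is not newer
// than the cutoff. The real comparison may be strict; this is an assumption.
constexpr bool IsExpired(uint64_t row_ts_ms, uint64_t cutoff_ms) {
    return row_ts_ms <= cutoff_ms;
}

// Illustrative values: a "now" in May 2024, ttl = 1 minute, a row one hour old.
constexpr uint64_t kNowMs = 1716336000000ULL;
constexpr uint64_t kTtlMs = 60ULL * 1000ULL;
constexpr uint64_t kRowTsMs = kNowMs - 3600ULL * 1000ULL;

// With the computed expire time, the one-hour-old row is collectable.
static_assert(IsExpired(kRowTsMs, AbsExpireTimeMs(kNowMs, 0, kTtlMs)),
              "row older than ttl should expire");
// With the raw ttl value used as the cutoff, the same row is never collected.
static_assert(!IsExpired(kRowTsMs, kTtlMs),
              "raw ttl as cutoff keeps the row forever");
```

Under these assumptions, a fix would need the multi-ts path to derive a per-ts expire time in the same way before comparing row timestamps.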

@vagetablechicken vagetablechicken added bug Something isn't working storage-engine openmldb storage engine. nameserver & tablet labels May 22, 2024
@ricor07

ricor07 commented Dec 26, 2024

Hello, I'd like to join the project by working on this. Could you assign me the issue? Thank you

@vagetablechicken
Collaborator Author

vagetablechicken commented Jan 2, 2025

> Hello, I'd like to join the project by working on this. Could you assign me the issue? Thank you

@ricor07 thanks for being willing to contribute~

@Shouren @aceforeverd is this issue unassigned? Could we let @ricor07 take a look at this and work on a fix?

BTW, let me know if I can help.
