task delete should also purge #3644

Open
ksandvik opened this issue Oct 11, 2024 · 13 comments

Comments

@ksandvik

To request a feature...

  • Clearly describe the feature.
    task delete should have an option or a flag to purge at the same time

  • Clearly state the use case. We are only interested in use cases, do not waste time with implementation details or suggested syntax.

In 99 of 100 cases I want to purge a task at the same time as I delete it. Right now I don't have a single workflow where I need to keep a task around after it has been deleted. There's also no need to make this interactive with yes/no prompts.

@djmitche
Collaborator

Huh, I didn't realize task purge wouldn't purge a non-deleted task. Maybe we could add an option to it somehow? I'd suggest -f, but none of the other Taskwarrior commands take dash-style flags. Or maybe that could be a config option?

There is an option to automatically purge deleted tasks after a period of time -- is that sufficient for you? Why do they need to be purged?

@ksandvik
Author

As mentioned, I have yet to find a case in my Taskwarrior workflow where I don't both delete and purge an unnecessary task; I do this all the time. Now, I could do task delete; task purge, but then I get all the prompts asking if I really want to do it (which I don't want).

I agree that task should never have option flags, but rather keywords, maybe task delete permanently, task delete now, task delete purge, or something similar.

@djmitche
Collaborator

Why do you need to purge and not just delete tasks?

@smemsh
Contributor

smemsh commented Oct 13, 2024

I also almost always purge after deleting, but there are some cases where I wouldn't. For example, when doing infrequent "maintenance" or trimming of my task database, I probably wouldn't purge until the end of the pass, once I was satisfied I hadn't deleted anything by mistake.

Purge is useful so that tasks are removed from the database entirely and don't show up at all anymore, for example test/junk tasks that should end up as if they never existed. There are some reasons enumerated in #3399, and I was very happy that purge was restored in 3.x, as its absence was a barrier to upgrading for me.

However, I would not want purge to happen automatically, just as I don't want gc to happen automatically; it is better off explicit. So I hope this can be only an option (if implemented) and not a change for everyone. The idea of making purge automatically force a deletion first might make sense, though...

(Also, I'm not sure if this is still the case with the TaskChampion backend, but 2b88260 seems to imply that gc is done automatically upon purge. That's another reason not to want to auto-purge when deleting, because it would forcibly change the ID numbers.)
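To make the ID concern concrete, here is a toy model in Python. It is not Taskwarrior's code: the Task class and the hypothetical gc_working_set function assume a simplified gc that drops completed and deleted tasks from the working set and renumbers the remaining pending tasks 1..n in their existing order.

```python
from dataclasses import dataclass


@dataclass
class Task:
    uuid: str
    description: str
    status: str  # "pending", "completed", or "deleted"


def gc_working_set(tasks: list[Task]) -> dict[int, str]:
    """Toy gc: assign IDs 1..n to pending tasks, keeping their order.

    Completed and deleted tasks drop out of the working set, so every
    pending task listed after them ends up with a smaller ID.
    """
    pending = [t for t in tasks if t.status == "pending"]
    return {i: t.uuid for i, t in enumerate(pending, start=1)}


tasks = [
    Task("aaaa", "write report", "pending"),
    Task("bbbb", "test task", "pending"),
    Task("cccc", "take out the trash", "pending"),
]
print(gc_working_set(tasks))  # {1: 'aaaa', 2: 'bbbb', 3: 'cccc'}

# Delete the test task: after the next gc, "take out the trash" moves from ID 3 to ID 2.
tasks[1].status = "deleted"
print(gc_working_set(tasks))  # {1: 'aaaa', 2: 'cccc'}
```

If purge does run gc, as 2b88260 seems to imply, then purging on every delete would trigger this kind of renumbering every time, which is exactly the surprise smemsh wants to avoid.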

@djmitche
Collaborator

Deleting test tasks makes sense. But in general I'd recommend setting up your filters, etc. so that deleted tasks don't show up. Then if one is deleted by accident, it's easy to recover it.

Note that weird things will happen if a task is purged and modified on another replica. Imagine task 1444db41 "take out the trash":

  • Laptop: task 1444db41 delete; task 1444db41 purge
  • Desktop: task 1444db41 done
  • Laptop: task sync
  • Desktop: task sync
  • Laptop: task sync

At the end of this sequence, both replicas will agree that there is a completed task with no description or any other identifying information.

This isn't an issue if there's no sync going on, and shouldn't be a problem for test tasks that aren't modified on multiple replicas. And even if it does happen, it's not going to cause any great harm -- it's just a weird situation [1]. But, this is why I make the recommendation above.

Anyway, optionally allowing task purge to delete tasks before purging them sounds like a reasonable change to me.

[1] Actually, this might cause errors in Taskwarrior right now, but that's a bug -- I'll work on fixing those up.

@smemsh
Contributor

smemsh commented Oct 14, 2024

> Deleting test tasks makes sense. But in general I'd recommend setting up your filters, etc. so that deleted tasks don't show up. Then if one is deleted by accident, it's easy to recover it.

There are two different states here: deleted, and non-existent. People have different workflows. I've read posts from people that use "deleted" to mark tasks as "not now" or "NAKed" and then undelete them later if they get time/permission to work on them (I don't do this, but I've seen it). This is different from not existing...

To extrapolate from your premise, there would be no need to have a "purge" function at all, but I find it useful to have the two different states. I probably would never want to expire deleted tasks; I would rather purge them explicitly if I wanted them gone. It appears that this is the whole reason purge was added: Paul wanted to keep ancient tasks from being visible at all in any report or filter (see #1808). I have also seen a couple of other use cases when searching issues: removing all traces of automatically-added recurring tasks, and removing all traces of some aborted project. YMMV...

> • Laptop: task 1444db41 delete; task 1444db41 purge
> • Desktop: task 1444db41 done

I don't know the new sync model; could some kind of conflict-resolution algorithm be used, i.e. halting sync and forcing manual selection of which version is true, or designating a primary replicant that's always right, or earliest/latest update wins? Your scenario would probably only come up with teams working on the same database, or by accident, though, I think?

Actually, your scenario has two different ending states for the task: one where it doesn't exist at all, and one where it exists but is marked completed. (I assume completed tasks are never expired?) That does seem like a fundamental disagreement. If the task exists on one replica but not on another, the existing version should probably be copied to all the replicants, to be safe?

> optionally allowing task purge to delete tasks before purging them sounds like a reasonable change to me.

It probably doesn't need to be optional, since currently it just isn't allowed: a purge fails if the task isn't deleted yet. I can't see anyone relying on this guard as a "feature"; it more likely catches something they forgot to do (delete the task first). And if they deleted without purging, they still want it to be there.
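The current guard and the proposed alternative (purge deleting the task first) can be contrasted in a small Python sketch. This is a toy, not Taskwarrior's implementation: the in-memory tasks dict, the NotDeletedError exception, the operation tuples, and the delete_first parameter are all hypothetical.

```python
class NotDeletedError(Exception):
    """Raised when purging a task whose status is not "deleted"."""


def purge(
    tasks: dict[str, dict[str, str]], uuid: str, delete_first: bool = False
) -> list[tuple[str, ...]]:
    """Toy purge that returns the operations it would record.

    delete_first=False mimics today's guard: only deleted tasks may be purged.
    delete_first=True mimics the proposal: delete the task, then purge it.
    """
    ops: list[tuple[str, ...]] = []
    if tasks[uuid].get("status") != "deleted":
        if not delete_first:
            raise NotDeletedError(f"task {uuid} is not deleted")
        tasks[uuid]["status"] = "deleted"
        ops.append(("update", uuid, "status", "deleted"))
    del tasks[uuid]  # the record is gone entirely
    ops.append(("purge", uuid))
    return ops


tasks = {"1444db41": {"description": "take out the trash", "status": "pending"}}
try:
    purge(tasks, "1444db41")
except NotDeletedError as err:
    print(err)  # task 1444db41 is not deleted
print(purge(tasks, "1444db41", delete_first=True))
# [('update', '1444db41', 'status', 'deleted'), ('purge', '1444db41')]
```

In this sketch the status change is still recorded as its own operation before the removal, so anything replaying the log sees the task pass through "deleted" rather than vanishing from a pending state.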

@djmitche
Collaborator

> I don't know the new sync model; could some kind of conflict-resolution algorithm be used, i.e. halting sync and forcing manual selection of which version is true, or designating a primary replicant that's always right, or earliest/latest update wins? Your scenario would probably only come up with teams working on the same database, or by accident, though, I think?

This is a consequence of the conflict-resolution algorithm. Broadly, deleting things is hard in distributed systems, and the typical solution is what are called "tombstones": you just mark an object as deleted and then don't show it. And that is exactly what the "Deleted" status is!
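A minimal Python sketch of the tombstone idea, as a toy in-memory store rather than Taskwarrior's storage: delete only flips the status, reports filter deleted records out, and recovery works because nothing was removed. The TombstoneStore class, its methods, and the second UUID are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TombstoneStore:
    """Toy task store that marks records deleted instead of removing them."""

    tasks: dict[str, dict[str, str]] = field(default_factory=dict)

    def add(self, uuid: str, description: str) -> None:
        self.tasks[uuid] = {"description": description, "status": "pending"}

    def delete(self, uuid: str) -> None:
        # Tombstone: keep the record, only flip its status.
        self.tasks[uuid]["status"] = "deleted"

    def undelete(self, uuid: str) -> None:
        # Recovery is possible because the data was never removed.
        self.tasks[uuid]["status"] = "pending"

    def visible(self) -> list[str]:
        # Reports simply filter tombstoned records out.
        return [t["description"] for t in self.tasks.values() if t["status"] != "deleted"]


store = TombstoneStore()
store.add("1444db41", "take out the trash")
store.add("5a2c9e10", "test task")
store.delete("5a2c9e10")
print(store.visible())  # ['take out the trash']
store.undelete("5a2c9e10")
print(store.visible())  # ['take out the trash', 'test task']
```

Because the record still exists, the deletion is just another property change, and a sync can carry it to other replicas like any other update.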

The scenario would come up with a person syncing the same database between multiple systems. I don't really know how teams would use Taskwarrior -- it's not a common use-case, anyway.

> Actually, your scenario has two different ending states for the task: one where it doesn't exist at all, and one where it exists but is marked completed. (I assume completed tasks are never expired?) That does seem like a fundamental disagreement. If the task exists on one replica but not on another, the existing version should probably be copied to all the replicants, to be safe?

Now that I look at that example again, I think the task will always end up deleted on both replicas. I'd need to think for a bit longer to be sure. The sync does guarantee that all replicas end up in the same state -- it's just a question of what state.

Anyway, I bring up the example just to say that there are downsides to purging tasks. If the convenience of not having deleted tasks in task all output is worth those downsides (or if sync is not in use), then the trade-off makes sense.

@Zocker1999NET

I see multiple use cases and problems clashing here, and I think the current discussion mixes a few of them, so some separation and clarification is needed. In the following I attempt to analyze and draw conclusions systematically:

(Terms in double quotation marks are meant in a common technical sense, not as Taskwarrior (TW) terminology.)

I see the following requirements:

  1. end-user: wants to "cancel" tasks (to hide them, mark as never to-do, …)
  2. end-user: wants to re-review "canceled" tasks (look into history, "uncancel" them, …)
  3. end-user: wants to permanently "delete" data (privacy, security, remove from history, test case, …)
  4. technical user/developer: wants a mostly raw "remove" command, ignoring sync (to fix the database, …)
  5. sync engine: establishes synchrony across a distributed system

Here is what happens today (as I know and see it):

  • task delete: marks a task with status:deleted
    • sufficient for 1 and 2
    • not sufficient for 3 and 4: task data stays on disk and in the database
    • compatible with 5
  • task purge: permanently removes a task from the database
    • sufficient for 4
    • maybe compatible with 5: all info from the task is lost, and the sync engine has a hard time dealing with that
    • maybe not sufficient for 3: data is indeed removed from the local disk
      • but the "remove action" is probably not synced
      • sync may "restore" this task locally by copying it from remotes

To conclude, 3 is mostly impossible as of today, or comes with the downsides explained earlier in #3644 (comment). The obvious theoretical solution is a command to "delete" a task that fulfills 3 (not 4) without breaking 5. Independent of the current implementation, this could be done by adding a "redacted" state, which is synced like any other change but triggers the "removal" of all data except the metadata technically required for the redaction to be stored and synced further (for TW, I assume the UUID is enough). For examples, see how Signal or Matrix implement this.
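The redaction idea can be sketched as a toy operation log in Python. Everything here is hypothetical: Taskwarrior has no redact operation or redacted status, and the apply_op function and the rule that a redacted stub ignores later updates are assumptions made for illustration. The point is that the redaction is itself an ordinary synced change, so every replica that replays the log ends up with the same stub.

```python
REDACTED = "redacted"  # hypothetical status; Taskwarrior has no such status


def apply_op(tasks: dict[str, dict[str, str]], op: tuple[str, ...]) -> None:
    """Apply one operation to a toy replica.

    ("update", uuid, prop, value) sets one property, creating the record if needed.
    ("redact", uuid) replaces the record with a stub holding only the UUID and marker.
    """
    kind, uuid = op[0], op[1]
    if kind == "redact":
        tasks[uuid] = {"uuid": uuid, "status": REDACTED}
    elif kind == "update":
        record = tasks.setdefault(uuid, {"uuid": uuid})
        if record.get("status") == REDACTED:
            return  # assumption of this sketch: a redacted stub ignores later updates
        _, _, prop, value = op
        record[prop] = value
    else:
        raise ValueError(f"unknown operation {kind!r}")


# The shared, ordered log that every replica replays during sync.
log = [
    ("update", "1444db41", "description", "take out the trash"),
    ("update", "1444db41", "status", "pending"),
    ("redact", "1444db41"),
    ("update", "1444db41", "status", "completed"),  # a late change from another replica
]

laptop: dict[str, dict[str, str]] = {}
desktop: dict[str, dict[str, str]] = {}
for replica in (laptop, desktop):
    for op in log:
        apply_op(replica, op)

print(laptop == desktop)  # True
print(laptop)  # {'1444db41': {'uuid': '1444db41', 'status': 'redacted'}}
```

This only controls what the replicas store and show; as discussed later in the thread, it makes no promise that old copies on disks, in free space, or in backups are gone.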

Now clearly my personal opinion:

  • The current term 'delete' in TW is misleading in a potentially harmful way for personal privacy or security. Users might think a task gets "deleted" even though its data can still be easily retrieved.
  • Still, ignoring its name, its functionality is useful, as it fulfills requirements 1 and 2.
  • I think TW’s current 'delete' is a miscommunicated "cancel" function, especially when compared to RFC 5545 (iCalendar), Section 3.8.1.11, STATUS.

So I propose (possibly in multiple updates):

  • rename 'delete' everywhere to 'cancel' (task cancel <id>, status:cancelled, ...)
    • at least attempt to do that in the UI
  • introduce the said "delete"/"redact"/"purge" command (fulfilling requirement 3)
  • add a hint to the current task purge, stating that:
    • it may break synchronization (engines), and that data might get deleted
    • that only 'deleted'/"cancelled" tasks can be 'purged'
    • hint to the non-breaking "redact" command (once it is introduced and both coexist)
    • (then this might be called _purge so it looks more visibly internal)

@ksandvik
Author

ksandvik commented Jan 17, 2025

OK, sounds like I need to write a small shell script to both delete and purge a task. I wish the prompting could be disabled, but I could pipe in the answers.

I'm surprised this isn't used more. If I delete a task, I consider it no longer part of any stats, so it should be purged as well, i.e. like 'rm -rf'.
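Such a wrapper might look like the following sketch, written in Python rather than shell. It assumes the rc.confirmation=off command-line override suppresses the yes/no prompts; that setting exists in Taskwarrior and governs delete confirmation, but whether it also silences purge's prompt in your version is an assumption worth checking. It takes UUIDs (or short UUID prefixes, as in the task 1444db41 delete; task 1444db41 purge example above) rather than numeric IDs, since IDs can be renumbered by gc. The function names are hypothetical.

```python
import subprocess
import sys


def delete_and_purge_commands(uuid: str) -> list[list[str]]:
    """Build the two Taskwarrior invocations for one task.

    Assumption: rc.confirmation=off suppresses the prompts for both commands.
    """
    override = "rc.confirmation=off"
    return [
        ["task", override, uuid, "delete"],
        ["task", override, uuid, "purge"],
    ]


def delete_and_purge(uuid: str) -> None:
    for cmd in delete_and_purge_commands(uuid):
        # check=True raises CalledProcessError, so a failed delete stops before the purge.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for uuid in sys.argv[1:]:
        delete_and_purge(uuid)
```

Saved as, say, delete_and_purge.py, it would be run as python3 delete_and_purge.py 1444db41. Passing each argument as a separate list element avoids shell quoting issues with the UUID.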

@djmitche
Collaborator

Thanks for summarizing the use-cases!

Just as a technical point regarding option 3:

> end-user: wants to permanently "delete" data (privacy, security, remove from history, test case, …)

Irreversibly deleting data is essentially impossible in the storage model. By analogy, if you accidentally commit something and push it to GitHub, there's no way to delete that -- you just have to accept that it's been publicly disclosed. You can do things to make it less likely to be discovered accidentally (that's task purge in the Taskwarrior world), but if it was a secret key or password, that's not sufficient -- the secret is no longer secret.

Also, note that task purge doesn't actually break anything with sync. It will work perfectly fine except when a task purge is performed on one replica and task modify for the same task on another replica, before the two replicas are synced. In that case, you'll get a funny-looking task missing all properties except those modified by task modify. For someone who has used task purge, this is probably not entirely surprising, and another task purge will make the problem disappear.
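A deliberately naive Python model of this situation, replaying the laptop/desktop sequence from the example earlier in the thread. It assumes that replaying an update for a task that no longer exists simply recreates the record with only that property, which reproduces the behaviour described here; TaskChampion's real conflict resolution is more involved and, as noted above, may resolve some orderings differently. The apply_op function and the operation tuples are hypothetical.

```python
Op = tuple[str, ...]


def apply_op(tasks: dict[str, dict[str, str]], op: Op) -> None:
    """Naive replay: an update recreates a missing record; purge removes one."""
    kind, uuid = op[0], op[1]
    if kind == "update":
        _, _, prop, value = op
        # Assumption: updating a purged task silently creates a fresh record.
        tasks.setdefault(uuid, {})[prop] = value
    elif kind == "purge":
        tasks.pop(uuid, None)
    else:
        raise ValueError(f"unknown operation {kind!r}")


start = {"1444db41": {"description": "take out the trash", "status": "pending"}}

# Laptop: delete, then purge. Desktop: mark the task done before syncing.
laptop_ops: list[Op] = [("update", "1444db41", "status", "deleted"), ("purge", "1444db41")]
desktop_ops: list[Op] = [("update", "1444db41", "status", "completed")]

# The laptop syncs first, so the server orders its operations first.
replica = {uuid: dict(props) for uuid, props in start.items()}
for op in laptop_ops + desktop_ops:
    apply_op(replica, op)
print(replica)  # {'1444db41': {'status': 'completed'}}

# Deleting and purging the leftover record clears it again.
for op in [("update", "1444db41", "status", "deleted"), ("purge", "1444db41")]:
    apply_op(replica, op)
print(replica)  # {}
```

Every replica that replays the same ordered log from the same starting point reaches the same result, so the replicas agree; the oddity is only that the agreed-upon task has lost its description.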

So, I think the current options cover 1, 2, 4, and 5 adequately, and 3 is impossible.

I agree that in some sense "cancel" might be a better term than "delete", but there are a lot of people out there typing task delete, and for some uses (such as mine) that's the better meaning. The additional confusion resulting from renaming the command would, IMHO, outweigh the benefit of the new term.

@ryneeverett
Collaborator

> Irreversibly deleting data is essentially impossible in the storage model. By analogy, if you accidentally commit something and push it to GitHub, there's no way to delete that -- you just have to accept that it's been publicly disclosed. You can do things to make it less likely to be discovered accidentally (that's task purge in the Taskwarrior world), but if it was a secret key or password, that's not sufficient -- the secret is no longer secret.

I'm not sure I follow this. If you purge a task, create a snapshot, and then delete the old versions, where does the data remain?

@djmitche
Collaborator

That data may still exist on other replicas; deleting versions is difficult to do without breaking other things (we had a discussion about this a year or so ago); and deleting data from a SQLite database doesn't guarantee that the data is no longer present in free space, and similarly for filesystems.

Note that the use of client-side encryption gets task purge close to the desired functionality in (3), in the sense that the data on the server is not sufficient to recover the incriminating or secret information, unless the adversary also has the client encryption_secret.

Fundamentally, I don't want to build a feature where we make a promise that redacting an incriminating task will prevent authorities from finding the task; or that redacting a task with a secret in it will prevent forensic analysis of the client from revealing that secret. We just can't meet that promise.

@ryneeverett
Collaborator

Fair enough. I didn't account for the context of end-user 3's objectives when I read your response, and I agree that we needn't try to accommodate that. To clarify: irreversible data deletion is difficult but under consideration (at least for taskchampion-sync-server), though it won't be guaranteed or predictable.

Labels
None yet
Projects
Status: Backlog
Development

No branches or pull requests

5 participants