
External USB drives possibly interrupting backup by going to sleep #7797

Closed
AlbertGoma opened this issue Sep 30, 2022 · 8 comments
Labels
affects-4.1, C: core, C: usb proxy, eol-4.1, hardware support, P: critical, T: bug

Comments

@AlbertGoma

Qubes OS release

Upgrade from 4.0 to 4.1.1

Brief summary

When backing up relatively large VMs to an external USB drive, the drive's motor stops spinning at some point, and the resulting file contains only a sequential portion of the data.

I already mentioned this in my comment on issue #7567, since it might be a cause of I/O errors, and a fix could probably solve both issues.

Steps to reproduce

  1. Attach an external USB hard drive (in this particular scenario, a 3.5" SATA hard drive on a USB 3.0 dock) with a valid partition table, enough space and a healthy filesystem (in this case GPT and ext4).
  2. Start a non-networked disposable VM and mount the drive's block device sys-usb:sda (not the USB device) on it.
  3. Start the Qubes Backup tool and select around 40 VMs, a few of which use over 200 GiB of storage each, exceeding 900 GiB in total. Fill some of that space from /dev/zero and some from /dev/urandom, just in case.
  4. Uncheck the Compress backup checkbox and click Next.
  5. Set the disposable VM as the Target qube and choose a Backup directory in the external hard drive's filesystem.
  6. Click Next until the backup starts.
  7. Wait until the backup is apparently finished and the hard drive motor has stopped spinning.

Expected behavior

The restored VMs' logical volumes have the same byte count as the originals had before the backup.

Actual behavior

The Qubes Backup Restore tool reported an I/O error, and half of the VMs showed 0 bytes of Disk Usage in the Qube Manager.

During an emergency recovery, decrypting the chunks of each of those 0-byte VMs with scrypt failed with an "Unexpected EOF" error on every chunk. One VM's chunks were readable up to the 490th.

AlbertGoma added the P: default and T: bug labels Sep 30, 2022
@DemiMarie

First, Qubes Backup should definitely fail in this case. That means it should report an error and return a non-zero exit code. If it does not, that is a bug.

Second, your hardware might have problems. One possibility is a failing hard drive; another is that the drive uses device-managed shingled magnetic recording (SMR). SMR drives can freeze for long periods during garbage collection, which can cause Linux to treat them as failed and disconnect them.
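The first point is something a wrapper script can rely on: if the backup tool exits with a non-zero status, the backup should be treated as failed. Below is a minimal Python sketch, assuming the backup is launched from dom0 as a qvm-backup command line (its arguments are omitted here). run_backup_checked is a hypothetical helper, and the demo runs a throwaway Python child process instead of a real backup.

```python
import subprocess
import sys
from typing import Sequence


def run_backup_checked(cmd: Sequence[str]) -> None:
    """Run a backup command and raise if it exits with a non-zero status.

    Hypothetical helper: in dom0, `cmd` might be a qvm-backup command line,
    e.g. ["qvm-backup", ...]; the real arguments are not shown here.
    """
    result = subprocess.run(list(cmd))
    if result.returncode != 0:
        raise RuntimeError(
            f"backup command {cmd[0]!r} failed with exit code {result.returncode}"
        )


if __name__ == "__main__":
    # Stand-ins for a real backup: one child exits 0, the other exits 1.
    run_backup_checked([sys.executable, "-c", "pass"])
    try:
        run_backup_checked([sys.executable, "-c", "import sys; sys.exit(1)"])
    except RuntimeError as exc:
        print(exc)
```

This only helps if the tool actually reports the failure, which is exactly what the comment above says it should do.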

DemiMarie added the P: critical label and removed the P: default label Oct 1, 2022
andrewdavidwong added the C: core, C: usb proxy, needs diagnosis, and hardware support labels Oct 1, 2022
andrewdavidwong added this to the Release 4.1 updates milestone Oct 1, 2022
@AlbertGoma
Author

AlbertGoma commented Oct 1, 2022

Regarding the hard drive: I checked the manufacturer's technical specification sheet, and all the 3.5" versions in that category use CMR (which I assume stands for Conventional Magnetic Recording). It could be failing, but it's supposed to be a high-end drive still within its 5-year warranty, and it hasn't shown any other signs of failure yet. I could scan it for bad sectors if that would be useful.

Today I tried to reproduce the error on the same hard drive using the same USB 3.0 docking station, but unfortunately the backup finished and verified successfully. However:

  • In my current R4.1.1 setup, sys-usb and the disposable VM where I mounted the drive are both based on the current debian-11 template rather than R4.0's old fedora-32. The disposable VM's kernel version must also have differed between the two Qubes releases, since it uses PVH virtualization, but I don't remember the last version I had in R4.0. Since sys-usb uses HVM, I understand it uses the latest kernel installed in the template.
  • According to my phone's stopwatch, the hard drive's motor stopped 20 minutes and a few seconds after python3 -m qubes.tarwriter started, and it started spinning again when scrypt enc - /tmp/randomname/vmXX/private.img.XXX.enc appeared in dom0's Task Manager.
  • I only tried to back up a single AppVM with the following data in its /home/user directory:
-rw-r--r-- 1 user user  50G Oct  1 09:21 urandom.1
-rw-r--r-- 1 user user 100G Oct  1 09:47 urandom.3
-rw-r--r-- 1 user user 200G Oct  1 09:37 zero.2
-rw-r--r-- 1 user user  50G Oct  1 09:49 zero.4
  • The qubes-backup file size is 153,656.37 MiB while the Disk Usage displayed by the Qube Manager is 419,471.36 MiB, therefore sparse zeroes have been left out.
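As a rough cross-check of the last point, the backup file is almost exactly the size of the /dev/urandom data alone. A small Python calculation, assuming the sizes shown by ls are exact GiB values:

```python
# Sizes from the ls -l listing above. Assumption: "G" means GiB
# (GNU ls -h uses powers of 1024), and ls rounding is negligible here.
MIB_PER_GIB = 1024

files_gib = {
    "urandom.1": 50,
    "urandom.3": 100,
    "zero.2": 200,
    "zero.4": 50,
}

# Data written from /dev/urandom vs. from /dev/zero, in MiB.
random_mib = MIB_PER_GIB * sum(
    size for name, size in files_gib.items() if name.startswith("urandom")
)
zero_mib = MIB_PER_GIB * sum(
    size for name, size in files_gib.items() if name.startswith("zero")
)

backup_mib = 153_656.37  # qubes-backup file size reported above

print(f"random data: {random_mib:,} MiB")  # random data: 153,600 MiB
print(f"zero data: {zero_mib:,} MiB")  # zero data: 256,000 MiB
print(f"backup minus random data: {backup_mib - random_mib:,.2f} MiB")  # backup minus random data: 56.37 MiB
```

The remaining ~56 MiB would presumably be backup headers and metadata, so the 250 GiB of /dev/zero data does not appear to add anything meaningful to the file size.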

So the drive does go to sleep, although it doesn't cause any I/O errors under these settings. The old fedora-32 template survived the disaster, so maybe I should try again using it for both sys-usb and the DispVM in HVM mode, backing up enough VMs to nearly fill the drive so that multiple sleep events occur within the same backup session.
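If USB runtime power management in sys-usb is a suspect, the kernel's per-device autosuspend settings are one place to look. This is only one possible factor: the spin-down could equally come from the drive's or the dock's own standby timer, which this does not show. Below is a minimal Python sketch that reads the standard Linux sysfs attributes power/control and power/autosuspend_delay_ms for each device under /sys/bus/usb/devices. usb_power_settings is a hypothetical helper, and root is a parameter so it can be pointed at a copy of the tree.

```python
from pathlib import Path
from typing import Dict


def usb_power_settings(root: str = "/sys/bus/usb/devices") -> Dict[str, Dict[str, str]]:
    """Collect runtime-PM settings for each USB device under `root`.

    Reads the sysfs attributes power/control ("auto" allows autosuspend,
    "on" keeps the device active) and power/autosuspend_delay_ms.
    """
    base = Path(root)
    settings: Dict[str, Dict[str, str]] = {}
    if not base.is_dir():
        return settings
    for dev in sorted(base.iterdir()):
        entry: Dict[str, str] = {}
        for attr in ("control", "autosuspend_delay_ms"):
            path = dev / "power" / attr
            try:
                entry[attr] = path.read_text().strip()
            except OSError:
                # Missing attribute, or the kernel refused the read
                # (e.g. autosuspend not in use for this device).
                continue
        if entry:
            settings[dev.name] = entry
    return settings


if __name__ == "__main__":
    for name, values in usb_power_settings().items():
        print(name, values)
```

Run inside sys-usb: a dock whose control value is auto is allowed to autosuspend, while on keeps it active.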

@rustybird

  • The qubes-backup file size is 153,656.37 MiB while the Disk Usage displayed by the Qube Manager is 419,471.36 MiB, therefore sparse zeroes have been left out.

So sleep happens, although not causing any I/O errors under these settings.

That's normal on LVM.

@ddevz

ddevz commented Oct 7, 2022

... I made the risky decision of not verifying the backup's integrity, as it would have required a similar amount of hours ...

While I recommend verifying backups in the future, don't feel too bad about that decision: the "verify" step does not actually seem to verify that the backup happened, so you could have run the verify, gotten "everything backed up fine", and still had the same problem. (The EOF message suggests to me that this would have happened to you.) (Note: I've just opened the verify problem as its own issue, #7809.)

@AlbertGoma
Author

In case it is useful, the USB 3.0 dock I used to perform the backup was a Sharkoon QuickPort Combo USB3.0. Both the computer and the dock were plugged into an Uninterruptible Power Supply.

@andrewdavidwong
Member

To be clear: This happens only when using the dock; it does not happen when bypassing the dock and plugging the external USB hard drive directly into the computer?

@AlbertGoma
Author

This dock lets you use internal drives as external ones. After the failure, I kept making backups to that hard drive, but bypassing the dock and connecting the drive directly to the motherboard's SATA port. When I back up this way, the motor never stops and verification seems to work fine. (However, I haven't dared to restore them on my PC yet. If it would be useful, I could install Qubes on another drive and try restoring them there to confirm that the verification didn't give a false success message.)

andrewdavidwong added the affects-4.1 label Aug 8, 2023
andrewdavidwong removed this from the Release 4.1 updates milestone Aug 13, 2023
andrewdavidwong added the eol-4.1 label Dec 7, 2024

github-actions bot commented Dec 7, 2024

This issue is being closed because:

If anyone believes that this issue should be reopened, please leave a comment saying so.
(For example, if a bug still affects Qubes OS 4.2, then the comment "Affects 4.2" will suffice.)

github-actions bot closed this as not planned Dec 7, 2024
andrewdavidwong removed the needs diagnosis label Dec 7, 2024
5 participants