Blobs

In the pack files chapter, we learned that pack files contain one or more encrypted blobs, holding either information about our filesystem (tree blobs) or raw data from our backed up files (data blobs). Each backed up file results in one or more tree blobs and one or more data blobs being added to these pack files, which we can later read and decrypt to access the original content.

We demonstrated how to back up a large MP3 file, use some code to list the blobs in the pack files added to the restic repository, and then restore that MP3 using the low-level restic cat sub-command.

We also discovered that blob 0480151d0705d3a9741ee904d5b2219ef465b03ccd33bf77097e28eec9ae1b69 was the tree blob holding the information about our MP3 file. When Monplaisir_-_04_-_Stage_1_Level_24.mp3 was backed up, restic split the file into 7 variable-length data blobs, encrypted them and added them to different pack files.

The following command prints the MP3 tree blob, which contains all the information required to reconstruct the original file:

restic cat blob 0480151d0705d3a9741ee904d5b2219ef465b03ccd33bf77097e28eec9ae1b69 | jq

{
  "nodes": [
    {
      "name": "Monplaisir_-_04_-_Stage_1_Level_24.mp3",
      "type": "file",
      "mode": 436,
      "mtime": "2020-10-01T18:54:25.822030931+02:00",
      "atime": "2020-10-01T18:54:25.822030931+02:00",
      "ctime": "2020-10-01T18:54:52.534077041+02:00",
      "uid": 1000,
      "gid": 1000,
      "user": "rubiojr",
      "group": "rubiojr",
      "inode": 13317865,
      "device_id": 64769,
      "size": 12879358,
      "links": 1,
      "content": [
        "21c11cc8c5fa5607f2311e0d9b5ef6798faf48c6a11772ca430122cae3e13b0a",
        "64c6b4964a0b01b6e11f1129e2071fe0093480b636fe9b63138a1fb1c5c613d4",
        "68692441140ede9315df14ed9973c096288766e548a9b6c03acd8a9d32991d6e",
        "07f659f23cf20404515e598d2c9f9d4aab0cc909993474561d96e94835abc321",
        "db646f6b5566801180cb310f6abcc4b417cc9d51a449748849e44f084350968e",
        "81e868bbc0beefc29754be3f5495c4ba2e194f9ab7c203a3a7a3ac6ca2101510",
        "af3e4cd790c5d77026bacfb0abecd6306f1fd978c97a877e13521a8e5a4c3ded"
      ]
    }
  ]
}

The content array is an ordered list of the encrypted blobs that form the Monplaisir_-_04_-_Stage_1_Level_24.mp3 file. If we iterate over that list sequentially, read each blob from the pack that contains it, decrypt it and write the resulting plaintext to a file in order, we get our MP3 back. That's what we did with this shell one-liner:

restic cat blob 0480151d0705d3a9741ee904d5b2219ef465b03ccd33bf77097e28eec9ae1b69 | \
       jq -r '.nodes | .[0].content[]' | \
       xargs -I{} restic cat blob {} >> /tmp/restored.mp3

We'll now do the same, but using our own code:

go run examples/blobs.go 0480151d0705d3a9741ee904d5b2219ef465b03ccd33bf77097e28eec9ae1b69

MP3 tree blob for Monplaisir_-_04_-_Stage_1_Level_24.mp3 found and loaded
Data blob 21c11cc8 found, decrypting and writting it to /tmp/restored-mp3.mp3
Data blob 64c6b496 found, decrypting and writting it to /tmp/restored-mp3.mp3
Data blob 68692441 found, decrypting and writting it to /tmp/restored-mp3.mp3
Data blob 07f659f2 found, decrypting and writting it to /tmp/restored-mp3.mp3
Data blob db646f6b found, decrypting and writting it to /tmp/restored-mp3.mp3
Data blob 81e868bb found, decrypting and writting it to /tmp/restored-mp3.mp3
Data blob af3e4cd7 found, decrypting and writting it to /tmp/restored-mp3.mp3
Restored MP3 SHA256: 01d4bac715e7cc70193fdf70db3c5022d0dd5f33dacd6d4a07a2747258416338
Orinal MP3 SHA256:   01d4bac715e7cc70193fdf70db3c5022d0dd5f33dacd6d4a07a2747258416338
File was restored successfully!

We know from the pack files chapter that 0480151d0705d3a9741ee904d5b2219ef465b03ccd33bf77097e28eec9ae1b69 is the ID of the tree blob that contains the MP3 file metadata. That ID is all the information we need to read and decrypt the data blobs that form the MP3 file content.

First, we want to index all the blobs in every pack file available in the repository. We need to find 7 different data blobs stored in different pack files, so the index will speed things up. restic's own source code includes much more sophisticated indexing, which we'll talk about in the index chapter.

	// Use a map to index all the blobs so we can easily find
	// which pack contains them later
	indexBlobs(util.RepoPath, k)

For simplicity, we're using a map to store the blob ID as a key and the pack ID where it's been stored as the value.

indexBlobs simply walks the filesystem (our repository) and for every pack file found, lists the blobs in that pack file and adds them to our index:

// walk restic's repository data dir and index all the pack files found
func indexBlobs(repoPath string, k *crypto.Key) {
	dataDir := filepath.Join(repoPath, "data")
	indexerFunc := func(path string, info os.FileInfo, err error) error {
		// info can be nil when Walk reports an error for this path
		if err != nil {
			return err
		}
		if info.IsDir() {
			return nil
		}
		// the pack file name is used as the pack ID
		_, packStr := filepath.Split(path)
		indexBlobsInPack(packStr, path, info, k)
		return nil
	}
	err := filepath.Walk(dataDir, indexerFunc)
	util.CheckErr(err)
}

// Add all the blob IDs found in a pack to the index map
func indexBlobsInPack(packID, path string, info os.FileInfo, k *crypto.Key) {
	handle, err := os.Open(path)
	util.CheckErr(err)
	defer handle.Close()
	// list the blobs stored in this pack file
	blobs, err := pack.List(k, handle, info.Size())
	util.CheckErr(err)

	for _, blob := range blobs {
		blobIndex[blob.ID] = packID
	}
}

Once we have our simple index, we can easily retrieve the pack ID of the pack that contains the MP3 tree blob and read it from the pack file.

	// Find the pack that contains our mp3 tree blob that describes the
	// mp3 file attributes and the content blobs
	mp3PackID, err := restic.ParseID(blobIndex[treeBlobID])
	util.CheckErr(err)
	mp3Tree := loadTreeBlob(util.RepoPath, mp3PackID, treeBlobID, k)
	fmt.Printf("MP3 tree blob for %s found and loaded\n", mp3Tree.Nodes[0].Name)

loadTreeBlob reads the tree blob from the pack file and decrypts its content, to get the JSON metadata that describes the MP3 file (as shown earlier in this chapter). The JSON is then unmarshalled to get a restic.Tree instance:

// decrypts a tree blob and creates a Tree struct instance
func loadTreeBlob(repoPath string, packID, treeBlobID restic.ID, k *crypto.Key) *restic.Tree {
	found := fetchBlob(repoPath, packID, treeBlobID, k)
	bc := blobContent(repoPath, found, k)
	tree := &restic.Tree{}
	err := json.Unmarshal(bc, tree)
	util.CheckErr(err)

	return tree
}
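The unmarshalling step can be tried without a repository. The sketch below uses simplified, hypothetical node and tree types that keep only a few of the fields from the JSON shown earlier; restic's own Tree type carries more fields and is not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node and tree are simplified stand-ins for illustration only.
type node struct {
	Name    string   `json:"name"`
	Type    string   `json:"type"`
	Size    uint64   `json:"size"`
	Content []string `json:"content"`
}

type tree struct {
	Nodes []node `json:"nodes"`
}

// parseTree decodes the plaintext JSON of a tree blob.
func parseTree(raw []byte) (*tree, error) {
	tr := &tree{}
	if err := json.Unmarshal(raw, tr); err != nil {
		return nil, err
	}
	return tr, nil
}

func main() {
	// A shortened version of the tree blob printed earlier in this chapter.
	raw := []byte(`{
	  "nodes": [
	    {
	      "name": "Monplaisir_-_04_-_Stage_1_Level_24.mp3",
	      "type": "file",
	      "size": 12879358,
	      "content": [
	        "21c11cc8c5fa5607f2311e0d9b5ef6798faf48c6a11772ca430122cae3e13b0a",
	        "64c6b4964a0b01b6e11f1129e2071fe0093480b636fe9b63138a1fb1c5c613d4"
	      ]
	    }
	  ]
	}`)

	tr, err := parseTree(raw)
	if err != nil {
		panic(err)
	}
	n := tr.Nodes[0]
	fmt.Printf("%s (%d bytes, %d content blobs listed)\n", n.Name, n.Size, len(n.Content))
}
```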

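fetchBlob lists the blobs in the pack file and returns the entry that matches our blob ID, together with the ID of the pack that stores it, so we can read and decrypt the blob later.

// returns the restic.PackedBlob entry describing a blob stored in a pack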
func fetchBlob(repoPath string, packID, blobID restic.ID, k *crypto.Key) *restic.PackedBlob {
	fullPath := filepath.Join(repoPath, "data", packID.DirectoryPrefix(), packID.String())
	handle, err := os.Open(fullPath)
	util.CheckErr(err)
	defer handle.Close()

	info, err := os.Stat(fullPath)
	util.CheckErr(err)

	blobs, err := pack.List(k, handle, info.Size())
	util.CheckErr(err)

	for _, blob := range blobs {
		if blob.ID.Equal(blobID) {
			pb := restic.PackedBlob{Blob: blob, PackID: packID}
			return &pb
		}
	}

	return nil
}

Once we have the tree blob instance (mp3Tree), we have access to the ordered list of data blobs that form the mp3 file, so we can iterate over that list, decrypt the data blobs, and write the plaintext to the destination file, in order:

	// The restored mp3 file will be saved here
	restoredF := "/tmp/restored-mp3.mp3"
	restoredMP3, err := os.Create(restoredF)
	util.CheckErr(err)
	defer restoredMP3.Close()

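	// Find all the data blobs that form the mp3 file, decrypt them and write
	// them to the destination file
	for _, cBlob := range mp3Tree.Nodes[0].Content {
		p := blobIndex[cBlob]
		packID, err := restic.ParseID(p)
		util.CheckErr(err)
		found := fetchBlob(util.RepoPath, packID, cBlob, k)
		if found == nil {
			// fetchBlob returns nil when the pack does not list the blob
			util.CheckErr(fmt.Errorf("data blob %s not found in pack %s", cBlob.Str(), packID.Str()))
		}
		fmt.Printf("Data blob %s found, decrypting and writting it to %s\n", found.ID.Str(), restoredF)
		content := blobContent(util.RepoPath, found, k)
		// blobs must be appended in the order given by the content list
		_, err = restoredMP3.Write(content)
		util.CheckErr(err)
	}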
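Stripped of pack handling and decryption, the loop above boils down to concatenating chunks in the order given by the content list. The toy sketch below illustrates just that: the blob IDs are short made-up strings, the store map is a hypothetical stand-in for "read the blob from its pack and decrypt it", and the chunks are already plaintext.

```go
package main

import (
	"bytes"
	"fmt"
)

// reassemble concatenates the blobs listed in content, in order.
// It fails if any listed blob is missing from the store.
func reassemble(content []string, store map[string][]byte) ([]byte, error) {
	var out bytes.Buffer
	for _, id := range content {
		chunk, ok := store[id]
		if !ok {
			return nil, fmt.Errorf("blob %s not found", id)
		}
		out.Write(chunk)
	}
	return out.Bytes(), nil
}

func main() {
	store := map[string][]byte{
		"aaa": []byte("first chunk, "),
		"bbb": []byte("second chunk, "),
		"ccc": []byte("last chunk"),
	}
	// The order of this list, not of the map, decides the file layout.
	content := []string{"aaa", "bbb", "ccc"}

	data, err := reassemble(content, store)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", data)
}
```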

Now that we have the MP3 file restored, we can optionally double-check that the SHA256 of the new file matches the SHA256 of the original MP3:

	// Make sure the restored MP3 file SHA256 matches the original's
	// sha256sum examples/data/examples/data/Monplaisir_-_04_-_Stage_1_Level_24.mp3
	buf, err := ioutil.ReadFile(restoredF)
	util.CheckErr(err)
	sum := sha256.Sum256(buf)

	ssum := fmt.Sprintf("%x", sum)
	fmt.Printf("Restored MP3 SHA256: %s\n", ssum)
	fmt.Printf("Orinal MP3 SHA256:   %s\n", util.MP3SHA256)
	if ssum == util.MP3SHA256 {
		fmt.Println("File was restored successfully!")
	} else {
		fmt.Println("Restored MP3 file is invalid")
	}

The full working example can be found in examples/blobs.go.