Reid Draper edited this page Nov 16, 2011 · 29 revisions

Large Files

This page will document our ideas and questions for large-file support.

Implementation Pseudocode

Writes

  1. Inspect the request's Content-Length header to determine whether the request is large enough to be worth chunking. If so, move to step 2; otherwise perform a "normal" PUT.

  2. Create a UUID. This UUID will be used to namespace blocks from concurrent updates. For example, we don't want block 0 of an in-progress PUT to collide with, and trample over, existing data.

  3. Create a metadata/manifest object. This object will have several fields and will be updated as the file is chunked up and written to Riak. The fields are (tentatively):

    uuid
    bucket
    key
    content-length
    time created
    time finished # if complete
    blocks remaining # a set of the blocks to-be-written to Riak
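
The tentative fields above can be sketched as a plain Python dict. This is only an illustration of the design notes, not the real implementation; the field names, representation, and the `new_manifest`/`block_size` helper are assumptions:

```python
import time
import uuid

def new_manifest(bucket, key, content_length, block_size):
    """Create a fresh manifest for a chunked PUT (illustrative sketch).

    blocks_remaining starts as the full set of block indices and is
    drained as blocks are written to Riak."""
    # ceiling division: a partial final block still needs its own index
    num_blocks = (content_length + block_size - 1) // block_size
    return {
        "uuid": uuid.uuid4().hex,
        "bucket": bucket,
        "key": key,
        "content_length": content_length,
        "created": time.time(),
        "finished": None,  # set once every block has been written
        "blocks_remaining": set(range(num_blocks)),
    }

m = new_manifest("photos", "cat.jpg", 10 * 1024 * 1024, 1024 * 1024)
print(len(m["blocks_remaining"]))  # → 10
```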
    

The manifest will be written to the same {Bucket, Key} that the object itself would be. It is expected that siblings will occasionally be created at this key. Sometimes the siblings will share a UUID and be merged (via a deterministic algorithm), and sometimes they will have different UUIDs. Depending on the circumstance, we may purposely keep siblings with different UUIDs around (e.g. while doing GC on an old version of an object). Reads will be done with the most recent "completed" manifest object.
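
The deterministic merge algorithm is not specified here, but one plausible rule for siblings sharing a UUID is sketched below: a block counts as written if either sibling has removed it from `blocks_remaining`, so the sets are intersected. The rule and field names are assumptions, not the settled design:

```python
def merge_siblings(a, b):
    """Deterministically merge two manifest siblings that share a UUID.

    Hypothetical rule: intersect blocks_remaining (a block is done if
    either sibling recorded it as written), keep the earliest creation
    time, and keep a completion time if either sibling saw one."""
    assert a["uuid"] == b["uuid"], "siblings with different UUIDs are kept separate"
    merged = dict(a)
    merged["blocks_remaining"] = a["blocks_remaining"] & b["blocks_remaining"]
    merged["created"] = min(a["created"], b["created"])
    # prefer whichever sibling observed completion, if any
    merged["finished"] = a["finished"] or b["finished"]
    return merged
```

Because the merge is commutative and deterministic, any replica can resolve the siblings and arrive at the same manifest.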

  4. Webmachine will begin reading in chunks of the PUT request, which a coordinator will write to Riak, updating the manifest object as each chunk (or perhaps each batch of chunks) is written. If the coordinator process dies, the request fails.
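
The write loop above can be sketched as follows. The `InMemoryClient` stand-in, the `key:uuid:index` block-key format, and the per-chunk manifest update are all illustrative assumptions (the real coordinator might batch manifest updates, as noted above):

```python
import io
import time

class InMemoryClient:
    """Stand-in for a Riak client; not the real API."""
    def __init__(self):
        self.store = {}

    def put(self, bucket, key, value):
        self.store[(bucket, key)] = value

def chunked_put(client, manifest, body_stream, block_size):
    """Stream the request body into per-block objects, draining the
    manifest's blocks_remaining as each block lands."""
    index = 0
    while True:
        block = body_stream.read(block_size)
        if not block:
            break
        # namespace block keys by UUID so concurrent PUTs to the
        # same {Bucket, Key} never collide
        block_key = "%s:%s:%d" % (manifest["key"], manifest["uuid"], index)
        client.put(manifest["bucket"], block_key, block)
        manifest["blocks_remaining"].discard(index)
        client.put(manifest["bucket"], manifest["key"], manifest)
        index += 1
    manifest["finished"] = time.time()
    client.put(manifest["bucket"], manifest["key"], manifest)
    return manifest

manifest = {"uuid": "abc", "bucket": "photos", "key": "cat.jpg",
            "blocks_remaining": {0, 1, 2}, "finished": None}
client = InMemoryClient()
chunked_put(client, manifest, io.BytesIO(b"x" * 2500), 1000)
print(manifest["blocks_remaining"])  # → set()
```

If the coordinator dies mid-stream, the manifest's non-empty `blocks_remaining` and unset `finished` field make the incomplete upload detectable.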

Questions

  • What should the block size be?

Longer-Term Ideas

  • Rack awareness
  • Larger block size
  • Multiple-disk support