For distributed systems, the storage hierarchy extends to rack, cluster, and datacenter levels.
- Bandwidth: once data sits on another machine, the network link becomes the bottleneck, so remote memory and remote disk deliver roughly the same bandwidth.
- Latency: memory latency grows by orders of magnitude when a network round trip is added, while disk latency already starts from such a large base (milliseconds per seek) that the extra hop changes it only slightly.
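A rough back-of-envelope sketch of this effect (the bandwidth and latency figures below are assumed order-of-magnitude values, not measurements from the talk): remote bandwidth is capped by the network link, so DRAM and disk converge, while remote latency is the medium's own latency plus a round trip, which is catastrophic for DRAM but barely noticeable for disk.

```python
# Back-of-envelope sketch; all numbers are assumed, order-of-magnitude values.
NETWORK_BW_MB_S = 100      # ~1 Gbps link shared within a rack (assumption)
NETWORK_RTT_US = 500       # same-datacenter round trip, ~0.5 ms (assumption)

media = {
    # name: (local bandwidth in MB/s, local latency in microseconds)
    "dram": (20_000, 0.1),
    "disk": (200, 10_000),  # ~10 ms of seek time dominates
}

for name, (bw_local, lat_local) in media.items():
    bw_remote = min(bw_local, NETWORK_BW_MB_S)   # network caps both media at the same value
    lat_remote = lat_local + NETWORK_RTT_US      # DRAM latency explodes; disk barely changes
    print(f"{name}: local {bw_local} MB/s, {lat_local} us -> "
          f"remote {bw_remote} MB/s, {lat_remote} us")
```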
Distributed systems are a must.
Increase Scalability
Decrease Latency
- applications are more interactive
- data is stored in a distributed fashion
Hardware
- price doesn't scale linearly (one large machine costs far more than many commodity machines)
- a single node is a single point of failure
E.g.: BigTable System
How does the paper advance the state of the art?
BigTable's improvements over the original paper:
Improved performance isolation
Improved protection against corruption
- GFS
- Making Applications Robust Against Failures
  - Canary requests (see the sketch after this list)
  - Bad backend detection (stop sending a misbehaving backend live requests until it recovers)
  - More aggressive load balancing when imbalance is more severe
  - Better to give users limited functionality than an error page
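A minimal sketch of the canary-request idea referenced above (the call_backend stub and the request shape are invented for illustration): try an incoming request on a single backend first, and only fan it out to the remaining replicas if the canary survives, so one malformed "query of death" cannot crash every server at once.

```python
import concurrent.futures

def call_backend(backend, request):
    # Stand-in for a real RPC; simulates a backend that crashes on a bad request.
    if request.get("malformed"):
        raise RuntimeError(f"{backend} crashed on query of death")
    return f"{backend}: ok"

def fan_out(backends, request):
    # Canary pattern: send to one backend first; only fan out if it survives.
    canary, rest = backends[0], backends[1:]
    canary_result = call_backend(canary, request)  # raises here if the canary dies
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return [canary_result] + list(pool.map(lambda b: call_backend(b, request), rest))

print(fan_out(["leaf-0", "leaf-1", "leaf-2"], {"query": "hello"}))
```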
- Add Sufficient Monitoring/Status/Debugging Hooks
  - Export HTML-based status pages for easy diagnosis
  - Export a collection of key-value pairs via a standard interface (see the sketch after this list)
  - Support low-overhead online profiling
    - RPC subsystem collects a sample of all requests
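A minimal sketch of the key-value export idea above (the /statusz path, counter names, and JSON format are assumptions for illustration, not the interface described in the talk): the server keeps in-process counters and serves them both as a human-readable HTML status page and as machine-readable key-value pairs.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-process counters that the serving code updates as it handles traffic.
STATS = {"requests_total": 0, "errors_total": 0, "rpc_latency_ms_p99": 0.0}

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/statusz":    # human-readable HTML status page
            items = "".join(f"<li>{k} = {v}</li>" for k, v in STATS.items())
            body, ctype = f"<html><body><ul>{items}</ul></body></html>", "text/html"
        else:                          # machine-readable key-value export
            body, ctype = json.dumps(STATS), "application/json"
        data = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

# Run the status server on a background thread next to the real work.
server = HTTPServer(("localhost", 8080), StatusHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
```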
- BigTable
  - Replication
  - Coprocessors
Jeff gives a multi-faceted view of how to build large distributed systems. Hardware, software, reliability, application robustness, system monitoring, the file system, and how data are stored all matter when designing a large distributed system.
Not mentioned
E.g.: Design Goals for Spanner
- zones of semi-autonomous control
- consistency after disconnected operation
- users specify high-level desires
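One way to read "users specify high-level desires" is as a declarative policy attached to data, with the system choosing concrete placement and replication to satisfy it. The sketch below is purely hypothetical: the field names and values are invented for illustration and are not Spanner's actual interface.

```python
# Hypothetical encoding of high-level desires attached to a dataset; the system,
# not the user, decides concrete placement/replication to satisfy them.
placement_policy = {
    "latency": {"percentile": 99, "max_read_ms": 50},        # e.g. tail read latency target
    "durability": {"min_replicas": 3},
    "placement": {"replicas_per_region": {"EU": 2, "US": 2, "ASIA": 1}},
}
```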
General model of consistency choices, explained and codified
Easy-to-use abstractions for resolving conflicting updates to multiple versions of a piece of state
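As one illustration of what such an abstraction could look like (a generic sketch, not a design from the talk): the storage layer hands the application every conflicting version of a piece of state, and the application supplies a merge policy such as last-writer-wins instead of reimplementing the version-tracking plumbing.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Version:
    value: str
    timestamp: float   # writer's clock; disconnected replicas can produce conflicting versions

def last_writer_wins(versions: List[Version]) -> Version:
    # Default merge policy: keep the version with the latest timestamp.
    return max(versions, key=lambda v: v.timestamp)

def resolve(versions: List[Version],
            merge: Callable[[List[Version]], Version] = last_writer_wins) -> Version:
    # Single entry point for applications: plug in a merge policy, get one value back.
    return versions[0] if len(versions) == 1 else merge(versions)

# Two replicas updated the same key while disconnected; resolve the conflict.
conflicts = [Version("draft-A", 10.0), Version("draft-B", 12.5)]
print(resolve(conflicts).value)   # -> "draft-B" under last-writer-wins
```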