Releases: CentaurusInfra/global-resource-service
Release v0.2.0
This release focuses on scaling up the Global Resource Service (GRS) to support a total of 5 million nodes across different regions.
Highlights include:
- In the regular node status change scenario, GRS delivers 99% of node status changes to consumers (schedulers) within 90 milliseconds
- In the massive node outage scenario (5M nodes have status changes), GRS can deliver all events to consumers within 1 minute
- Latency and throughput data:
| Test Case | Watch Latency P50 (ms) | Watch Latency P90 (ms) | Watch Latency P99 (ms) | List (ms) | Register (ms) | Throughput (events/s) |
| --- | --- | --- | --- | --- | --- | --- |
| Regular changes | 32 | 73 | 88 | 1,389 | 64 | N/A |
| Massive outages | 23,606 | 44,900 | 47,810 | 1,381 | 66 | 96,241 |
Detailed test configurations are listed here.
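As a rough consistency check on the latency and throughput figures above, the reported throughput can be related to the "within 1 minute" highlight. This is only a back-of-the-envelope sketch: it assumes each of the 5M nodes produces exactly one status-change event and that the reported throughput is sustained for the whole run, neither of which is stated in the release notes.

```python
# Figures taken from the release notes; the one-event-per-node and
# constant-rate assumptions are ours, not the project's.
TOTAL_EVENTS = 5_000_000          # 5M nodes, assuming one event per node
THROUGHPUT_EVENTS_PER_S = 96_241  # "Massive outages" throughput


def delivery_time_seconds(total_events: int, events_per_second: float) -> float:
    """Time needed to deliver total_events at a constant delivery rate."""
    return total_events / events_per_second


seconds = delivery_time_seconds(TOTAL_EVENTS, THROUGHPUT_EVENTS_PER_S)
print(f"Estimated delivery time: {seconds:.1f} s")
```

Under these assumptions the estimate comes out at roughly 52 seconds, which is in line with the one-minute figure and with the reported P99 watch latency of about 47.8 seconds.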
Features/Improvements/Bug fixes:
Architectural Changes:
- Replace the pull model between the data aggregator and the Global Resource Service API with a list/watch mechanism to reduce latency in the regular node status change scenario (PR 196)
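The general shape of a list/watch contract can be sketched as follows. This is a minimal illustration, not the GRS implementation: `NodeStore`, `list_nodes`, and `watch_nodes` are hypothetical names, and a real watch would typically be a long-lived stream that pushes events as they occur, whereas this sketch only models the "changes since version N" contract in memory.

```python
from dataclasses import dataclass, field


@dataclass
class NodeEvent:
    version: int   # monotonically increasing resource version
    node_id: str
    status: str


@dataclass
class NodeStore:
    """Hypothetical in-memory source of node state (illustration only)."""
    nodes: dict = field(default_factory=dict)    # node_id -> status
    events: list = field(default_factory=list)   # ordered change log
    version: int = 0

    def update(self, node_id: str, status: str) -> None:
        """Record a status change and append it to the change log."""
        self.version += 1
        self.nodes[node_id] = status
        self.events.append(NodeEvent(self.version, node_id, status))

    def list_nodes(self):
        """Return a full snapshot plus the version it corresponds to."""
        return dict(self.nodes), self.version

    def watch_nodes(self, since_version: int):
        """Return only the changes recorded after since_version."""
        return [e for e in self.events if e.version > since_version]


store = NodeStore()
store.update("node-1", "ready")
store.update("node-2", "ready")

snapshot, rv = store.list_nodes()     # one-time full list
store.update("node-1", "not-ready")   # a later status change

for event in store.watch_nodes(rv):   # consumer receives just the delta
    snapshot[event.node_id] = event.status
    rv = event.version
```

Compared with repeatedly pulling the full state, the consumer lists once and then handles only incremental changes, so a single node status change does not have to wait for the next full pull.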
Features and Engineering Improvements:
- Add Admin APIs: support single node status query in Global Resource Service (PR 147, 161)
- Add integration test framework and test cases for CI/CD pipeline support (PR 195)
Scalability and Performance Tuning:
- Support profiling for Global Resource Service (PR 114)
Global Resource Service v0.1.0
Release Summary
This is the first release of the Global Resource Service, one of the cornerstones of the Regionless Cloud Platform.
The 0.1.0 release includes the following components:
- Global Resource Service API server that supports REST APIs for client registration, listing assigned nodes, and watching for node changes
- Performant distributor, event queues, and cache to support node changes at large scale
- Data Aggregator that collects nodes and node changes from each region
- Client development SDK that provides APIs for building schedulers or other clients of the Global Resource Service
- A region manager simulator that provides region-level simulation of data changes across multiple Resource Providers (RPs)
- A simulated scheduler with the cache layer
- A test infrastructure to automate service deployment, cross-region test setup, test execution, and result collection
Key Features:
- Client registration, Node List and Watch APIs
- Distributor algorithms to support multiple regions and resource clusters
- Distributor algorithms for efficient, balanced node resource distribution to schedulers
- Scalability: Scales up to 1 million nodes across multiple regions, with up to 40 schedulers
- Performance: End-to-end latency of 300 ms for normal node failure cases (Daily change pattern) and within 1.3 seconds for disaster scenarios (RP outage pattern)
- Abstraction of node resources, i.e., a logical minimal record for a node resource
- Abstraction of resource version, i.e., a Composite RV (CRV) covering nodes from different and global origins
- Cross-region data change simulation of both "Daily" and "RP outage" test scenarios
- Automatic test environment setup, test execution and result collection routines.
Please refer to the release notes for details.
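The Composite RV listed among the key features is not specified in these notes. One plausible reading, shown here purely as an assumption, is a map that holds one resource version per origin (for example, per region), so that each origin can advance independently. The names `CompositeRV`, `is_new`, and `advance` are hypothetical and are not taken from the GRS code base.

```python
from typing import Dict

# Assumed representation: one resource version per origin (e.g. a region).
CompositeRV = Dict[str, int]


def is_new(crv: CompositeRV, origin: str, rv: int) -> bool:
    """An event is new if its version is past what we hold for its origin."""
    return rv > crv.get(origin, 0)


def advance(crv: CompositeRV, origin: str, rv: int) -> CompositeRV:
    """Return a copy of crv with origin moved forward to rv (never backwards)."""
    updated = dict(crv)
    updated[origin] = max(updated.get(origin, 0), rv)
    return updated


crv: CompositeRV = {"region-a": 10, "region-b": 4}
crv = advance(crv, "region-b", 7)       # region-b progresses independently
is_fresh = is_new(crv, "region-a", 9)   # region-a is already at version 10
```

With a structure like this, a client could resume a watch by sending back the whole map, and the service could replay only the events that are newer for each origin.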