Scalability of CLARIAH tools & infrastructure #126
Labels
discussion
This is a discussion point; invitation to discussion
FAIR Distribution & Deployment
We had a meeting between WP3 and WP6 today about certain use cases where a
high(er) degree of scalability is needed; specifically the need to invoke certain
processing tasks in parallel so the output can be obtained in a more reasonable
time.
As this is of course a central theme in any large infrastructure, I wanted to
open up this issue to track any progress, solutions and discussion on this,
from a generic perspective.
There are different aspects to the need to scale that we need to distinguish:
n splits (if feasible of course) and run one process for each.
For 1 we need robust software design (and algorithmic design in particular). This is something we need to encourage if the problem can be solved on this level.
For 3 we need load balancing and container orchestration, which should be handled by the infrastructure and is viable with solutions like Kubernetes.
Point 2 is typically addressed in high-performance clusters using job schedulers like SLURM or complete workflow management solutions (e.g. DANE, Nextflow, Airflow, Luigi). Solutions like kube-scheduler may also fit our service-oriented architecture.
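To make point 2 concrete, here is a minimal single-machine sketch of the "n splits, one process each" idea, using only the Python standard library. It is an illustration under stated assumptions, not a proposal for the actual infrastructure: `split_evenly`, `count_tokens` and `run_in_parallel` are hypothetical names, and `count_tokens` merely stands in for a real processing task. A job scheduler or workflow manager would do the same thing across nodes rather than across local processes.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Sequence


def split_evenly(items: Sequence[str], n: int) -> List[List[str]]:
    """Divide items into at most n contiguous splits of near-equal size."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Never create more splits than items; keep one (empty) split for empty input.
    n = max(1, min(n, len(items)))
    size, extra = divmod(len(items), n)
    splits, start = [], 0
    for i in range(n):
        # The first `extra` splits each take one additional item.
        end = start + size + (1 if i < extra else 0)
        splits.append(list(items[start:end]))
        start = end
    return splits


def count_tokens(docs: List[str]) -> int:
    """Hypothetical stand-in for a real processing task on one split."""
    return sum(len(doc.split()) for doc in docs)


def run_in_parallel(
    items: Sequence[str],
    n: int,
    task: Callable[[List[str]], int] = count_tokens,
    executor_cls=ProcessPoolExecutor,
) -> List[int]:
    """Run `task` once per split, one worker per split, and collect the results in order."""
    splits = split_evenly(items, n)
    with executor_cls(max_workers=len(splits)) as pool:
        return list(pool.map(task, splits))


if __name__ == "__main__":
    docs = ["a b c", "d e", "f", "g h i j", "k"]
    per_split = run_in_parallel(docs, n=2)
    print(per_split, sum(per_split))
```

This only works when the task is feasible to split, i.e. the splits can be processed independently and the partial results merged afterwards. The `executor_cls` parameter is there so the pool type can be swapped (for example for a thread pool when the task is I/O-bound).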
These three are not mutually exclusive; in real situations there may be demand for all three, even at the same time (which complicates matters).
Any views on this or ongoing efforts that address this?
(Poking all participants in the WP3/WP6 meeting who are on GitHub: @JanOdijk @jorisvanzundert @karinavdo @julianeugarten)