Function allowlist for Executor and manual Graph management #1146

Merged
merged 2 commits into from
Jan 10, 2025


diptanu
Collaborator

@diptanu diptanu commented Jan 5, 2025

Make Executors run only the functions of specified compute graphs.
The functions are specified as
`indexify-cli executor --function <namespace>:<workflow>:<compute_func>:<version> --function ...`
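
A minimal sketch of how one `--function` value could be split into its four parts. The `FunctionURI` class and `parse_function_uri` helper are hypothetical names used for illustration, not the code from this PR, and the sketch assumes none of the parts contains a colon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionURI:
    """One allowlist entry (hypothetical name, for illustration only)."""

    namespace: str
    compute_graph: str
    compute_fn: str
    version: str


def parse_function_uri(spec: str) -> FunctionURI:
    """Parse '<namespace>:<workflow>:<compute_func>:<version>'."""
    # Assumption: no part contains ':' itself, so a plain split is enough.
    parts = spec.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(
            "expected <namespace>:<workflow>:<compute_func>:<version>, "
            f"got {spec!r}"
        )
    return FunctionURI(*parts)
```

Rejecting incomplete values early keeps a mistyped allowlist entry from silently matching nothing.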

The allowlist is meant for Executors in production setups, e.g. to run only
a certain function in a Kubernetes Pod with an Executor.
This is the default production setup because each function has different
resource usage, and it’s easier to size containers per function.
The same applies to other container configs like secrets, roles, and volumes.

`indexify-cli executor --dev` mode is still present. It makes the server try
to run any function on the executor. This is convenient for development
of Indexify and for integration testing.

The new `--function` argument requires the user to control function versions explicitly.
Change the Graph version type from u32 to string and allow the user to set it in the SDK.
This allows users to implement versioning of their graphs with any semantics they want.
Use a random UUID by default for the graph version, so if a user doesn't want to manage
versions manually, the graph is simply updated on each `RemoteGraph.deploy()`.
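
The default-version behavior can be sketched roughly as follows. `GraphSpec` is a hypothetical stand-in for the SDK's graph object, assumed here only to show a string version that falls back to a random UUID; it is not the SDK's actual class.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class GraphSpec:
    """Hypothetical stand-in for a graph definition (illustration only)."""

    name: str
    # If the user sets no version, use a fresh random UUID string, so every
    # deploy carries a version that differs from the previous one.
    version: str = field(default_factory=lambda: str(uuid.uuid4()))


# User-managed versioning: any string scheme works, e.g. semver or a date.
pinned = GraphSpec(name="my_graph", version="1.2.0")

# Unmanaged versioning: each new object gets a different version.
auto_a = GraphSpec(name="my_graph")
auto_b = GraphSpec(name="my_graph")
```

A string version leaves the comparison semantics to the user; the sketch only checks for inequality between versions, not ordering.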

Co-authored-by: Diptanu Gon Choudhury [email protected]
Co-authored-by: Eugene Batalov [email protected]

@diptanu diptanu force-pushed the executor-containers branch from 38dc7bf to 864f356 Compare January 5, 2025 05:01
@eabatalov eabatalov self-requested a review January 6, 2025 11:16
@@ -24,11 +24,14 @@ def __init__(
        self,
        executor_id: str,
        code_path: Path,
        compute_graph: str,
Contributor

This needs to be optional.

@@ -28,11 +33,19 @@ def __init__(
self.config_path = config_path
self._logger = structlog.get_logger(module=__name__)

hostname = socket.gethostname()
ip_address = socket.gethostbyname(hostname)
Contributor

This looks error-prone: obtaining the IP address this way doesn't guarantee that it is reachable by the Server. It's better to pass it to the Executor in CLI args, because it depends on the deployment environment. I'm also not sure why we need to pass this IP address to the Server at all: so far the Server hasn't been calling the Executor. The Executor calls the Server, which is guaranteed to have a hostname reachable by Executors.
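
A rough sketch of the suggestion above, taking the address from the command line instead of resolving it with `socket.gethostbyname`. The `--advertise-address` flag and the `parse_executor_args` helper are hypothetical and are not part of `indexify-cli`.

```python
import argparse
from typing import Optional, Sequence


def parse_executor_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse a hypothetical executor command line (illustration only)."""
    parser = argparse.ArgumentParser(prog="executor-sketch")
    # Hypothetical flag: the operator states which address is reachable in
    # this deployment; nothing is guessed from the local hostname.
    parser.add_argument(
        "--advertise-address",
        default=None,
        help="address other components should use to reach this executor",
    )
    return parser.parse_args(argv)
```

Leaving the default as `None` means the address is simply omitted when the deployment doesn't need it.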

@@ -525,11 +524,21 @@ pub struct InvocationId {
    pub id: String,
}

#[derive(Debug, Serialize, Deserialize, ToSchema)]
pub struct FunctionContainer {
Contributor

In current open-source terminology, this is a FunctionExecutor.

ip_address = socket.gethostbyname(hostname)

functions = []
if namespace is not None and compute_graph is not None and function is not None and graph_version is not None:
Contributor

tbh I expected all these fields to be optional and passed directly to the Server, which would do the filtering on its side with whatever values are provided. E.g. if only namespace and compute_graph are set, the Server sees this in the ExecutorMetadata and routes tasks for that namespace and compute_graph to this Executor.
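
The filtering described in this comment could look roughly like the sketch below, where an unset field acts as a wildcard. It is written in Python for illustration although the Server is in Rust, and `ExecutorLabels`, `Task`, and `matches` are hypothetical names, not the Server's real types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExecutorLabels:
    """Filtering labels sent by an Executor; None means 'any' (hypothetical)."""

    namespace: Optional[str] = None
    compute_graph: Optional[str] = None
    compute_fn: Optional[str] = None
    version: Optional[str] = None


@dataclass(frozen=True)
class Task:
    """Identity of the function a task must run (hypothetical)."""

    namespace: str
    compute_graph: str
    compute_fn: str
    version: str


def matches(labels: ExecutorLabels, task: Task) -> bool:
    """Return True if every label that is set equals the task's value."""
    pairs = [
        (labels.namespace, task.namespace),
        (labels.compute_graph, task.compute_graph),
        (labels.compute_fn, task.compute_fn),
        (labels.version, task.version),
    ]
    return all(want is None or want == have for want, have in pairs)
```

With only `namespace` and `compute_graph` set, every function and version of that graph matches, which is the routing case given in the comment.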

#[derive(Debug, Serialize, Deserialize, ToSchema)]
pub struct ExecutorMetadata {
    pub id: String,
    pub executor_version: String,
    pub addr: String,
    pub functions: Vec<FunctionContainer>,
Contributor

These are not instantiated FunctionContainers/FunctionExecutors, so I'd call them Executor filtering labels. The Server will eventually need to track the FunctionContainers/FunctionExecutors that really exist. It would be nice not to mix the filtering labels with the existing FunctionContainers/FunctionExecutors, imo.

@diptanu diptanu force-pushed the executor-containers branch from a005315 to 6fdec9e Compare January 8, 2025 07:50
@@ -27,11 +33,31 @@ def __init__(
self.config_path = config_path
self._logger = structlog.get_logger(module=__name__)

hostname = socket.gethostname()
ip_address = socket.gethostbyname(hostname)
Contributor

These lines also make it necessary to run the Executor under sudo, fyi.

@eabatalov eabatalov force-pushed the executor-containers branch from 54eb620 to f08a42d Compare January 9, 2025 18:59
@eabatalov eabatalov changed the title from "making executors start containers explicitly" to "Function allowlist for Executor" Jan 9, 2025
@eabatalov eabatalov force-pushed the executor-containers branch from f08a42d to 474ea2a Compare January 9, 2025 19:19
@eabatalov eabatalov force-pushed the executor-containers branch 3 times, most recently from 03185ba to 477d8b9 Compare January 10, 2025 19:40
@eabatalov eabatalov force-pushed the executor-containers branch from 477d8b9 to cf28112 Compare January 10, 2025 19:42
@eabatalov eabatalov changed the title from "Function allowlist for Executor" to "Function allowlist for Executor and manual Graph management" Jan 10, 2025
@diptanu diptanu merged commit 06d5af4 into main Jan 10, 2025
8 checks passed
@diptanu diptanu deleted the executor-containers branch January 10, 2025 20:26
3 participants