diff --git a/jac-splice-orc/ReadMe.md b/jac-splice-orc/ReadMe.md
index fea42303e9..1a4283f747 100644
--- a/jac-splice-orc/ReadMe.md
+++ b/jac-splice-orc/ReadMe.md
@@ -18,7 +18,8 @@ JAC Cloud Orchestrator (`jac-splice-orc`) is a system designed to dynamically im
   - [1. Clone the Repository](#1-clone-the-repository)
   - [2. Install Dependencies](#2-install-dependencies)
   - [3. Configure the System](#3-configure-the-system)
-  - [4. Initialize the System](#4-initialize-the-system)
+  - [4. Recreate the Kind Cluster with Port Mappings](#4-recreate-the-kind-cluster-with-port-mappings)
+  - [5. Initialize the System](#5-initialize-the-system)
 - [Docker Usage](#docker-usage)
 - [Usage](#usage)
   - [Client Application](#client-application)
@@ -129,25 +130,15 @@ jac-splice-orc/
 
 Before you begin, ensure that you have the following installed and configured:
 
-- **Python** (version 3.9 or later): [Install Python](https://www.python.org/downloads/)
-- **Docker** (version 20.10 or later): [Install Docker](https://docs.docker.com/get-docker/)
-- **Kubernetes** (version 1.21 or later): [Install Kubernetes](https://kubernetes.io/docs/setup/)
-- **kubectl** command-line tool: [Install kubectl](https://kubernetes.io/docs/tasks/tools/)
-- **Jac**: [Install Jaclang](https://github.com/Jaseci-Labs/jasecii)
+- **Python** (version 3.11 or later)
+- **Docker** (version 20.10 or later)
+- **Kubernetes** (version 1.21 or later)
+- **kubectl** command-line tool
 - **Kubernetes Cluster**: Ensure you have access to a Kubernetes cluster (local or remote).
 
 Ensure that your Kubernetes cluster is up and running, and that you can connect to it using `kubectl`.
 
-### 1. Clone the Repository
-
-Clone the `jac-splice-orc` repository to your local machine:
-
-```bash
-git clone https://github.com/Jaseci-Labs/jac-splice-orc.git
-cd jac-splice-orc
-```
-
-### 2. Install Dependencies
+### 1. Install Dependencies
 
 Create a virtual environment and install the required Python packages:
 
@@ -156,9 +147,15 @@ python -m venv venv
 source venv/bin/activate  # On Windows: venv\Scripts\activate
 pip install -r requirements.txt
 ```
-
 **Note**: The `requirements.txt` file includes all necessary dependencies, such as `kubernetes`, `grpcio`, `PyYAML`, and others.
 
+### 2. Install via pip
+
+You can install `jac-splice-orc` directly from PyPI:
+```bash
+pip install jac-splice-orc
+```
+
 ### 3. Configure the System
 
 The application uses a `config.json` file located in the `jac_splice_orc/config/` directory for all configurations.
@@ -217,11 +214,97 @@ Edit `jac_splice_orc/config/config.json` to match your environment. Here's an ex
 - Replace `jaseci/jac-splice-orc:latest` with your own image if you have customized it.
 - Adjust resource requests and limits according to your environment.
 
-### 4. Initialize the System
+### 4. Recreate the Kind Cluster with Port Mappings
+
+To ensure that your Kubernetes cluster can expose services correctly, especially when using **Kind** (Kubernetes IN Docker), you need to recreate the Kind cluster with specific port mappings. This allows services like the Pod Manager to be accessible from your host machine without relying solely on port-forwarding.
+
+**Why Recreate the Kind Cluster?**
+
+- **Port Accessibility**: By mapping container ports to host ports, you can access Kubernetes services directly via `localhost:<hostPort>` on your machine.
+- **Simplified Access**: Eliminates the need for manual port-forwarding or additional networking configurations.
+
+**Steps to Recreate the Kind Cluster with Port Mappings:**
+
+1. 
**Delete the Existing Kind Cluster**
+
+   If you already have a Kind cluster running, delete it to allow recreation with new configurations.
+
+   ```bash
+   kind delete cluster --name little-x-kind
+   ```
+
+   **Note**: Replace `little-x-kind` with your cluster name if different.
+
+2. **Create a Kind Configuration File**
+
+   Create a YAML configuration file named `kind-config.yaml` with the desired port mappings. This file instructs Kind to map specific container ports to host ports.
+
+   ```yaml
+   kind: Cluster
+   apiVersion: kind.x-k8s.io/v1alpha4
+   nodes:
+     - role: control-plane
+       extraPortMappings:
+         - containerPort: 30080
+           hostPort: 30080
+           protocol: TCP
+   ```
+
+   **Explanation:**
+
+   - **containerPort**: The port inside the Kubernetes cluster (i.e., the port your service listens on).
+   - **hostPort**: The port on your local machine that maps to the `containerPort`.
+   - **protocol**: The network protocol (`TCP` or `UDP`).
+
+3. **Create the New Kind Cluster with Port Mappings**
+
+   Use the `kind-config.yaml` to create a new Kind cluster with the specified port mappings.
+
+   ```bash
+   kind create cluster --name little-x-kind --config kind-config.yaml
+   ```
+
+   **Output Example:**
+
+   ```
+   Creating cluster "little-x-kind" ...
+    ✓ Ensuring node image (kindest/node:v1.21.1) đŸ–ŧ
+    ✓ Preparing nodes đŸ“Ļ
+    ✓ Writing configuration 📜
+    ✓ Starting control-plane node kind-control-plane 🕹ī¸
+    ✓ Installing CNI 🔌
+    ✓ Installing StorageClass 💾
+   Set kubectl context to "kind-little-x-kind"
+   You can now use your cluster with:
+
+   kubectl cluster-info --context kind-little-x-kind
+
+   Thanks for using Kind! 🎉
+   ```
+
+
+### Summary of Steps:
+
+1. **Delete Existing Cluster**: `kind delete cluster --name little-x-kind`
+2. **Create Config File**: Define `kind-config.yaml` with desired port mappings.
+3. **Create New Cluster**: `kind create cluster --name little-x-kind --config kind-config.yaml`
+4. **Verify Mappings**: Ensure ports are correctly mapped using `kubectl` and `docker` commands (a quick check is shown below).
+
+**Important Considerations:**
+
+- **Port Conflicts**: Ensure that the `hostPort` values you choose are not already in use on your host machine.
+- **Cluster Name**: Adjust the cluster name (`little-x-kind`) as per your preference or organizational standards.
+- **Security**: Exposing ports directly to `localhost` can have security implications. Ensure that only necessary ports are exposed and consider implementing authentication or network policies if needed.
+
+---
+
+### 5. Initialize the System
+
+Once the cluster is set up with the appropriate port mappings, proceed to initialize the Pod Manager and Kubernetes resources.
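+
+Before initializing, you can optionally confirm that the port mapping took effect. This is an illustrative sanity check, not part of the original steps; it assumes the `little-x-kind` cluster name and the `30080` host port used above, and relies on Kind naming the node container `<cluster-name>-control-plane`:
+
+```bash
+# List the host ports published by the Kind node container; 30080/tcp should appear.
+docker port little-x-kind-control-plane
+
+# Confirm kubectl can reach the cluster through the context Kind created.
+kubectl cluster-info --context kind-little-x-kind
+```
+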
Use the provided CLI command to initialize the Pod Manager and Kubernetes resources:
 
-```bash
+```jac
 jac orc_initialize jac-splice-orc
 ```
 
diff --git a/jac-splice-orc/docker/Dockerfile b/jac-splice-orc/docker/Dockerfile
index 3aeba516c8..6fb2d80d2b 100644
--- a/jac-splice-orc/docker/Dockerfile
+++ b/jac-splice-orc/docker/Dockerfile
@@ -5,6 +5,7 @@ FROM python:3.12-slim
 RUN pip install --no-cache-dir \
     grpcio \
     grpcio-tools \
+    grpcio-health-checking \
     fastapi \
     uvicorn \
     kubernetes \
diff --git a/jac-splice-orc/jac_splice_orc/config/config.json b/jac-splice-orc/jac_splice_orc/config/config.json
index c0abf86aff..8295bcb551 100644
--- a/jac-splice-orc/jac_splice_orc/config/config.json
+++ b/jac-splice-orc/jac_splice_orc/config/config.json
@@ -6,14 +6,14 @@
         "deployment_name": "pod-manager-deployment",
         "service_account_name": "jac-orc-sa",
         "container_name": "pod-manager",
-        "image_name": "ashishmahendra/jac-splice-orc:0.0.6",
+        "image_name": "ashishmahendra/jac-splice-orc:0.0.8",
         "container_port": 8000,
         "service_name": "pod-manager-service",
         "service_type": "LoadBalancer",
         "env_vars": {
             "SERVICE_TYPE": "pod_manager",
             "NAMESPACE": "jac-splice-test",
-            "IMAGE_NAME": "ashishmahendra/jac-splice-orc:0.0.6"
+            "IMAGE_NAME": "ashishmahendra/jac-splice-orc:0.0.8"
         },
         "resources": {
             "requests": {
@@ -71,6 +71,6 @@
         }
     },
     "environment": {
-        "POD_MANAGER_URL": "http://a88a549ed32f14b14b1333a81ebd7a2a-1627559794.us-west-2.elb.amazonaws.com:8000"
+        "POD_MANAGER_URL": "http://localhost:8000"
     }
 }
\ No newline at end of file
diff --git a/jac-splice-orc/jac_splice_orc/managers/pod_manager.py b/jac-splice-orc/jac_splice_orc/managers/pod_manager.py
index 0b5dbf35ff..384fa6cbbc 100644
--- a/jac-splice-orc/jac_splice_orc/managers/pod_manager.py
+++ b/jac-splice-orc/jac_splice_orc/managers/pod_manager.py
@@ -110,7 +110,13 @@ def create_pod(self, module_name: str, module_config: dict) -> Any:
                             "mountPath": f"/app/requirements/{module_name}",
                         }
                     ],
-                }
+                    # gRPC readiness probe against the health service exposed by the module server.
+                    "readinessProbe": {
+                        "grpc": {"port": 50051, "service": "ModuleService"},
+                        "initialDelaySeconds": 10,
+                        "periodSeconds": 5,
+                        "timeoutSeconds": 5,
+                        "failureThreshold": 3,
+                        "successThreshold": 1,
+                    },
+                },
             ],
             "volumes": [
                 {
@@ -124,20 +130,23 @@ def create_pod(self, module_name: str, module_config: dict) -> Any:
 
         try:
             existing_configmap = self.v1.read_namespaced_config_map(
-                name=f"{module_name}-requirements",
-                namespace=self.namespace
+                name=f"{module_name}-requirements", namespace=self.namespace
             )
             print(f"ConfigMap '{module_name}-requirements' already exists.")
         except client.exceptions.ApiException as e:
-            if e.status == 404:
+            if e.status == 404:
                 # Create the ConfigMap
-                print(f"ConfigMap '{module_name}-requirements' not found. Creating it...")
+                print(
+                    f"ConfigMap '{module_name}-requirements' not found. Creating it..."
+                )
                 _ = self.v1.create_namespaced_config_map(
                     self.namespace,
                     body={
                         "metadata": {"name": f"{module_name}-requirements"},
                         "data": {
-                            "requirements.txt": open(requirements_file_path, "r").read()
+                            "requirements.txt": open(
+                                requirements_file_path, "r"
+                            ).read()
                         },
                     },
                 )
@@ -199,20 +208,36 @@ def delete_pod(self, module_name: str) -> Any:
 
     def wait_for_pod_ready(self, pod_name: str) -> None:
         """Wait until the pod is ready."""
-        max_retries = 30
+        max_retries = 120
         retries = 0
         while retries < max_retries:
-            pod_info = self.v1.read_namespaced_pod(
-                name=pod_name, namespace=self.namespace
-            )
-            if pod_info.status.phase == "Running":
-                logging.info(
-                    f"Pod {pod_name} is running with IP {pod_info.status.pod_ip}"
+            try:
+                pod_info = self.v1.read_namespaced_pod(
+                    name=pod_name, namespace=self.namespace
                 )
+            except client.exceptions.ApiException as e:
+                logging.error(f"Error fetching pod info for {pod_name}: {e}")
+                raise Exception(f"Error fetching pod info for {pod_name}: {e}")
+
+            conditions = pod_info.status.conditions or []
+            ready = False
+            logging.info(f"Pod {pod_name} is in phase: {pod_info.status.phase}")
+            for condition in conditions:
+                logging.info(f"Condition: {condition.type} - {condition.status}")
+                if condition.type == "Ready" and condition.status == "True":
+                    ready = True
+                    break
+            if ready:
+                logging.info(f"Pod {pod_name} is ready to serve requests.")
                 return
+            elif pod_info.status.phase in ["Failed", "Unknown"]:
+                raise Exception(f"Pod {pod_name} is in {pod_info.status.phase} phase.")
             retries += 1
+            logging.info(
+                f"Waiting for pod {pod_name} to be ready... (Attempt {retries}/{max_retries})"
+            )
             time.sleep(2)
-        raise Exception(f"Timeout: Pod {pod_name} failed to reach 'Running' state.")
+        raise Exception(f"Timeout: Pod {pod_name} failed to become ready.")
 
     def get_pod_service_ip(self, module_name: str) -> str:
         """Look up the service IP for the pod corresponding to the module."""
diff --git a/jac-splice-orc/jac_splice_orc/server/server.py b/jac-splice-orc/jac_splice_orc/server/server.py
index c2c0355c6e..bd4c76cc3e 100644
--- a/jac-splice-orc/jac_splice_orc/server/server.py
+++ b/jac-splice-orc/jac_splice_orc/server/server.py
@@ -10,6 +10,8 @@
 import logging
 import traceback
 
+from grpc_health.v1 import health, health_pb2_grpc, health_pb2
+
 logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
 
 
@@ -182,6 +184,14 @@ def serve(module_name):
     module_service_pb2_grpc.add_ModuleServiceServicer_to_server(
         ModuleService(module_name), server
     )
+
+    # Expose the standard gRPC health service so the pod's readiness probe can query it.
+    health_servicer = health.HealthServicer()
+    health_pb2_grpc.add_HealthServicer_to_server(health_servicer, server)
+
+    health_servicer.set(
+        service="ModuleService",
+        status=health_pb2.HealthCheckResponse.SERVING,
+    )
     server.add_insecure_port("[::]:50051")
     server.start()
     logging.info("gRPC server started and listening on port 50051")
diff --git a/jac-splice-orc/jac_splice_orc/test/test_pod_manager.py b/jac-splice-orc/jac_splice_orc/test/test_pod_manager.py
index 3d8bb179c4..a444465ebc 100644
--- a/jac-splice-orc/jac_splice_orc/test/test_pod_manager.py
+++ b/jac-splice-orc/jac_splice_orc/test/test_pod_manager.py
@@ -1,6 +1,18 @@
 import pytest
 from fastapi.testclient import TestClient
 from unittest import mock
+from unittest.mock import MagicMock
+from kubernetes.client import (
+    V1Pod,
+    V1ObjectMeta,
+    V1PodSpec,
+    V1PodStatus,
+    V1Condition,
+    V1Service,
+    V1Status,
+)
+from kubernetes.client.rest import ApiException
+from datetime import datetime, timezone
 
 # Mock gRPC imports to avoid ImportError in 
test environment with mock.patch.dict( @@ -16,13 +28,23 @@ client = TestClient(app) +def create_condition_ready(status): + return V1Condition( + type="Ready", + status=status, + last_transition_time=datetime.now(timezone.utc), + reason="TestingCondition", + message="Condition simulation", + ) + + @pytest.fixture def mock_kubernetes_and_grpc(): with mock.patch( "kubernetes.config.load_incluster_config" ) as mock_load_incluster_config, mock.patch( "kubernetes.client.CoreV1Api" - ) as mock_v1_api, mock.patch.object( + ) as mock_core_v1_api, mock.patch.object( PodManager, "create_pod" ) as mock_create_pod, mock.patch.object( PodManager, "delete_pod" @@ -32,24 +54,65 @@ def mock_kubernetes_and_grpc(): PodManager, "get_pod_service_ip" ) as mock_get_pod_service_ip: - # Mock load_incluster_config to avoid loading in-cluster config during tests mock_load_incluster_config.return_value = None - # Mock Kubernetes CoreV1Api methods - mock_v1_api.return_value = mock.Mock() + mock_v1_api_instance = mock_core_v1_api.return_value + + call_count = [0] + + def read_pod_side_effect(name, namespace, *args, **kwargs): + if name == "numpy-pod": + if call_count[0] == 0: + call_count[0] += 1 + # First check: pod phase = Pending, not ready yet + return V1Pod( + metadata=V1ObjectMeta(name="numpy-pod"), + spec=V1PodSpec(containers=[]), + status=V1PodStatus(phase="Pending", conditions=[]), + ) + else: + # Second check: pod is now Running and Ready + return V1Pod( + metadata=V1ObjectMeta(name="numpy-pod"), + spec=V1PodSpec(containers=[]), + status=V1PodStatus( + phase="Running", conditions=[create_condition_ready("True")] + ), + ) + raise ApiException(status=404, reason="Not Found") + + mock_v1_api_instance.read_namespaced_pod.side_effect = read_pod_side_effect + + mock_v1_api_instance.read_namespaced_config_map.side_effect = ApiException( + status=404, reason="Not Found" + ) + + mock_v1_api_instance.create_namespaced_config_map.return_value = MagicMock() + + mock_v1_api_instance.create_namespaced_pod.return_value = V1Pod( + metadata=V1ObjectMeta(name="numpy-pod"), + spec=V1PodSpec(containers=[]), + status=V1PodStatus(phase="Pending", conditions=[]), + ) + + mock_v1_api_instance.create_namespaced_service.return_value = V1Service( + metadata=V1ObjectMeta(name="numpy-service") + ) + + mock_v1_api_instance.delete_namespaced_pod.return_value = V1Status( + message="Pod numpy-pod deleted successfully." + ) + mock_v1_api_instance.delete_namespaced_service.return_value = V1Status( + message="Service numpy-service deleted successfully." + ) - # Mock responses for Kubernetes actions mock_create_pod.return_value = { "message": "Pod numpy-pod and service numpy-service created." } mock_delete_pod.return_value = { "message": "Pod numpy-pod and service numpy-service deleted." } - - # Mock get_pod_service_ip to avoid real Kubernetes API calls - mock_get_pod_service_ip.return_value = "127.0.0.1" # Return a mock IP address - - # Mock gRPC method call to return expected value + mock_get_pod_service_ip.return_value = "127.0.0.1" mock_forward_to_pod.return_value = "[1, 2, 3, 4]" yield @@ -65,7 +128,6 @@ def test_create_pod(mock_kubernetes_and_grpc): } } response = client.post("/create_pod/numpy", json={"module_config": module_config}) - assert response.status_code == 200 assert response.json() == { "message": "Pod numpy-pod and service numpy-service created." 
@@ -77,7 +139,7 @@ def test_run_module(mock_kubernetes_and_grpc): "/run_module?module_name=numpy&method_name=array", json={"args": [1, 2, 3, 4]} ) assert response.status_code == 200 - assert response.json() == "[1, 2, 3, 4]" # Expected output from gRPC mock + assert response.json() == "[1, 2, 3, 4]" def test_delete_pod(mock_kubernetes_and_grpc): diff --git a/jac-splice-orc/setup.py b/jac-splice-orc/setup.py index 2d39b0813d..03de3555e7 100644 --- a/jac-splice-orc/setup.py +++ b/jac-splice-orc/setup.py @@ -21,6 +21,7 @@ "requests", "python-dotenv", "numpy", + "jaclang", ], entry_points={ "jac": [