Enhance Kubernetes Readiness with gRPC Health Checks & Update Kind Cluster Setup #1479

Open
wants to merge 8 commits into
base: main
Changes from 2 commits
95 changes: 91 additions & 4 deletions jac-splice-orc/ReadMe.md
Expand Up @@ -18,7 +18,8 @@ JAC Cloud Orchestrator (`jac-splice-orc`) is a system designed to dynamically im
- [1. Clone the Repository](#1-clone-the-repository)
- [2. Install Dependencies](#2-install-dependencies)
- [3. Configure the System](#3-configure-the-system)
- [4. Initialize the System](#4-initialize-the-system)
- [4. Recreate the Kind Cluster with Port Mappings](#4-recreate-the-kind-cluster-with-port-mappings)
- [5. Initialize the System](#5-initialize-the-system)
- [Docker Usage](#docker-usage)
- [Usage](#usage)
- [Client Application](#client-application)
Expand Down Expand Up @@ -131,7 +132,7 @@ Before you begin, ensure that you have the following installed and configured:

- **Python** (version 3.9 or later): [Install Python](https://www.python.org/downloads/)
- **Docker** (version 20.10 or later): [Install Docker](https://docs.docker.com/get-docker/)
- **Kubernetes** (version 1.21 or later): [Install Kubernetes](https://kubernetes.io/docs/setup/)
- **Kind** (Kubernetes IN Docker): [Install Kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation)
- **kubectl** command-line tool: [Install kubectl](https://kubernetes.io/docs/tasks/tools/)
- **Jac**: [Install Jaclang](https://github.com/Jaseci-Labs/jasecii)
- **Kubernetes Cluster**: Ensure you have access to a Kubernetes cluster (local or remote).
Expand Down Expand Up @@ -217,11 +218,97 @@ Edit `jac_splice_orc/config/config.json` to match your environment. Here's an ex
- Replace `jaseci/jac-splice-orc:latest` with your own image if you have customized it.
- Adjust resource requests and limits according to your environment.

### 4. Initialize the System
### 4. Recreate the Kind Cluster with Port Mappings

To ensure that your Kubernetes cluster can expose services correctly, especially when using **Kind** (Kubernetes IN Docker), you need to recreate the Kind cluster with specific port mappings. This allows services like the Pod Manager to be accessible from your host machine without relying solely on port-forwarding.

**Why Recreate the Kind Cluster?**

- **Port Accessibility**: By mapping container ports to host ports, you can access Kubernetes services directly via `localhost:<port>` on your machine.
- **Simplified Access**: Eliminates the need for manual port-forwarding or additional networking configurations.

**Steps to Recreate the Kind Cluster with Port Mappings:**

1. **Delete the Existing Kind Cluster**

If you already have a Kind cluster running, delete it to allow recreation with new configurations.

```bash
kind delete cluster --name little-x-kind
```

**Note**: Replace `little-x-kind` with your cluster name if different.

2. **Create a Kind Configuration File**

Create a YAML configuration file named `kind-config.yaml` with the desired port mappings. This file instructs Kind to map specific container ports to host ports.

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30080
    hostPort: 30080
    protocol: TCP
```

**Explanation:**

- **containerPort**: The port on the Kind node container that traffic is forwarded to (for a `NodePort` service, this is the service's `nodePort`).
- **hostPort**: The port on your local machine that maps to the `containerPort`.
- **protocol**: The network protocol (`TCP` or `UDP`).

3. **Create the New Kind Cluster with Port Mappings**

Use the `kind-config.yaml` to create a new Kind cluster with the specified port mappings.

```bash
kind create cluster --name little-x-kind --config kind-config.yaml
```

**Output Example:**

```
Creating cluster "little-x-kind" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane node kind-control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-little-x-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-little-x-kind

Thanks for using Kind! 🎉
```


### Summary of Steps:

1. **Delete Existing Cluster**: `kind delete cluster --name little-x-kind`
2. **Create Config File**: Define `kind-config.yaml` with desired port mappings.
3. **Create New Cluster**: `kind create cluster --name little-x-kind --config kind-config.yaml`
4. **Verify Mappings**: Confirm the ports are mapped, for example with `docker ps` (check the `PORTS` column of the Kind node container) and `kubectl get svc`.
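
The "Verify Mappings" step can also be checked from the host with a plain TCP connection attempt. The following is a minimal sketch, not part of the PR; the function name `port_is_reachable` is hypothetical, and port `30080` is taken from the `kind-config.yaml` example above. A successful connection only shows that the host port is bound and accepting connections, not that the service behind it is healthy.

```python
import socket


def port_is_reachable(port: int, host: str = "127.0.0.1", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_reachable(30080)` should return `True` once the cluster has been created with the port mapping shown earlier.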

**Important Considerations:**

- **Port Conflicts**: Ensure that the `hostPort` values you choose are not already in use on your host machine.
- **Cluster Name**: Adjust the cluster name (`little-x-kind`) as per your preference or organizational standards.
- **Security**: Exposing ports directly to `localhost` can have security implications. Ensure that only necessary ports are exposed and consider implementing authentication or network policies if needed.
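
The port-conflict check mentioned above can be done before creating the cluster. This is a small sketch under the assumption that trying to bind the port is an acceptable test on your machine; the helper name `host_port_is_free` is hypothetical and not part of `jac-splice-orc`.

```python
import socket


def host_port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # bind() fails with OSError (address already in use) if the port is taken.
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

Calling `host_port_is_free(30080)` before `kind create cluster` gives an early warning if another process already holds the `hostPort` from `kind-config.yaml`.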

---

### 5. Initialize the System

Once the cluster is set up with the appropriate port mappings, proceed to initialize the Pod Manager and Kubernetes resources.

Use the provided CLI command to initialize the Pod Manager and Kubernetes resources:

```bash
```jac
jac orc_initialize jac-splice-orc
```

Expand Down
1 change: 1 addition & 0 deletions jac-splice-orc/docker/Dockerfile
Expand Up @@ -5,6 +5,7 @@ FROM python:3.12-slim
RUN pip install --no-cache-dir \
grpcio \
grpcio-tools \
grpcio-health-checking \
fastapi \
uvicorn \
kubernetes \
Expand Down
6 changes: 3 additions & 3 deletions jac-splice-orc/jac_splice_orc/config/config.json
Expand Up @@ -6,14 +6,14 @@
"deployment_name": "pod-manager-deployment",
"service_account_name": "jac-orc-sa",
"container_name": "pod-manager",
"image_name": "ashishmahendra/jac-splice-orc:0.0.6",
"image_name": "ashishmahendra/jac-splice-orc:0.0.8",
"container_port": 8000,
"service_name": "pod-manager-service",
"service_type": "LoadBalancer",
"env_vars": {
"SERVICE_TYPE": "pod_manager",
"NAMESPACE": "jac-splice-test",
"IMAGE_NAME": "ashishmahendra/jac-splice-orc:0.0.6"
"IMAGE_NAME": "ashishmahendra/jac-splice-orc:0.0.8"
},
"resources": {
"requests": {
Expand Down Expand Up @@ -71,6 +71,6 @@
}
},
"environment": {
"POD_MANAGER_URL": "http://a88a549ed32f14b14b1333a81ebd7a2a-1627559794.us-west-2.elb.amazonaws.com:8000"
"POD_MANAGER_URL": "http://localhost:8000"
}
}
53 changes: 39 additions & 14 deletions jac-splice-orc/jac_splice_orc/managers/pod_manager.py
Expand Up @@ -110,7 +110,13 @@ def create_pod(self, module_name: str, module_config: dict) -> Any:
"mountPath": f"/app/requirements/{module_name}",
}
],
}
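# Probe timing fields must sit inside the probe itself; they are not
# fields of the Container spec.
"readinessProbe": {
    "grpc": {"port": 50051},
    "initialDelaySeconds": 10,
    "periodSeconds": 5,
    "timeoutSeconds": 5,
    "failureThreshold": 3,
    "successThreshold": 1,
},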
},
],
"volumes": [
{
Expand All @@ -124,20 +130,23 @@ def create_pod(self, module_name: str, module_config: dict) -> Any:

try:
existing_configmap = self.v1.read_namespaced_config_map(
name=f"{module_name}-requirements",
namespace=self.namespace
name=f"{module_name}-requirements", namespace=self.namespace
)
print(f"ConfigMap '{module_name}-requirements' already exists.")
except client.exceptions.ApiException as e:
if e.status == 404:
if e.status == 404:
# Create the ConfigMap
print(f"ConfigMap '{module_name}-requirements' not found. Creating it...")
print(
f"ConfigMap '{module_name}-requirements' not found. Creating it..."
)
_ = self.v1.create_namespaced_config_map(
self.namespace,
body={
"metadata": {"name": f"{module_name}-requirements"},
"data": {
"requirements.txt": open(requirements_file_path, "r").read()
"requirements.txt": open(
requirements_file_path, "r"
).read()
},
},
)
Expand Down Expand Up @@ -199,20 +208,36 @@ def delete_pod(self, module_name: str) -> Any:

def wait_for_pod_ready(self, pod_name: str) -> None:
"""Wait until the pod is ready."""
max_retries = 30
max_retries = 120
retries = 0
while retries < max_retries:
pod_info = self.v1.read_namespaced_pod(
name=pod_name, namespace=self.namespace
)
if pod_info.status.phase == "Running":
logging.info(
f"Pod {pod_name} is running with IP {pod_info.status.pod_ip}"
try:
pod_info = self.v1.read_namespaced_pod(
name=pod_name, namespace=self.namespace
)
except client.exceptions.ApiException as e:
logging.error(f"Error fetching pod info for {pod_name}: {e}")
raise Exception(f"Error fetching pod info for {pod_name}: {e}")

conditions = pod_info.status.conditions or []
ready = False
logging.info(f"Pod {pod_name} is in phase: {pod_info.status.phase}")
for condition in conditions:
logging.info(f"Condition: {condition.type} - {condition.status}")
if condition.type == "Ready" and condition.status == "True":
ready = True
break
if ready:
logging.info(f"Pod {pod_name} is ready to serve requests.")
return
elif pod_info.status.phase in ["Failed", "Unknown"]:
raise Exception(f"Pod {pod_name} is in {pod_info.status.phase} phase.")
retries += 1
logging.info(
f"Waiting for pod {pod_name} to be ready... (Attempt {retries}/{max_retries})"
)
time.sleep(2)
raise Exception(f"Timeout: Pod {pod_name} failed to reach 'Running' state.")
raise Exception(f"Timeout: Pod {pod_name} failed to become ready.")

def get_pod_service_ip(self, module_name: str) -> str:
"""Look up the service IP for the pod corresponding to the module."""
Expand Down
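
The change to `wait_for_pod_ready` replaces the `phase == "Running"` check with a check on the pod's `Ready` condition, which only turns `True` once the readiness probe succeeds. The sketch below isolates that logic so it can be run without a cluster. It is an illustration, not the PR's code: `is_pod_ready`, `wait_until_ready`, and `make_fake_pod` are hypothetical names, and the fake pod only mimics the attributes (`status.phase`, `status.conditions`) that the diff reads from the Kubernetes client object.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def is_pod_ready(pod: Any) -> bool:
    """Return True if the pod reports a Ready condition with status "True"."""
    conditions = pod.status.conditions or []
    return any(c.type == "Ready" and c.status == "True" for c in conditions)


def wait_until_ready(
    fetch_pod: Callable[[], Any],
    max_retries: int = 120,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll fetch_pod() until the pod is ready; return the attempt number."""
    for attempt in range(1, max_retries + 1):
        pod = fetch_pod()
        if is_pod_ready(pod):
            return attempt
        if pod.status.phase in ("Failed", "Unknown"):
            raise RuntimeError(f"Pod is in {pod.status.phase} phase.")
        sleep(delay)
    raise TimeoutError("Pod failed to become ready.")


def make_fake_pod(phase: str, ready: bool) -> SimpleNamespace:
    """Build a stand-in for a pod object (hypothetical, for demonstration only)."""
    condition = SimpleNamespace(type="Ready", status="True" if ready else "False")
    return SimpleNamespace(status=SimpleNamespace(phase=phase, conditions=[condition]))
```

With the defaults shown (120 attempts, 2 seconds apart), the wait allows roughly four minutes before timing out, compared with about one minute under the previous limit of 30 attempts.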
10 changes: 10 additions & 0 deletions jac-splice-orc/jac_splice_orc/server/server.py
Expand Up @@ -10,6 +10,8 @@
import logging
import traceback

from grpc_health.v1 import health, health_pb2_grpc, health_pb2

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")


Expand Down Expand Up @@ -182,6 +184,14 @@ def serve(module_name):
module_service_pb2_grpc.add_ModuleServiceServicer_to_server(
ModuleService(module_name), server
)

health_servicer = health.HealthServicer()
health_pb2_grpc.add_HealthServicer_to_server(health_servicer, server)

health_servicer.set(
service="ModuleService",
status=health_pb2.HealthCheckResponse.SERVING,
)
server.add_insecure_port("[::]:50051")
server.start()
logging.info("gRPC server started and listening on port 50051")
Expand Down