
[IA-4996] WIP: Deploy Jupyter on AKS #4678

Draft: wants to merge 33 commits into develop from IA-4996-deploy-jupyter-on-aks
Commits (33):

2284ac2  [IA-4996] Deploy Jupyter on AKS: TEST! (lucymcnatt, Jun 28, 2024)
2e393d9  more boilerplate (lucymcnatt, Jul 1, 2024)
8ee56bb  compiling (lucymcnatt, Jul 2, 2024)
fc28b25  config issue (lucymcnatt, Jul 3, 2024)
f210f7a  chart ver (lucymcnatt, Jul 3, 2024)
2a7eeaf  todos (lucymcnatt, Jul 3, 2024)
7c06172  use local ver of helm chart (lucymcnatt, Jul 3, 2024)
02b2a7b  log permissions (lucymcnatt, Jul 8, 2024)
ee35023  more logging (lucymcnatt, Jul 8, 2024)
604c987  allow disks for jup apps (lucymcnatt, Jul 8, 2024)
d4f8b82  app install (lucymcnatt, Jul 8, 2024)
b43b9f5  oops (lucymcnatt, Jul 8, 2024)
e0c2719  change chart loc (lucymcnatt, Jul 8, 2024)
37c9f24  missing /? (lucymcnatt, Jul 9, 2024)
190c9a8  log vals (lucymcnatt, Jul 9, 2024)
713ed44  get value (lucymcnatt, Jul 9, 2024)
387cb60  Merge branch 'develop' into IA-4996-deploy-jupyter-on-aks (lucymcnatt, Jul 9, 2024)
d4c0ac2  helm update (lucymcnatt, Jul 9, 2024)
b083b58  add resourceid (lucymcnatt, Jul 9, 2024)
eb41792  add package (lucymcnatt, Jul 9, 2024)
1a56f12  changing azure volume structure (lucymcnatt, Jul 9, 2024)
5380a66  status endpoint (lucymcnatt, Jul 11, 2024)
e7de53e  new helm (lucymcnatt, Jul 12, 2024)
8da02b0  new helm again (lucymcnatt, Jul 12, 2024)
a1c3da6  new helm again again (lucymcnatt, Jul 12, 2024)
7065488  launch w/o PD (lucymcnatt, Jul 17, 2024)
4f02a46  Merge branch 'develop' into IA-4996-deploy-jupyter-on-aks (lucymcnatt, Jul 17, 2024)
e2cb6b5  dynamic PD (lucymcnatt, Jul 23, 2024)
7a4d57a  Merge branch 'develop' into IA-4996-deploy-jupyter-on-aks (lucymcnatt, Jul 23, 2024)
56cfd49  formatting (lucymcnatt, Jul 23, 2024)
c363dc1  Merge branch 'IA-4996-deploy-jupyter-on-aks' of https://github.com/Da… (lucymcnatt, Jul 23, 2024)
aa4267a  adjust listener vals (lucymcnatt, Jul 23, 2024)
638883f  progress (lucymcnatt, Jul 31, 2024)
7 changes: 6 additions & 1 deletion Dockerfile
@@ -34,6 +34,7 @@ ENV CROMWELL_CHART_VERSION 0.2.523
ENV HAIL_BATCH_CHART_VERSION 0.2.0
ENV RSTUDIO_CHART_VERSION 0.12.0
ENV SAS_CHART_VERSION 0.17.0
ENV JUPYTER_CHART_VERSION 0.1.0

RUN mkdir /leonardo
COPY ./leonardo*.jar /leonardo
@@ -56,8 +57,11 @@ RUN helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx && \

# .Files helm helper can't access files outside a chart. Hence in order to populate cert file properly, we're
# pulling `terra-app-setup` locally and add cert files to the chart. As a result we need to pull all GKE
-# charts locally as well so they can acess the local cert files during the helm install step, see https://helm.sh/docs/chart_template_guide/accessing_files/
+# charts locally as well so they can access the local cert files during the helm install step, see https://helm.sh/docs/chart_template_guide/accessing_files/
# Helm does not seem to support the direct installation of a chart located in OCI so let's pull it to a local directory for now.
COPY ./jupyter-0.1.0.tgz /leonardo
RUN tar -xzf /leonardo/jupyter-0.1.0.tgz -C /leonardo

RUN cd /leonardo && \
helm repo update && \
helm pull terra-app-setup-charts/terra-app-setup --version $TERRA_APP_SETUP_VERSION --untar && \
@@ -68,6 +72,7 @@ RUN cd /leonardo && \
helm pull terra-helm/rstudio --version $RSTUDIO_CHART_VERSION --untar && \
helm pull terra-helm/sas --version $SAS_CHART_VERSION --untar && \
helm pull oci://terradevacrpublic.azurecr.io/hail/hail-batch-terra-azure --version $HAIL_BATCH_CHART_VERSION --untar && \
# helm pull terra-helm/jupyter --version $JUPYTER_CHART_VERSION --untar && \
cd /

# Install https://github.com/apangin/jattach to get access to JDK tools
@@ -661,7 +661,7 @@ object JsonCodec {
ComputeClass.stringToObject.get(s.toLowerCase).toRight(s"Invalid compute class ${s}")
)
implicit val autopilotDecoder: Decoder[Autopilot] =
-Decoder.forProduct4("computeClass", "cpuInMillicores", "memoryInGb", "ephemeralStorageInGb")(Autopilot.apply)
+Decoder.forProduct2("computeClass", "ephemeralStorageInGb")(Autopilot.apply)

implicit val locationDecoder: Decoder[Location] = Decoder.decodeString.map(Location)
implicit val kubeClusterIdDecoder: Decoder[KubernetesClusterLeoId] = Decoder.decodeLong.map(KubernetesClusterLeoId)
@@ -129,6 +129,10 @@ object FormattedBy extends Enum[FormattedBy] {
override def asString: String = "CROMWELL"
}

final case object Jupyter extends FormattedBy {
override def asString: String = "JUPYTER"
}

final case object Allowed extends FormattedBy {
override def asString: String = "ALLOWED"
}
@@ -364,6 +364,10 @@ object AppType {
override def toString: String = "HAIL_BATCH"
}

case object Jupyter extends AppType {
override def toString: String = "JUPYTER"
}

// See more context in https://docs.google.com/document/d/1RaQRMqAx7ymoygP6f7QVdBbZC-iD9oY_XLNMe_oz_cs/edit
case object Allowed extends AppType {
override def toString: String = "ALLOWED"
@@ -377,16 +381,18 @@ object AppType {
def stringToObject: Map[String, AppType] = values.map(v => v.toString -> v).toMap

/**
-* Disk formatting for an App. Currently, only Galaxy, RStudio and Custom app types
+* Disk formatting for an App. Currently, only Galaxy, RStudio, Jupyter and Custom app types
* support disk management. So we default all other app types to Cromwell,
* but the field is unused.
*/
def appTypeToFormattedByType(appType: AppType): FormattedBy =
appType match {
case Galaxy => FormattedBy.Galaxy
case Jupyter => FormattedBy.Jupyter
case Custom => FormattedBy.Custom
case Allowed => FormattedBy.Allowed
case Cromwell | Wds | HailBatch | WorkflowsApp | CromwellRunnerApp => FormattedBy.Cromwell

}
}

@@ -439,9 +445,9 @@ final case class App(id: AppId,
descriptorPath: Option[Uri],
extraArgs: List[String],
sourceWorkspaceId: Option[WorkspaceId],
-numOfReplicas: Option[Int],
autodelete: Autodelete,
-autopilot: Option[Autopilot],
+autopilot: Boolean,
+computeProfile: ComputeProfile,
bucketNameToMount: Option[GcsBucketName]
) {

@@ -598,7 +604,13 @@ object ComputeClass {
val stringToObject = values.map(v => v.toString.toLowerCase -> v).toMap
}
final case class Autodelete(autodeleteEnabled: Boolean, autodeleteThreshold: Option[AutodeleteThreshold])
final case class Autopilot(computeClass: ComputeClass, cpuInMillicores: Int, memoryInGb: Int, ephemeralStorageInGb: Int)

final case class ComputeProfile(numOfReplicas: Option[Int],
cpuInMi: Option[Int],
memoryInGb: Option[Int],
computeClass: Option[ComputeClass],
ephemeralStorageInGb: Option[Int]
)

final case class UpdateAppTableId(value: Long) extends AnyVal
final case class UpdateAppJobId(value: UUID) extends AnyVal
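Reviewer note: the hunk above moves the per-app resource knobs (`numOfReplicas`, cpu, memory) out of `Autopilot`/`App` and into the new all-optional `ComputeProfile`, leaving `Autopilot` with just a compute class and ephemeral storage. A minimal self-contained sketch of the new shape (simplified stand-in types, not Leonardo's real classes):

```scala
// Sketch of the restructuring in this hunk: resource fields move out of
// Autopilot and App into an optional-field ComputeProfile.
object ComputeProfileSketch {
  sealed trait ComputeClass
  case object GeneralPurpose extends ComputeClass

  // After this PR, Autopilot keeps only these two fields.
  final case class Autopilot(computeClass: ComputeClass, ephemeralStorageInGb: Int)

  final case class ComputeProfile(numOfReplicas: Option[Int],
                                  cpuInMi: Option[Int],
                                  memoryInGb: Option[Int],
                                  computeClass: Option[ComputeClass],
                                  ephemeralStorageInGb: Option[Int])

  // All-None default: callers supply only the knobs they need.
  val empty: ComputeProfile = ComputeProfile(None, None, None, None, None)
  val example: ComputeProfile = empty.copy(numOfReplicas = Some(1), memoryInGb = Some(8))
}
```

Making every field an `Option` lets one case class serve app types with very different resource needs, at the cost of pushing "which fields are required" checks to the call sites.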
4 changes: 4 additions & 0 deletions http/src/main/resources/leo.conf
@@ -184,6 +184,10 @@ azure {
enabled = ${?HAIL_BATCH_APP_ENABLED}
}

jupyter-app-config {
enabled = ${?JUPYTER_APP_ENABLED}
}

coa-app-config {
instrumentation-enabled = ${?COA_INSTRUMENTATION_ENABLED}
database-enabled = ${?COA_DATABASE_ENABLED}
17 changes: 17 additions & 0 deletions http/src/main/resources/reference.conf
@@ -455,6 +455,23 @@ azure {
chart-versions-to-exclude-from-updates = []
}

jupyter-app-config {
chart-name = "/leonardo/jupyter" // TODO (LM) this should be terra-helm/jupyter
chart-version = "0.1.0"
release-name-suffix = "jupyter-rls"
namespace-name-suffix = "jupyter-ns"
ksa-name = "jupyter-ksa"
services = [
{
name = "jupyter"
kind = "ClusterIP"
}
]
enabled = true
# App developers - Please keep the list of non-backward compatible versions in the list below
chart-versions-to-exclude-from-updates = []
}

# App types which are allowed to launch with WORKSPACE_SHARED access scope.
allowed-shared-apps = [
"WDS",
1 change: 1 addition & 0 deletions http/src/main/resources/swagger/api-docs.yaml
@@ -2988,6 +2988,7 @@ components:
- ALLOWED
- WORKFLOWS_APP
- CROMWELL_RUNNER_APP
- JUPYTER
AllowedChartName:
type: string
enum:
@@ -15,7 +15,8 @@ import org.broadinstitute.dsde.workbench.leonardo.{
LandingZoneResources,
ManagedIdentityName,
WorkspaceId,
-WsmControlledDatabaseResource
+WsmControlledDatabaseResource,
+WsmControlledResourceId
}
import org.broadinstitute.dsp.Values
import org.http4s.Uri
@@ -34,6 +35,10 @@ trait AppInstall[F[_]] {

/** Checks status of the app. */
def checkStatus(baseUri: Uri, authHeader: Authorization)(implicit ev: Ask[F, AppContext]): F[Boolean]

// /** Checks status of the app. */
// def checkStatus(cloudContext: CloudContext, runtimeName: RuntimeName)(implicit ev: Ask[F, AppContext]): F[Boolean]

}

object AppInstall {
@@ -43,13 +48,15 @@ object AppInstall {
cromwellAppInstall: CromwellAppInstall[F],
workflowsAppInstall: WorkflowsAppInstall[F],
hailBatchAppInstall: HailBatchAppInstall[F],
-cromwellRunnerAppInstall: CromwellRunnerAppInstall[F]
+cromwellRunnerAppInstall: CromwellRunnerAppInstall[F],
+jupyterAppInstall: JupyterAppInstall[F]
): AppType => AppInstall[F] = _ match {
case AppType.Wds => wdsAppInstall
case AppType.Cromwell => cromwellAppInstall
case AppType.WorkflowsApp => workflowsAppInstall
case AppType.HailBatch => hailBatchAppInstall
case AppType.CromwellRunnerApp => cromwellRunnerAppInstall
case AppType.Jupyter => jupyterAppInstall
case e => throw new IllegalArgumentException(s"Unexpected app type: ${e}")
}

Expand All @@ -75,6 +82,7 @@ object Database {

final case class BuildHelmOverrideValuesParams(app: App,
workspaceId: WorkspaceId,
workspaceName: String,
cloudContext: AzureCloudContext,
billingProfileId: BillingProfileId,
landingZoneResources: LandingZoneResources,
@@ -83,5 +91,6 @@ final case class BuildHelmOverrideValuesParams(app: App,
ksaName: ServiceAccountName,
managedIdentityName: ManagedIdentityName,
databaseNames: List[WsmControlledDatabaseResource],
-config: AKSInterpreterConfig
+config: AKSInterpreterConfig,
+diskWsmResourceId: Option[WsmControlledResourceId]
)
@@ -0,0 +1,70 @@
package org.broadinstitute.dsde.workbench.leonardo.app
import cats.effect.Async
import cats.mtl.Ask
import cats.syntax.all._
import org.broadinstitute.dsde.workbench.leonardo.AppContext
import org.broadinstitute.dsde.workbench.leonardo.config.JupyterAppConfig
import org.broadinstitute.dsde.workbench.leonardo.dao.JupyterDAO
import org.broadinstitute.dsde.workbench.leonardo.util.AppCreationException
import org.broadinstitute.dsp.Values
import org.http4s.Uri
import org.http4s.headers.Authorization

/**
* Jupyter app.
*/
class JupyterAppInstall[F[_]](config: JupyterAppConfig, jupyterDao: JupyterDAO[F])(implicit F: Async[F])
extends AppInstall[F] {
override def databases: List[Database] = List.empty

override def buildHelmOverrideValues(
params: BuildHelmOverrideValuesParams
)(implicit ev: Ask[F, AppContext]): F[Values] =
for {
ctx <- ev.ask
// Storage container is required for the Jupyter app
storageContainer <- F.fromOption(
params.storageContainer,
AppCreationException("Storage container required for Jupyter app", Some(ctx.traceId))
)

disk <- F.fromOption(
params.app.appResources.disk,
AppCreationException("Disk required for Jupyter app", Some(ctx.traceId))
)

// diskResourceId <- F.fromOption(
// params.diskWsmResourceId,
// AppCreationException("Disk required for Jupyter app", Some(ctx.traceId))
// )

values =
List(
// workspace configs
raw"workspace.id=${params.workspaceId.value.toString}",
raw"workspace.name=${params.workspaceName}",
raw"workspace.storageContainer.url=https://${params.landingZoneResources.storageAccountName.value}.blob.core.windows.net/${storageContainer.name.value}",
raw"workspace.storageContainer.resourceId=${storageContainer.resourceId.value.toString}",
raw"workspace.cloudProvider=Azure",

// persistent disk configs
raw"persistence.diskName=${disk.name.value}",
raw"persistence.diskSize=${disk.size.gb}",
raw"persistence.subscriptionId=${params.cloudContext.subscriptionId.value}",
raw"persistence.resourceGroupName=${params.cloudContext.managedResourceGroupName.value}",

// app resource requests
raw"resources.cpu=100", // ${params.app.appResources}",
raw"resources.memory=128", // ${disk.size.gb}",

// misc
raw"serviceAccount.name=${params.ksaName.value}",
raw"relay.connectionName=${params.app.appName.value}"
)
} yield Values(values.mkString(","))

override def checkStatus(baseUri: Uri, authHeader: Authorization)(implicit
ev: Ask[F, AppContext]
): F[Boolean] =
jupyterDao.getStatus(baseUri, authHeader).handleError(_ => false)
}
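Reviewer note: `buildHelmOverrideValues` above renders its overrides as a single comma-joined `key=value` string wrapped in `Values`, the same format `helm --set` accepts. A self-contained sketch with hypothetical inputs standing in for `params.workspaceId`, the disk, etc. (not Leonardo's real types):

```scala
// Sketch of how JupyterAppInstall assembles helm override values:
// a List of key=value strings, joined with commas into one Values string.
object HelmValuesSketch {
  // Hypothetical inputs for illustration only.
  val workspaceId = "11111111-2222-3333-4444-555555555555"
  val diskName    = "disk-jupyter"
  val diskSizeGb  = 50

  val values: List[String] = List(
    raw"workspace.id=$workspaceId",
    raw"workspace.cloudProvider=Azure",
    raw"persistence.diskName=$diskName",
    raw"persistence.diskSize=$diskSizeGb"
  )

  // The Values wrapper in the real code receives exactly this joined form.
  val rendered: String = values.mkString(",")
}
```

One caveat of this format: any value containing a comma would need escaping, which is why the real list sticks to simple identifiers, sizes, and URLs.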
@@ -74,6 +74,7 @@ class SamAuthProvider[F[_]: OpenTelemetryMetrics](
.info(Map("traceId" -> traceId.asString), e)(s"$action is not allowed for resource $samResource")
.as(false)
}
_ <- logger.info(s"result of hasPermission($samResource, $action): $res")
} yield res

override def hasPermissionWithProjectFallback[R, A](
@@ -47,6 +47,7 @@ object KubernetesAppConfig {
case (CromwellRunnerApp, CloudProvider.Azure) => Some(ConfigReader.appConfig.azure.cromwellRunnerAppConfig)
case (Wds, CloudProvider.Azure) => Some(ConfigReader.appConfig.azure.wdsAppConfig)
case (HailBatch, CloudProvider.Azure) => Some(ConfigReader.appConfig.azure.hailBatchAppConfig)
case (Jupyter, CloudProvider.Azure) => Some(ConfigReader.appConfig.azure.jupyterAppConfig)
case _ => None
}
}
Expand Down Expand Up @@ -202,6 +203,22 @@ final case class HailBatchAppConfig(chartName: ChartName,
val appType: AppType = AppType.HailBatch
}

final case class JupyterAppConfig(chartName: ChartName,
chartVersion: ChartVersion,
releaseNameSuffix: ReleaseNameSuffix,
namespaceNameSuffix: NamespaceNameSuffix,
ksaName: KsaName,
services: List[ServiceConfig],
enabled: Boolean,
chartVersionsToExcludeFromUpdates: List[ChartVersion]
) extends KubernetesAppConfig {
override val kubernetesServices: List[KubernetesService] = services.map(s => KubernetesService(ServiceId(-1), s))
override val serviceAccountName = ServiceAccountName(ksaName.value)

val cloudProvider: CloudProvider = CloudProvider.Azure
val appType: AppType = AppType.Jupyter
}

final case class ContainerRegistryUsername(asString: String) extends AnyVal
final case class ContainerRegistryPassword(asString: String) extends AnyVal
final case class ContainerRegistryCredentials(username: ContainerRegistryUsername, password: ContainerRegistryPassword)
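Reviewer note: `JupyterAppConfig` above derives two members from its fields: each `ServiceConfig` is wrapped in a `KubernetesService` with a placeholder `ServiceId(-1)`, and the KSA name is reused as the service account name. A simplified, self-contained sketch (plain `String` fields instead of Leonardo's wrapper types), populated with the values from the `reference.conf` block added in this PR:

```scala
// Simplified stand-in for JupyterAppConfig showing its derived members.
object AppConfigSketch {
  final case class ServiceConfig(name: String, kind: String)
  final case class ServiceId(id: Long)
  final case class KubernetesService(id: ServiceId, config: ServiceConfig)

  final case class JupyterAppConfig(chartName: String,
                                    chartVersion: String,
                                    ksaName: String,
                                    services: List[ServiceConfig],
                                    enabled: Boolean) {
    // Placeholder ServiceId(-1), as in the real trait implementation.
    val kubernetesServices: List[KubernetesService] =
      services.map(s => KubernetesService(ServiceId(-1), s))
    // The KSA name doubles as the service account name.
    val serviceAccountName: String = ksaName
  }

  // Values mirroring the reference.conf jupyter-app-config block.
  val config = JupyterAppConfig(
    chartName = "/leonardo/jupyter",
    chartVersion = "0.1.0",
    ksaName = "jupyter-ksa",
    services = List(ServiceConfig("jupyter", "ClusterIP")),
    enabled = true
  )
}
```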
@@ -1,28 +1,50 @@
package org.broadinstitute.dsde.workbench.leonardo.dao

import cats.effect.Async
import cats.mtl.Ask
import cats.syntax.all._
import io.circe.Decoder
import org.broadinstitute.dsde.workbench.leonardo.dao.ExecutionState.{Idle, OtherState}
import org.broadinstitute.dsde.workbench.leonardo.dao.HostStatus.HostReady
import org.broadinstitute.dsde.workbench.leonardo.dao.HttpJupyterDAO._
import org.broadinstitute.dsde.workbench.leonardo.dns.RuntimeDnsCache
-import org.broadinstitute.dsde.workbench.leonardo.{CloudContext, RuntimeName}
+import org.broadinstitute.dsde.workbench.leonardo.{AppContext, CloudContext, RuntimeName}
import org.broadinstitute.dsde.workbench.model.google.GoogleProject
import org.broadinstitute.dsde.workbench.openTelemetry.OpenTelemetryMetrics
import org.http4s.circe.CirceEntityDecoder._
import org.http4s.client.Client
import org.http4s.{Header, Headers, Method, Request}
import org.http4s.client.dsl.Http4sClientDsl
import org.http4s.headers.Authorization
import org.http4s.{Header, Headers, Method, Request, Uri}
import org.typelevel.ci.CIString
import org.typelevel.log4cats.Logger

//Jupyter server API doc https://github.com/jupyter/jupyter/wiki/Jupyter-Notebook-Server-API
class HttpJupyterDAO[F[_]](val runtimeDnsCache: RuntimeDnsCache[F], client: Client[F], samDAO: SamDAO[F])(implicit
F: Async[F],
-logger: Logger[F]
-) extends JupyterDAO[F] {
+logger: Logger[F],
+metrics: OpenTelemetryMetrics[F]
+) extends JupyterDAO[F]
+with Http4sClientDsl[F] {
private val SETDATEACCESSEDINSPECTOR_HEADER_IGNORE: Header.Raw =
Header.Raw(CIString("X-SetDateAccessedInspector-Action"), "ignore")

def getStatus(baseUri: Uri, authHeader: Authorization)(implicit
ev: Ask[F, AppContext]
): F[Boolean] = for {
_ <- metrics.incrementCounter("jupyter/status")
res <- client.status(
Request[F](
method = Method.GET,
uri = baseUri / "api" / "status", // TODO (LM) this may need to change
headers = Headers(authHeader)
)
)
_ <- logger.info(s"(LM) Jupyter endpoint: ${baseUri / "api" / "status"}")
_ <- logger.info(s"(LM) Jupyter status result: $res")

} yield res.isSuccess

def isProxyAvailable(cloudContext: CloudContext, runtimeName: RuntimeName): F[Boolean] =
for {
hostStatus <- Proxy.getRuntimeTargetHost[F](runtimeDnsCache, cloudContext, runtimeName)
@@ -37,7 +59,9 @@ class HttpJupyterDAO[F[_]](val runtimeDnsCache: RuntimeDnsCache[F], client: Clie
client
.successful(
Request[F](
-method = Method.GET,
+method =
+Method.GET, // private def azureUri: Uri = Uri.unsafeFromString(s"https://${hostname.address()}/${path}")
+// https://hostIp/runtimeName/api/status
uri = x.toNotebooksUri / "api" / "status",
headers = headers
)
@@ -110,13 +134,6 @@ object HttpJupyterDAO {
implicit val sessionDecoder: Decoder[Session] = Decoder.forProduct1("kernel")(Session)
}

-trait JupyterDAO[F[_]] {
-def isAllKernelsIdle(cloudContext: CloudContext, runtimeName: RuntimeName): F[Boolean]
-def isProxyAvailable(cloudContext: CloudContext, runtimeName: RuntimeName): F[Boolean]
-def createTerminal(googleProject: GoogleProject, runtimeName: RuntimeName): F[Unit]
-def terminalExists(googleProject: GoogleProject, runtimeName: RuntimeName, terminalName: TerminalName): F[Boolean]
-}

sealed abstract class ExecutionState
object ExecutionState {
case object Idle extends ExecutionState {
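Reviewer note: the new `getStatus` in this file probes `baseUri / "api" / "status"` and treats any 2xx response as healthy (`res.isSuccess`), with `handleError(_ => false)` at the call site in `JupyterAppInstall`. A dependency-free sketch of those two pieces of behavior, mimicking http4s's segment-appending `/` operator with plain strings (the base URL below is hypothetical):

```scala
// Sketch of the status probe: URI building via segment appends, plus the
// "any 2xx counts as up" success predicate used by checkStatus.
object StatusUriSketch {
  // Minimal stand-in for http4s Uri: each `/` appends one path segment.
  final case class Uri(value: String) {
    def /(segment: String): Uri = Uri(s"$value/$segment")
  }

  // Hypothetical app base URL (the real one comes through the Azure relay).
  val base: Uri = Uri("https://relay.example.org/jupyter-app")
  val statusEndpoint: Uri = base / "api" / "status"

  // Mirrors Status#isSuccess: true exactly for the 2xx range.
  def healthy(statusCode: Int): Boolean = statusCode >= 200 && statusCode < 300
}
```

Note the sketch skips percent-encoding, which the real http4s `/` operator performs on each segment; for fixed segments like "api" and "status" the results coincide.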