
Integration with Graphene #525

Open
AI-Memory opened this issue Jul 16, 2021 · 17 comments

@AI-Memory
Contributor

Motivation & problem statement

As #502 raised, integration with a LibOS would be an interesting feature for a FaaS platform: allowing a regular application to run directly in an enclave, with the LibOS handling most syscalls, may further improve usability and simplify the usage model. Theoretically, performance might improve slightly as well. On the other hand, the application developer is agnostic to the LibOS/kernel, so the application might contain I/O- or memory-intensive operations that negatively impact performance and even stability.

Proposed solution

Regarding Graphene, a LibOS that can run unmodified applications inside an enclave, its multi-process feature may not be very useful for FaaS. At this stage, I think we first need to define an interface for LibOS-style integration to manage the lifecycle and I/O of a trusted LibOS enclave instance.

@AI-Memory
Contributor Author

Graphene

@mssun
Member

mssun commented Jul 19, 2021

Hi @bigdata-memory, thanks for the proposal.

Teaclave is a FaaS-like platform consisting of multiple services. One can register and invoke functions on the platform through the frontend service. Teaclave then handles authentication, authorization, task preparation, etc., and finally dispatches the task to the execution service. Currently, we have three executors for different languages and scenarios. For example, the builtin executor only executes functions written natively in Rust, while the mesapy executor can run functions written in Python dynamically.

Therefore, to integrate Graphene, or any other LibOS-style system, into Teaclave, we need to introduce another kind of executor. Luckily, the execution service in Teaclave is designed as a stateless, isolated service that takes a task, executes it, and writes back the result. A function's inputs and outputs are defined as function arguments plus input/output files stored in the protected file system. In our design, we prohibit any untrusted I/O interfaces (e.g., writing data to the untrusted file system, network I/O) in the executor in order to protect the sensitive data operated on by the function.

The above is some background on our design rationale. Here, I list some concepts involved in integrating with Graphene.

  • RPC: All Teaclave services communicate with each other via a trusted channel (i.e., attested TLS). You can find all protocol definitions (in Protobuf) in this directory: https://github.com/apache/incubator-teaclave/tree/master/services/proto/src/proto. An execution service instance periodically fetches tasks from the scheduler service and executes them.
  • Attestation: We do mutual attestation when establishing a trusted RPC channel through attested TLS.
  • Executor Runtime: Provides interfaces for an executor to read/write input/output files stored in the protected file system.
  • TeaclaveExecutor Trait: The TeaclaveExecutor trait defines the interfaces of a task (a rough sketch follows below).
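
For orientation, here is a minimal sketch of what a LibOS-backed executor might look like behind a trait in the spirit of TeaclaveExecutor. The trait shape, type aliases, and the GrapheneExecutor skeleton are assumptions for illustration only, not the actual Teaclave definitions.

```rust
// Hypothetical sketch only: the trait shape and types below are assumed for
// illustration and are not the actual Teaclave definitions.
use std::collections::HashMap;

type FunctionArguments = HashMap<String, String>; // simplified stand-in
type ExecResult<T> = std::result::Result<T, Box<dyn std::error::Error>>;

/// Assumed shape of an executor trait: take a payload plus arguments,
/// return the function output as a string.
trait LibosExecutor {
    fn execute(&self, payload: Vec<u8>, arguments: FunctionArguments) -> ExecResult<String>;
}

struct GrapheneExecutor;

impl LibosExecutor for GrapheneExecutor {
    fn execute(&self, payload: Vec<u8>, arguments: FunctionArguments) -> ExecResult<String> {
        // 1. Stage the payload where the LibOS instance expects it
        //    (in a real design this would go through the protected-FS runtime).
        // 2. Launch or signal the LibOS instance and pass the arguments.
        // 3. Collect and return the output.
        let _ = (payload, arguments); // placeholder: no real LibOS is driven here
        Ok(String::from("TODO: launch Graphene and collect output"))
    }
}
```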

Recently, we added a WebAssembly executor. You can look at the recent changes to see how to add an executor to Teaclave. Here is a simple document: https://teaclave.apache.org/docs/adding-executors/.

The WebAssembly executor is just a reference. For integrating with Graphene (or other LibOS executors), the design may be different. I'm not sure whether to link the LibOS within the execution service or run it as a separate enclave alongside the execution service. The latter may need an additional attestation mechanism.

Finally, could you propose a simple design so that we can work on it together? Please let me know if you have any questions regarding the current Teaclave design. Thanks!

@AI-Memory
Contributor Author

Hi @mssun, thank you very much for the detailed explanation of leveraging a LibOS-style system in Teaclave.

It is very helpful for us to work on this effectively. So far, GSGX only has a relatively stable CLI interface, and its internal APIs are not designed for third-party integration since they are still evolving. Roughly, we may consider running a GSGX instance as a sidecar alongside its execution driver service and performing mutual attestation with the driver beforehand. This is just my preliminary idea for now; we will define the assumptions, constraints, and requirements for it soon. Thanks!

@AI-Memory
Contributor Author

AI-Memory commented Jul 21, 2021

On the attestation page, the binding is described as: "We make the certificate cryptographically bound to a specific enclave instance by adding the public key of the certificate in the attestation report." How can the service certificate be revoked and re-issued at runtime? Thanks.

@mssun
Member

mssun commented Jul 21, 2021

How can the service certificate be revoked and re-issued at runtime?

We don't have a revocation mechanism currently.

Right now, there is a freshness timer for the attestation/certificate. Once it times out, the service performs another attestation, updates the report, and re-issues the certificate.
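
As a rough illustration of that freshness logic (the duration, struct, and field names below are made up for this sketch, not Teaclave's actual implementation):

```rust
use std::time::{Duration, Instant};

// Assumed freshness window for illustration; Teaclave's real value may differ.
const CERT_FRESHNESS: Duration = Duration::from_secs(3600);

struct AttestedIdentity {
    issued_at: Instant,
    // the certificate and attestation report would live here in a real service
}

impl AttestedIdentity {
    fn is_fresh(&self) -> bool {
        self.issued_at.elapsed() < CERT_FRESHNESS
    }

    /// Re-attest and re-issue the certificate once the freshness timer expires.
    fn refresh_if_stale(&mut self) {
        if !self.is_fresh() {
            // Placeholder: re-run remote attestation here and bind the new
            // public key into the refreshed report and certificate.
            self.issued_at = Instant::now();
        }
    }
}
```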

@ya0guang
Member

I'm also interested in incorporating Graphene as an executor for Teaclave. However, I think we must seriously consider Graphene's interfaces (mainly system calls).

As @bigdata-memory mentioned, many syscalls are handled by the LibOS, but there are also many syscalls that are forwarded to the OS kernel, either unaltered or with some changes. These forwarded syscalls include I/O operations on the file system and network, as well as timing (e.g., nanosleep) and synchronization (e.g., futex). Besides, Graphene can also handle interrupts/exceptions.

To avoid using such untrusted interfaces and leaking information through them, Teaclave implements no untrusted interfaces. Moreover, Graphene aims to support trusted binaries, whereas Teaclave assumes potentially malicious functions. This difference in threat models results in differences in the interfaces supported and exposed to the binaries/functions running inside Graphene/Teaclave. If we want to leverage Graphene's power, we may need to think about several questions:

  • Tradeoff between binary compatibility and security: how do we deal with untrusted interfaces?
  • How do we run an untrusted binary in Graphene securely? Personally, I believe auditing could be a solution.
  • How can Teaclave and Graphene communicate securely? AFAIK, Graphene currently lacks a secure channel implementation.

@AI-Memory
Contributor Author

Hi @ya0guang, glad to know you are also interested in this. Yes, as you mentioned, Graphene forwards many syscalls to the untrusted host; more specifically, there are 41 SGX OCALLs to the untrusted PAL layer of Graphene, with some security checks applied. It is similar to making unsafe calls to external libraries in Rust. We need to be very careful in designing a mechanism to audit them and filter out unexpected behaviors, according to terms that can be defined for the untrusted PAL as a new party.
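
One hypothetical shape for such an audit layer is sketched below; the OCALL names and the allowlist policy are invented for illustration and are not Graphene's actual PAL interface.

```rust
use std::collections::HashSet;

/// Hypothetical audit policy for OCALLs forwarded to the untrusted PAL.
/// The names below are illustrative, not Graphene's real OCALL table.
struct OcallPolicy {
    allowed: HashSet<&'static str>,
}

impl OcallPolicy {
    fn new() -> Self {
        let allowed = ["ocall_clock_gettime", "ocall_futex", "ocall_read_protected_file"]
            .into_iter()
            .collect();
        OcallPolicy { allowed }
    }

    /// Reject anything outside the agreed terms for the untrusted PAL.
    fn check(&self, ocall_name: &str) -> Result<(), String> {
        if self.allowed.contains(ocall_name) {
            Ok(())
        } else {
            Err(format!("unexpected OCALL rejected by audit policy: {}", ocall_name))
        }
    }
}
```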

@AI-Memory
Contributor Author

Hi @mssun, as we discussed, Graphene downloads several third-party codebases at build time, e.g., glibc, mbedtls, uthash, toml, gcc. All downloads are verified by hashes; however, the THIRD-PARTY DEPENDENCY VENDORING page states that it "will only consume packages from this designated repository and will not download any code from external package registry". Would linking Graphene as a submodule of the Teaclave project comply with what that document says? Thanks.

@AI-Memory
Contributor Author

In addition, the Graphene project is licensed under GPL v3.0 along with an addendum, which is not permissively compatible with Apache 2.0. The GPL-compatibility page says: "However, GPLv3 software cannot be included in Apache projects." As discussed, it looks like Graphene cannot be fully integrated with Teaclave at build time, so before a Graphene executor is available to use, it would need to be deployed on the platform separately. Please advise, thanks.

@AI-Memory
Contributor Author

Hi @ya0guang, I'm studying PR #504 as a guide to adding Graphene as an executor. In the example script wasm_simple_add.py, the payload_file wasm_simple_add_payload/simple_add.wasm is loaded and registered in the frontend service. It looks like the payload is not part of the enclave measurement; do I understand that correctly? Thanks.

@ya0guang
Member

ya0guang commented Jul 27, 2021

Hi @bigdata-memory, I think you're right about this example! Since the enclave measurement is determined by its initial code and data before instantiation, a payload loaded into Teaclave after instantiation cannot affect the measurement. At least for the MesaPy and WASM executors, the payload is uploaded by the user and then loaded after enclave instantiation. However, I guess the story may be different for the Builtin executor, because if you want to add a function, you may need to change the executor itself before the enclave is built.
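
Purely as a hypothetical sketch (not something Teaclave does today, and assuming the external sha2 crate), one way to compensate for the payload not being covered by the measurement is to verify it at runtime against a digest the user supplies over the attested channel:

```rust
// Hypothetical sketch (assumes the external sha2 crate): verify a user-supplied
// digest of the payload at runtime, since the payload itself is not measured.
use sha2::{Digest, Sha256};

fn verify_payload(payload: &[u8], expected_sha256_hex: &str) -> bool {
    let digest = Sha256::digest(payload);
    // hex-encode the computed digest and compare with the registered value
    let computed: String = digest.iter().map(|b| format!("{:02x}", b)).collect();
    computed == expected_sha256_hex.to_lowercase()
}
```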

@mssun
Member

mssun commented Jul 27, 2021

In addition, the Graphene project is licensed under GPL v3.0 along with an addendum, which is not permissively compatible with Apache 2.0. The GPL-compatibility page says: "However, GPLv3 software cannot be included in Apache projects." As discussed, it looks like Graphene cannot be fully integrated with Teaclave at build time, so before a Graphene executor is available to use, it would need to be deployed on the platform separately. Please advise, thanks.

@bigdata-memory, thanks for pointing out this issue. I just checked that Graphene is under the LGPL v3.0 license.

As stated in ASF 3rd Party License Policy:

You may NOT include the following licenses within Apache products:

  • Places restrictions on larger works: GNU LGPL 2, 2.1, 3

Therefore, I suggest NOT including any Graphene code in Teaclave, whether directly or as a third-party dependency.

Given this situation, if we plan to integrate with Graphene, I suggest doing the following in Teaclave:

  • Provide interfaces for any LibOS executor
  • Write documentation on integrating with Graphene
  • Implement a mock LibOS implementation for testing only (a rough sketch follows below)
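
To make the last point concrete, here is a rough sketch of what a LibOS-facing interface plus a mock implementation for testing might look like; all names are placeholders, not an agreed Teaclave API.

```rust
// Placeholder names only; not an agreed Teaclave API.
type LibosResult<T> = std::result::Result<T, Box<dyn std::error::Error>>;

/// Minimal lifecycle interface a LibOS backend (e.g., Graphene) would implement.
trait LibosBackend {
    /// Mutual attestation with the executor.
    fn attest(&mut self) -> LibosResult<()>;
    /// Execute the workload and return its output.
    fn run(&mut self, workload: &[u8]) -> LibosResult<Vec<u8>>;
    /// Tear down the LibOS instance.
    fn shutdown(&mut self) -> LibosResult<()>;
}

/// Mock backend for testing only: echoes the workload back as the "output".
struct MockLibos;

impl LibosBackend for MockLibos {
    fn attest(&mut self) -> LibosResult<()> {
        Ok(())
    }
    fn run(&mut self, workload: &[u8]) -> LibosResult<Vec<u8>> {
        Ok(workload.to_vec())
    }
    fn shutdown(&mut self) -> LibosResult<()> {
        Ok(())
    }
}
```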

@AI-Memory
Contributor Author

@mssun, thank you for the suggestion. Regarding "Provide interfaces for any LibOS executor": the third-party LibOS would run in a separate process/enclave space. Should we keep it running, or launch it once per request? Thanks.

@AI-Memory
Contributor Author

We drafted a design for enabling GSGX on Teaclave. Please review, thanks.

teaclave_libos_gsgx.pdf

@hanboa

hanboa commented Aug 13, 2021

@bigdata-memory Regarding the design, how is it going to facilitate the attestation flow in Teaclave? Will it provide additional features or benefits on top of it?

@AI-Memory
Contributor Author

@hanboa Thank you for this question. Yes, you are right, the attestation flow is not explicitly shown here. In this design draft, we consider retrieving the quote from the Teaclave premain component. Honestly, fetching it through this customized component is not a graceful approach. I proposed a solution to the GSGX community, but their maintainer disagreed that writing the quote out to the /var/run/ directory is a legitimate use of the feature. However, the GrapheneDriver component is designed to handle it anyway. In the GrapheneController, the retrieved quote should be verified right before issuing the start(...).
Regarding the additional features or benefits you mentioned, I don't see any in this preliminary design. Please advise, thanks.
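
The verify-before-start step in the GrapheneController could look roughly like the sketch below; the quote path and verification helpers are hypothetical placeholders, not the actual design.

```rust
// Hypothetical control flow only; quote parsing and verification are abstracted away.
type CtrlResult<T> = std::result::Result<T, Box<dyn std::error::Error>>;

struct Quote(Vec<u8>);

/// Placeholder for reading the quote written out by Teaclave premain
/// (the path below is illustrative, not a fixed location).
fn fetch_quote_from_premain() -> CtrlResult<Quote> {
    Ok(Quote(std::fs::read("/var/run/graphene_quote")?))
}

/// Placeholder: real verification would check the quote's signature,
/// measurement, and report data against expected values.
fn verify_quote(_quote: &Quote) -> CtrlResult<()> {
    Ok(())
}

/// GrapheneController-side flow: only issue start(...) after the quote checks out.
fn start_gsgx_workload() -> CtrlResult<()> {
    let quote = fetch_quote_from_premain()?;
    verify_quote(&quote)?;
    // start(...) would be issued to the GrapheneDriver here.
    Ok(())
}
```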

@AI-Memory
Contributor Author

AI-Memory commented Sep 15, 2021

@mssun As we discussed, it is quite challenging to maintain the security model for this integration because GSGX is not designed to be integrated at the API level; it has to run in external processes. The secrets and the encrypted workload/inputs/arguments have to be exchanged with a new pre-load module, which in turn needs to do the following tasks:

  1. Mutually attest with the Teaclave executor.

  2. Establish a trusted channel with the Teaclave executor.

  3. Receive and parse the workload binaries and dependencies, input files, and arguments, all of which are complex objects (a sketch of such an exchange message appears at the end of this comment).

  4. Deploy the received workload binaries and dependencies, input files, and arguments into the GSGX enclave at runtime, in the correct layout and aligned with the pre-defined GSGX manifest.

  5. Serialize the output of GSGX and send it back to the Teaclave executor.

  6. The pre-load module cannot manage the lifecycle of the workload running in GSGX due to GSGX limitations; lifecycle management has to be handled by the Teaclave executor via the external Linux signal mechanism, which is not reliable: GSGX does not honor some signal types and is still unstable when handling basic signals.

In addition, GSGX is limited to running a unique workload, which means it does not natively support a measurement-decoupling feature (refused by the GSGX maintainer, ITL). This implies a huge performance penalty and significant SGX resource consumption for the FaaS usage scenario.
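
For item 3 above, the exchange between the Teaclave executor and the pre-load module could be modeled roughly as below; the field names and the serde-based serialization are assumptions, not a settled protocol.

```rust
// Assumed message shapes for the executor <-> pre-load module exchange;
// not a settled protocol. Assumes the external serde crate with derive.
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Serialize, Deserialize)]
struct WorkloadRequest {
    /// Main workload binary, to be laid out according to the GSGX manifest.
    binary: Vec<u8>,
    /// Shared libraries and other dependencies, keyed by target path.
    dependencies: HashMap<String, Vec<u8>>,
    /// Encrypted input files, keyed by logical name.
    input_files: HashMap<String, Vec<u8>>,
    /// Function arguments.
    arguments: HashMap<String, String>,
}

#[derive(Serialize, Deserialize)]
struct WorkloadResponse {
    exit_code: i32,
    /// Serialized output to be sent back to the Teaclave executor.
    output: Vec<u8>,
}
```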
