Switch to Restricted Mode

Introduction

Morello provides two banks for certain registers, such as the Default Data Capability (DDC), the Capability Stack Pointer (CSP), and the capability Thread ID Register (CTPIDR_EL0). These banks are referred to as Restricted and Executive. Access to them is controlled by the EXECUTIVE permission bit in PCC: when this bit is set, we run in Executive mode; otherwise, we run in Restricted mode.

Name	Executive bank	Restricted bank
Default Data Capability	DDC_EL0	RDDC_EL0
Capability Stack Pointer	CSP_EL0	RCSP_EL0
Thread ID Register	CTPIDR_EL0	RCTPIDR_EL0

In Restricted mode only the Restricted bank of these registers can be accessed, while in Executive mode both banks can be accessed:

Accessed as	In Executive mode	In Restricted mode
DDC	DDC_EL0	RDDC_EL0
RDDC_EL0	RDDC_EL0	(fault)
CSP	CSP_EL0	RCSP_EL0
RCSP_EL0	RCSP_EL0	(fault)
CTPIDR_EL0	CTPIDR_EL0	RCTPIDR_EL0
RCTPIDR_EL0	RCTPIDR_EL0	(fault)
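The access rules in the table above can be captured as a small lookup. The following is a portable software model only, written under the assumption that it is enough to map "mode + register name" to the physical bank; it does not touch real system registers, and all names in it (cpu_mode, resolve_bank, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical software model of the banked-register access table;
   it does not access real Morello system registers. */
typedef enum { CPU_EXECUTIVE, CPU_RESTRICTED } cpu_mode;

typedef enum {
    ACC_DDC, ACC_RDDC_EL0,
    ACC_CSP, ACC_RCSP_EL0,
    ACC_CTPIDR_EL0, ACC_RCTPIDR_EL0
} access_name;

/* Returns the physical register an access reaches, or NULL for a fault. */
const char *resolve_bank(cpu_mode mode, access_name acc)
{
    int exec = (mode == CPU_EXECUTIVE);

    switch (acc) {
    /* Unprefixed names select the bank of the current mode. */
    case ACC_DDC:         return exec ? "DDC_EL0"     : "RDDC_EL0";
    case ACC_CSP:         return exec ? "CSP_EL0"     : "RCSP_EL0";
    case ACC_CTPIDR_EL0:  return exec ? "CTPIDR_EL0"  : "RCTPIDR_EL0";
    /* Explicit R* names are usable from Executive mode only. */
    case ACC_RDDC_EL0:    return exec ? "RDDC_EL0"    : NULL;
    case ACC_RCSP_EL0:    return exec ? "RCSP_EL0"    : NULL;
    case ACC_RCTPIDR_EL0: return exec ? "RCTPIDR_EL0" : NULL;
    }
    return NULL;
}

/* Restricted code never reaches a register of the Executive bank. */
void check_restricted_isolation(void)
{
    for (int a = ACC_DDC; a <= ACC_RCTPIDR_EL0; a++) {
        const char *r = resolve_bank(CPU_RESTRICTED, (access_name)a);
        assert(r == NULL || r[0] == 'R');
    }
}
```

The point the model makes explicit is that Restricted code using the ordinary names (DDC, CSP, CTPIDR_EL0) transparently lands in the Restricted bank, which is what lets executive management code prepare that bank in advance.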

This can be used to implement compartmentalisation. The management code runs in Executive mode and can set up the stack pointer and thread ID register for each compartment, while the isolated code running in such a compartment only "perceives" the environment set up for it.

There are also restrictions imposed by the architecture on interworking between these two modes. To switch to the Restricted mode we need two things:

A restricted function pointer (that is, a sentry without the EXECUTIVE permission).
A branch via the special instruction B(L)RR (Branch (with Link) to capability Register with possible switch to Restricted).

Branching from Executive mode to a restricted sentry via an ordinary BLR instruction would clear the tag of the target PCC, which results in a capability fault on instruction fetch right after the branch. The same rule applies to returning from a function running in Executive mode via a restricted CLR; the special instruction RETR (Return with possible switch to Restricted) should be used instead. Both BLRR and RETR fault if executed in Restricted mode.

Returning from the Restricted mode (or calling an executive function) only requires a sentry with the EXECUTIVE permission. Ordinary BLR and RET instructions can be used.
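The interworking rules above can be summarised as a decision function. This is a hypothetical model, not an architectural definition: it only tracks the current mode, the kind of branch, and whether the target sentry has the EXECUTIVE permission, and it assumes the target is already a valid sentry (sealing itself is not modelled).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of Executive/Restricted interworking rules. */
typedef enum { CPU_EXECUTIVE, CPU_RESTRICTED } cpu_mode;
typedef enum { BR_BLR, BR_RET, BR_BLRR, BR_RETR } branch_kind;

typedef enum {
    RUN_EXECUTIVE,    /* target runs in Executive mode */
    RUN_RESTRICTED,   /* target runs in Restricted mode */
    FAULT_FETCH,      /* PCC tag cleared: fault on the next fetch */
    FAULT_INSN        /* BLRR/RETR executed in Restricted mode */
} branch_outcome;

/* target_executive: does the target sentry have the EXECUTIVE permission? */
branch_outcome model_branch(cpu_mode mode, branch_kind kind,
                            bool target_executive)
{
    if (kind == BR_BLRR || kind == BR_RETR) {
        if (mode == CPU_RESTRICTED)
            return FAULT_INSN;
        /* "Possible" switch: only when the target is restricted. */
        return target_executive ? RUN_EXECUTIVE : RUN_RESTRICTED;
    }
    /* Ordinary BLR/RET to an executive target always works. */
    if (target_executive)
        return RUN_EXECUTIVE;
    /* Restricted target: fine from Restricted mode, but from Executive
       mode the tag is cleared and the first fetch faults. */
    return mode == CPU_RESTRICTED ? RUN_RESTRICTED : FAULT_FETCH;
}

/* Executable summary of the rules stated in the text. */
void check_interworking_rules(void)
{
    assert(model_branch(CPU_EXECUTIVE, BR_BLR, false) == FAULT_FETCH);
    assert(model_branch(CPU_EXECUTIVE, BR_RET, false) == FAULT_FETCH);
    assert(model_branch(CPU_EXECUTIVE, BR_BLRR, false) == RUN_RESTRICTED);
    assert(model_branch(CPU_EXECUTIVE, BR_RETR, false) == RUN_RESTRICTED);
    assert(model_branch(CPU_RESTRICTED, BR_BLRR, false) == FAULT_INSN);
    assert(model_branch(CPU_RESTRICTED, BR_RET, true) == RUN_EXECUTIVE);
}
```

The asymmetry is visible in the last two checks: leaving Restricted mode needs nothing special beyond an executive sentry, while entering it is only possible from Executive mode through the dedicated instructions.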

Switching to Restricted mode without proper setup may not work or may cause a security problem, which is why Morello requires the new instructions for this. On the other hand, being able to request that an operation run in Executive mode from code running in Restricted mode is useful, because this is how we can ask our runtime to switch to another compartment.

Design Overview

In this example we use our own tiny runtime library to keep things very simple. Execution starts in Executive mode when the _start function is called (see src/start.S). We must run the usual initialisation to set up all capabilities before we can proceed. Then we prepare the switch to Restricted mode and call the main function, which is where we enter the application code:

<---------------- E ----------------> <------ R ------> <---- E ---->
_start --> _init_compartments --> _start --> main --> _start --> exit

We execute main in something called a "root compartment". It is like any other compartment, but it is set up automatically when the app starts.

The application can remain in the root compartment for the duration of the process, or it can instantiate more compartments and execute some code in them.

When switching to another compartment the following things happen:

Callee-saved registers are saved to the caller's stack.
A compartment descriptor is loaded from the compartment's private data.
The executive switch function is called (switching to Executive mode).
The caller's TPIDR_EL0 and CSP are stored on the executive stack (not accessible from other compartments).
The callee's TPIDR_EL0 and CSP capabilities are loaded from the compartment descriptor and set up.
The target function's arguments are loaded into registers.
All unused registers are sanitised.
The target function is called in Restricted mode (unless the sentry has the EXECUTIVE permission, see below).
The executive link capability is RB-sealed and saved to CLR.

After returning from the target function, these operations are performed in reverse. All stack allocations are placed in private mappings, and only capabilities that the caller compartment explicitly provides to the callee compartment can be used to exchange data between compartments.
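The save/install/restore part of the sequence above can be sketched in portable C. This is a hypothetical model rather than the runtime's code (which is assembly in src/start.S): it tracks only the mode and the tp/sp values the running code sees, and it does not model callee-saved registers, sealing, or register sanitisation. All type and function names here are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the compartment switch; the real one is assembly. */
typedef enum { CPU_EXECUTIVE, CPU_RESTRICTED } cpu_mode;

typedef struct {
    cpu_mode mode;
    void *tp;   /* models the thread pointer the running code sees */
    void *sp;   /* models the capability stack pointer it sees */
} cpu_state;

typedef long (*target_fn)(cpu_state *cpu, long a0, long a1);

/* Subset of the compartment descriptor used by this model. */
typedef struct {
    void *tp, *sp;       /* callee's environment */
    target_fn target;    /* restricted target function */
} cmpt_desc;

/* Caller state kept on the executive stack. */
typedef struct { cpu_mode mode; void *tp, *sp; } exec_frame;

#define EXEC_STACK_DEPTH 16
static exec_frame exec_stack[EXEC_STACK_DEPTH]; /* executive-only memory */
static size_t exec_top;

long switch_call(cpu_state *cpu, const cmpt_desc *d, long a0, long a1)
{
    exec_frame saved = { cpu->mode, cpu->tp, cpu->sp };

    cpu->mode = CPU_EXECUTIVE;            /* entered via executive sentry */
    assert(exec_top < EXEC_STACK_DEPTH);
    exec_stack[exec_top++] = saved;       /* save caller's tp and sp */
    cpu->tp = d->tp;                      /* install callee's environment */
    cpu->sp = d->sp;
    /* (register sanitisation would happen here) */
    cpu->mode = CPU_RESTRICTED;           /* BLRR to the restricted target */
    long res = d->target(cpu, a0, a1);

    cpu->mode = CPU_EXECUTIVE;            /* back via the link capability */
    saved = exec_stack[--exec_top];       /* undo the steps in reverse */
    cpu->tp = saved.tp;
    cpu->sp = saved.sp;
    cpu->mode = saved.mode;               /* return to the caller's mode */
    return res;
}

/* Example restricted target that records the stack it was given. */
void *last_seen_sp;
long add_and_record(cpu_state *cpu, long a0, long a1)
{
    assert(cpu->mode == CPU_RESTRICTED);
    last_seen_sp = cpu->sp;
    return a0 + a1;
}
```

Because the caller's state lives in a stack of frames rather than a single slot, the same routine also works when a target itself calls into another compartment.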

Code Examples

Restricting Global Functions

Let's consider the following example in the restricted.c file. The first part of it shows one of the consequences of the rules described above: when using an indirect call to a global function, we may accidentally switch to Executive mode. This depends on how the capability for this global function was set up by the runtime initialisation (see the init function). Such a function will usually inherit executable capabilities from the AT_CHERI_EXEC_RX_CAP root capability provided by the kernel, and this capability has both the EXECUTIVE and SYSTEM permissions by default.

This is why we will need to change the initialisation procedure and remove the EXECUTIVE and SYSTEM permissions from all function pointers that are supposed to be used by the restricted code. However, we may also want to keep executive copies of some of the global functions to use them from executive code (e.g. for setting things up or managing compartments).

This example is rather simple, and executive copies of the standard library functions are not needed as we can do everything via direct calls.
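The idea of deriving restricted function pointers, while optionally keeping executive copies, can be illustrated with a simplified capability model. The permission bit values and the model_cap type below are hypothetical and do not match Morello's encoding; bounds and sealing are ignored. On real hardware this step would use capability intrinsics (for example cheri_perms_and from cheriintrin.h) on capabilities derived from AT_CHERI_EXEC_RX_CAP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical permission bits for illustration; NOT Morello's encoding. */
#define PERM_LOAD      (1u << 0)
#define PERM_EXECUTE   (1u << 1)
#define PERM_SYSTEM    (1u << 2)
#define PERM_EXECUTIVE (1u << 3)

/* Simplified stand-in for a capability: an address plus permissions. */
typedef struct {
    uintptr_t addr;
    uint32_t perms;
} model_cap;

/* Deriving a capability can only remove permissions, never add them. */
model_cap derive(model_cap parent, uintptr_t addr, uint32_t keep)
{
    model_cap c = { addr, parent.perms & keep };
    return c;
}

/* Function pointer for restricted code: drop EXECUTIVE and SYSTEM. */
model_cap restricted_fn(model_cap exec_root, uintptr_t addr)
{
    return derive(exec_root, addr, ~(PERM_EXECUTIVE | PERM_SYSTEM));
}

/* Optional executive copy, e.g. for compartment management code. */
model_cap executive_fn(model_cap exec_root, uintptr_t addr)
{
    model_cap c = derive(exec_root, addr, ~0u);
    assert(c.perms & PERM_EXECUTIVE);  /* the root must grant EXECUTIVE */
    return c;
}
```

Since permissions only shrink on derivation, both copies must be derived from the executive root during initialisation; a restricted copy can never be turned back into an executive one.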

Creating a Compartment

We create a new instance of a compartment by calling (see include/rcmpt.h):

switch_t *cmpt_fun = create_compartment(target_fun, 2 /* pages */);

The returned capability is actually a sentry, and it can be used in the same way as the wrapped target function target_fun. This sentry points to code generated on the fly from the _thunk function (see src/start.S). This code uses a near-relative load to access the compartment descriptor, which holds the following information:

typedef struct {
    void *exec;     // executive pointer to switch trampoline (sentry)
    void *target;   // target function (sentry)
    void *tp;       // thread pointer (not used currently)
    void *sp;       // stack pointer
    void *cid;      // CID capability for the compartment
} thunk_data_t;

The thunk code branches to the capability in the exec field (it points to the _switch function and has the EXECUTIVE permission). The switch code runs in Executive mode and can therefore do the necessary setup before branching to the restricted target function.
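The chain "thunk, then exec, then target" can be mirrored in plain C to show why a compartment sentry is a drop-in replacement for the target function. This is a hypothetical model: the struct is a typed version of thunk_data_t (the real fields are void *), the descriptor is passed explicitly instead of being found by a near-relative load, and model_switch stands in for _switch with all environment setup elided.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical typed version of thunk_data_t (real fields are void *). */
typedef long (*target_fn)(long, long, long);

typedef struct thunk_data thunk_data;
typedef long (*switch_fn)(const thunk_data *d, long, long, long);

struct thunk_data {
    switch_fn exec;    /* executive entry, models _switch */
    target_fn target;  /* restricted target function */
    void *tp;          /* thread pointer (not used) */
    void *sp;          /* compartment stack pointer */
    void *cid;         /* compartment ID capability (opaque here) */
};

/* Models the generated thunk: forward the arguments unchanged, together
   with the descriptor, to the executive switch entry. */
long thunk(const thunk_data *self, long a0, long a1, long a2)
{
    return self->exec(self, a0, a1, a2);
}

/* Models _switch with the environment setup elided: count entries and
   forward the call to the target. */
int switch_entries;
long model_switch(const thunk_data *d, long a0, long a1, long a2)
{
    switch_entries++;
    /* real code: save caller state, install d->sp / d->tp, sanitise */
    return d->target(a0, a1, a2);
}

/* Example target function. */
long weighted_sum(long a, long b, long c) { return a + 2 * b + 3 * c; }
```

Since the thunk only forwards its fixed argument registers, the same model also hints at why variadic targets are awkward for this scheme.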

Note that if the target function pointer is executive, the switch to the compartment will not happen and the target function will keep running in Executive mode. In principle, the code that creates compartment instances via the create_compartment function should not have access to any executive function pointers. If it does, it can switch to Executive mode simply by making an indirect call via such a pointer, in which case compartment isolation is breached. This is another reason why global function pointers must be initialised with care. A check that the target function is restricted is not currently implemented.
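The missing check could look roughly like the following. This is a hypothetical sketch over a simplified capability model: make_compartment, model_cap, and the PERM_EXECUTIVE bit value are illustrative names and do not reflect the create_compartment API or Morello's permission encoding. It accepts only a target that is a sentry without the EXECUTIVE permission, matching the definition of a restricted function pointer given earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bit; NOT Morello's encoding. */
#define PERM_EXECUTIVE (1u << 3)

/* Simplified capability: address, permissions, sentry flag. */
typedef struct {
    uintptr_t addr;
    uint32_t perms;
    int is_sentry;     /* sealed entry capability? */
} model_cap;

typedef struct { model_cap target; size_t pages; } compartment;

/* A restricted function pointer is a sentry without EXECUTIVE. */
int target_is_restricted(model_cap target)
{
    return target.is_sentry && !(target.perms & PERM_EXECUTIVE);
}

/* Hypothetical validating constructor: returns 0 on success, -1 if the
   target would keep running in Executive mode (or is not a sentry). */
int make_compartment(compartment *out, model_cap target, size_t pages)
{
    assert(out != NULL && pages > 0);
    if (!target_is_restricted(target))
        return -1;
    out->target = target;
    out->pages = pages;
    return 0;
}
```

Rejecting the target up front turns a silent isolation breach into an explicit error at compartment-creation time.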

Invocation of a compartment is simple: instead of

res = target_fun(arg0, arg1, arg2);

just use

res = cmpt_fun(arg0, arg1, arg2);

Note that variadic targets are not supported in the current implementation.

Nested Compartment Calls

In the same way as we call a compartment from the root compartment, we can also make nested calls and invoke other compartments. Every switch to the next compartment goes through Executive mode and uses a separate frame on the executive stack to retain the caller's data while the callee runs.
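Nesting works because the executive stack is used strictly last-in, first-out. The sketch below is a hypothetical model that keeps only the stack pointer in each frame and records the maximum nesting depth; names such as enter_compartment and nested_body are illustrative, not part of the runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each switch pushes one frame onto an executive-only
   stack and pops it on return, so nested calls unwind in order. */
typedef struct { void *sp; } exec_frame;

#define MAX_NESTING 8
static exec_frame exec_stack[MAX_NESTING];
static size_t depth, max_depth;
static void *current_sp;          /* models the active stack pointer */

typedef long (*cmpt_body)(int level);

long enter_compartment(void *callee_sp, cmpt_body body, int level)
{
    assert(depth < MAX_NESTING);
    exec_stack[depth++] = (exec_frame){ current_sp }; /* save caller */
    if (depth > max_depth)
        max_depth = depth;
    current_sp = callee_sp;                           /* switch stacks */
    long r = body(level);
    current_sp = exec_stack[--depth].sp;              /* restore caller */
    return r;
}

/* Example body that calls into another compartment until level 0. */
static char stacks[3][64];
long nested_body(int level)
{
    if (level == 0)
        return 0;
    return 1 + enter_compartment(stacks[level - 1], nested_body, level - 1);
}
```

Starting from the root with enter_compartment(stacks[2], nested_body, 2) goes three compartments deep and, on the way back, restores each caller's stack pointer in turn.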

Private Data of the Compartment Manager

Note that the compartment manager holds a global object containing data that could be used to escape from a compartment. In this implementation, this object remains unprotected. To solve this problem, we can use a BRS-sealed capability pair (see the privdata example in the compartments folder). This suggests that, in practice, switch-to-restricted compartments might co-exist with branch-to-sealed-pair compartments. Alternatively, we could use a private memory mapping to store this global object and keep the capability pointing to it on the executive stack, which is inaccessible from any compartment (including the root compartment).