Extremely Reliable Operating System
Posted by : Unknown
Monday, July 1, 2013
ABSTRACT :-
EROS, the Extremely Reliable Operating System, addresses the
issues of reliability and security by combining two ideas from earlier
systems: capabilities and a persistent single-level store. Capabilities unify
object naming with access control. Persistence extends this naming and access
control uniformly across the memory hierarchy; main memory is viewed simply as
a cache of the single-level store. The combination simplifies application
design, allows programs to observe the "principle of least
privilege," and enables active objects to be constructed securely. EROS is
built on a simple object model that is easy to implement and understand.
INTRODUCTION :-
EROS: The Extremely
Reliable Operating System
EROS is a new
operating system originally implemented at the University of Pennsylvania. The
project has now migrated to Johns Hopkins University. EROS merges some very old
ideas in operating systems with some newer ideas about performance and resource
management. The result is a small, secure, real-time operating system that
provides orthogonal persistence.
HARDWARE REQUIREMENTS :-
What hardware does EROS run on?
EROS runs on x86-class PCs, 486 and higher. It does not support the 386,
because the 386 supervisor mode does not honour the write-protect bit in the page table entries.
Support for networking is limited: the only supported network interface
cards are the 3Com 3C509 and 3Com 3C905. IDE and EIDE drives are fully supported.
SCSI support is planned for the future, and there is no graphics support at present.
FEATURES :-
1. Pure Capability Architecture :
EROS is a pure capability system. A capability uniquely identifies
an object and a set of access rights. Processes holding a capability can
perform the operations permitted by those access rights on the named object.
Holding a capability is a necessary and sufficient condition for accessing the
associated object with the authority granted by that capability.
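As a rough illustration of the idea (not the kernel's actual key format), a capability can be pictured as a pair of an object identifier and a rights mask; all names below are hypothetical:

    /* A minimal sketch of the capability idea, assuming a hypothetical
     * layout; the real EROS key format is defined by the kernel, not here. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t oid_t;            /* names a page, node, or process */

    enum cap_rights {
        CAP_READ   = 1u << 0,
        CAP_WRITE  = 1u << 1,
        CAP_INVOKE = 1u << 2,
    };

    typedef struct capability {
        oid_t    object;               /* which object the key names     */
        uint32_t rights;               /* what the holder may do with it */
    } capability;

    /* Holding the key is necessary and sufficient: the only check made is
     * against the rights mask carried by the key itself. */
    static bool cap_permits(const capability *c, uint32_t requested)
    {
        return (c->rights & requested) == requested;
    }

Any operation a process attempts is checked only against the rights carried by the key it presents; there is no separate access-control list to consult.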
2. Orthogonal Global Persistence :
The basic idea of orthogonal global persistence is quite simple:
on a periodic basis, or when requested by an authorized application, a
consistent snapshot of the entire system state is taken. This consistent
snapshot includes the state of all running programs, the contents of main
memory, and any necessary supporting data structures.
Global persistence means that the state of all processes is
captured at one instant; in the event that the system is forced to recover from
the snapshot, all applications are consistent with respect to each other.
Orthogonal persistence means that applications do not need to take
any special action to be a consistent part of the snapshot.
EROS persistence is a kernel service. While this makes the EROS
kernel larger than non-persistent kernels such as QNX, non-persistent kernels
do not provide a comparable means of transparent recovery.
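To make the idea concrete, the checkpoint cycle can be pictured very loosely as a periodic kernel routine; every name below is hypothetical, and the real checkpoint machinery is considerably more sophisticated:

    /* A loose sketch of orthogonal global persistence. Every helper named
     * here is hypothetical, and the real EROS checkpointer is considerably
     * more sophisticated than a stop-and-copy loop. */
    #include <stdbool.h>

    extern bool checkpoint_interval_elapsed(void);
    extern bool checkpoint_requested_by_authorized_app(void);
    extern void freeze_all_processes(void);
    extern void capture_memory_and_kernel_structures(void);
    extern void resume_all_processes(void);
    extern void write_snapshot_to_disk(void);

    void checkpoint_tick(void)
    {
        if (!checkpoint_interval_elapsed() &&
            !checkpoint_requested_by_authorized_app())
            return;

        /* Capture a mutually consistent image of every running program,
         * main memory, and the supporting kernel data structures. */
        freeze_all_processes();
        capture_memory_and_kernel_structures();
        resume_all_processes();        /* applications take no special action */

        /* The captured image is then migrated to disk. */
        write_snapshot_to_disk();
    }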
3. Security :
Security experts worry greatly about confinement. Confinement is
the process of making sure that information does not leak to unauthorized
users.
While confinement remains a crucial problem in secure and
mission-critical applications, the confinement problem is not currently very
important for most applications. The on-going fall in hardware prices, coupled with the shift to
client-server technology, has made this problem fairly easy to solve. To build
a secure server, buy a dedicated machine, turn off all network services except
the one you intend to run, and let that service perform its own authorization.
Given this approach, the remaining security hazards come from
three sources:
1) Hardware bugs, some of which cannot be corrected by software.
2) Implementation bugs, such as failure to check input buffer lengths.
3) Misused authority, where a program operating with special privilege is made to use that privilege at an inappropriate time.
5. Stateless Supervisor :
Forthcoming.
6. Deadlock-Free Supervisor :
Forthcoming.
7. A Small Kernel :
Forthcoming.
x86 Bootstrap
This note describes how the new x86 bootstrap code works. The x86
bootstrap has been rewritten several times, and the code is difficult to read
because portions of it are in 16-bit assembler. It is written both as a
possible resource for users and as information for other x86 OS authors.
1. Sequence of Operation :
Here is a summary discussion of the sequence of events in booting
an x86 machine.
1.1 Load Master Boot Record
The BIOS first loads sector zero of the disk at (linear) 0x7c00,
and branches to it by jumping to the segmented 16-bit address 0:0x7c00. On
entry to the MBR, the following values are in registers:
- %CS:%EIP 0x0:0x7c00
- %DL BIOS identifier of the booted drive
No assumptions should be made about other registers. In particular,
I am not aware of any safe assumptions concerning the other segment registers.
If this is a floppy boot, then this is not an MBR, and execution
proceeds as described under boot record, below.
If this is in fact an MBR sector, the code embedded in the MBR
record moves itself out of the way (where to is not documented, and no safe
assumptions can be made) and then searches the partition table for the
``active'' partition. If an active partition is found, the MBR loads sector zero
of that partition at linear address 0x7c00 and branches to it. If no partition
is active, processing halts at this point.
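In outline, and ignoring real-mode segmentation details, the MBR's search of the partition table looks roughly like the C sketch below. The field offsets follow the standard PC partition-table layout; the helper routines are hypothetical stand-ins, since the real code is 16-bit assembler:

    /* Sketch of the MBR's active-partition search, written in C for
     * readability; the real code is 16-bit assembler and the helper
     * routines here are hypothetical stand-ins. */
    #include <stdint.h>

    #define MBR_PART_TABLE_OFFSET 0x1BE   /* standard PC partition table */
    #define PART_ACTIVE           0x80

    struct part_entry {                   /* 16 bytes each, 4 entries  */
        uint8_t  boot_flag;               /* 0x80 if active            */
        uint8_t  chs_start[3];
        uint8_t  type;
        uint8_t  chs_end[3];
        uint32_t lba_start;               /* first sector of partition */
        uint32_t n_sectors;
    } __attribute__((packed));

    extern void load_sector_at_7c00(uint32_t lba);  /* hypothetical BIOS wrapper */
    extern void jump_to_7c00(void);                 /* hypothetical              */
    extern void halt(void);                         /* hypothetical              */

    void mbr_boot(const uint8_t *mbr)     /* the relocated 512-byte MBR */
    {
        const struct part_entry *pt =
            (const struct part_entry *)(mbr + MBR_PART_TABLE_OFFSET);

        for (int i = 0; i < 4; i++) {
            if (pt[i].boot_flag == PART_ACTIVE) {
                load_sector_at_7c00(pt[i].lba_start); /* partition boot record */
                jump_to_7c00();                       /* does not return       */
            }
        }
        halt();                           /* no active partition: stop here */
    }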
1.2 Load Partition Boot Record :
The partition (or floppy) boot record is entered with the
following register environment:
- %CS:%EIP 0x0:0x7c00
- %DL BIOS identifier of the booted drive
- %ES:%SI Points to the partition table entry IFF this was a hard disk boot (hard disk boot only). This is the only RELIABLE way to know which image was booted, as there may be multiple instances of EROS installed, and some MBR multiboot loaders do not set the active flag on the in-memory partition table.
1.3 Load Extended Bootstrap :
From this point on, what happens is entirely the business of the
operating system-specific boot code. The initial boot sector has 512 bytes less
some tables in which to load a more general bootstrap.
A complication in reading the EROS bootstrap code is that it is
written using a 32-bit assembler. Some operations can be coded directly, but
don't mean what they say they mean (e.g. ``mov %eax,%ebx'' looks like a 32 bit
operation but is actually a 16 bit operation). Others must be prefixed by
extended opcode bytes because the linker only gives us 32-bit relocations.
Still others must be hand-coded as code bytes, bypassing the assembler
mechanisms almost entirely.
Suffice it to say that this is entirely unmaintainable, and an
utter pain in the ass to read. Bryan Ford did us all a great service by
introducing the code16 directive. Unfortunately, not all Linux assemblers
support this directive, there are bugs in the current implementation (different
in different assembler versions), and it does not resolve the linker's lack of
support for 16-bit relocations. For these reasons we do not use it.
For this reason, the EROS bootstrap logic loads the rest of the
primary bootstrap as fast as possible, gets into 32-bit code, and then does
its best to stay there.
Some OS bootstraps load the secondary bootstrap cleverly, making
use of the information in the partition table. The EROS bootstrap instead
follows the *BSD convention: the sectors to be loaded are precomputed by a
bootstrap installer utility. The EROS bootstrap code simply reads and applies
this precomputed table. The table originally compiled into the raw bootstrap
binary is suitable for use on a 1.44M floppy.
The extended bootstrap is loaded to a precompiled linear address
(presently 0x80000, controlled in boot.h).
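The precomputed table can be pictured as a small array compiled into the bootstrap image. The structure below is a hypothetical illustration of the idea, not the actual format used by the EROS bootstrap installer:

    /* Hypothetical sketch of the *BSD-style precomputed load table; the
     * actual on-disk format is defined by the EROS bootstrap installer. */
    #include <stdint.h>

    struct load_extent {
        uint32_t start_sector;     /* absolute sector on the boot device   */
        uint32_t n_sectors;        /* how many consecutive sectors to read */
    };

    #define EXT_BOOT_LOAD_ADDR 0x80000u   /* 0x80000, per boot.h */

    extern const struct load_extent boot_extents[];  /* patched by the installer */
    extern const uint32_t boot_extent_count;
    extern void bios_read_sectors(uint32_t lba, uint32_t count, void *dest);

    void load_extended_bootstrap(void)
    {
        uint8_t *dest = (uint8_t *)EXT_BOOT_LOAD_ADDR;

        /* Simply read and apply the table; no file-system knowledge needed. */
        for (uint32_t i = 0; i < boot_extent_count; i++) {
            bios_read_sectors(boot_extents[i].start_sector,
                              boot_extents[i].n_sectors, dest);
            dest += boot_extents[i].n_sectors * 512u;
        }
    }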
2. EROS Extended Bootstrap :
The goal of the extended bootstrap is to enter the operating system.
It proceeds in three steps (a rough sketch follows the list):
- Enter 32-bit protected mode.
- See if a ramdisk image is to be loaded. If so, load it. This may require decompression.
- Load the kernel from the disk (or ramdisk).
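Sketched roughly in C, with hypothetical routine names standing in for the real ones:

    /* Rough top-level flow of the extended bootstrap; every routine name
     * below is a hypothetical stand-in for the real code. */
    #include <stdbool.h>

    extern void enter_32bit_protected_mode(void);
    extern bool ramdisk_image_requested(void);
    extern void load_ramdisk_image(void);            /* may decompress  */
    extern void load_kernel_from_boot_device(void);  /* disk or ramdisk */
    extern void jump_to_kernel(void);

    void extended_bootstrap(void)
    {
        enter_32bit_protected_mode();       /* step 1 */

        if (ramdisk_image_requested())      /* step 2 */
            load_ramdisk_image();

        load_kernel_from_boot_device();     /* step 3 */
        jump_to_kernel();
    }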
2.1 Entering 32-bit Protected Mode :
On entry into the extended bootstrap code, the first thing done is
to enter 32-bit protected mode. This allows us to use the assembler
straightforwardly, and more importantly to use the C compiler to write much of
the remaining bootstrap.
For the most part, the only tricky aspect to entering protected
mode is the number of constraints on the problem. The global descriptor table
must contain the following:
- Code and data descriptors for the bootstrap code as 32-bit code
- Code and data descriptors for a 32-bit mode flat linear mapping of the entire address space (to be used by the newly-loaded kernel)
- A code descriptor for the bootstrap code as 16-bit code
The last is required because we will need to switch back into
16-bit mode in order to call the BIOS to load further disk sectors. For a
similar reason, the stack of the secondary bootstrap must fall within the low
1Mbyte of real memory (this is why we did not load the extended bootstrap above
the 1Mbyte boundary).
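As an illustration of these constraints, a descriptor table along the lines described might be declared as below. This is a hedged sketch using the standard x86 descriptor encoding and 0-based segments throughout for simplicity; the actual EROS bootstrap table is defined by its source, not here:

    /* Sketch of a GDT meeting the constraints above, using the standard
     * x86 descriptor encoding. 0-based flat segments are used throughout
     * for simplicity; the actual EROS bootstrap descriptors may differ. */
    #include <stdint.h>

    #define ACC_CODE 0x9A        /* present, ring 0, executable, readable */
    #define ACC_DATA 0x92        /* present, ring 0, data, writable       */

    #define FLG_32BIT_4K 0xC     /* G=1, D=1: 32-bit, 4 KiB granularity   */
    #define FLG_16BIT    0x0     /* G=0, D=0: 16-bit, byte granularity    */

    struct gdt_entry {
        uint16_t limit_lo;
        uint16_t base_lo;
        uint8_t  base_mid;
        uint8_t  access;
        uint8_t  limit_hi_flags; /* high 4 bits of limit, then flags */
        uint8_t  base_hi;
    } __attribute__((packed));

    #define DESC(base, limit, access, flags) {                       \
        (uint16_t)((limit) & 0xFFFF), (uint16_t)((base) & 0xFFFF),   \
        (uint8_t)(((base) >> 16) & 0xFF), (access),                  \
        (uint8_t)((((limit) >> 16) & 0x0F) | ((flags) << 4)),        \
        (uint8_t)(((base) >> 24) & 0xFF) }

    static struct gdt_entry gdt[] = {
        { 0 },                                     /* null descriptor            */
        DESC(0, 0xFFFFF, ACC_CODE, FLG_32BIT_4K),  /* 32-bit bootstrap code      */
        DESC(0, 0xFFFFF, ACC_DATA, FLG_32BIT_4K),  /* 32-bit bootstrap data      */
        DESC(0, 0xFFFFF, ACC_CODE, FLG_32BIT_4K),  /* 32-bit flat code (kernel)  */
        DESC(0, 0xFFFFF, ACC_DATA, FLG_32BIT_4K),  /* 32-bit flat data (kernel)  */
        DESC(0, 0x0FFFF, ACC_CODE, FLG_16BIT),     /* 16-bit code for BIOS calls */
    };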
Finally, the extended bootstrap places its heap at 1Mbyte (growing
upward). This allows the decompression library to run, but means that malloc'd
memory cannot be passed as a buffer to the BIOS, because the BIOS cannot
address such memory.
2.2 Balance of Extended Bootstrap :
The balance of the extended bootstrap is written in a mix of
C and C++. Experience suggests that the bootstrap code changes, and that the C code
serves as its own best documentation.
In fact, all of this code should be written in C. The only reason
that C++ was used was for compatibility with some structure declarations used
by the kernel proper. Note, in particular, that the program is linked without
the standard runtime environment, and that global constructors are not
executed. This isn't a problem, because there are no globals requiring
construction in the current bootstrap.
KERNEL
Kernel Virtual Memory
This note describes how EROS lays out kernel virtual memory at
present, and explores a new design in which most user state is not mapped by
the kernel.
1. Physical Memory Model :
The EROS kernel views memory as a list of contiguous ranges.
Because legacy DMA engines impose addressability constraints, each physical
memory range has an associated memory ``class,'' (an integer) indicating how
precious (i.e. how constrained) that memory is. When memory allocation is
performed, the allocating code specifies the required class. Allocation will be
performed from the least precious class matching the requirements.
Note: The importance
of this convention is declining. After a while, I concluded that it was better to
just integrate bounce buffers into the DMA management software. In the current
kernel, I believe that the only code which allocates memory under a constraint
is the DMA engine itself.
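For illustration, the class convention amounts to something like the sketch below (all names are hypothetical, and, as noted above, the mechanism has largely given way to bounce buffers in the DMA code):

    /* Illustrative sketch of class-based physical memory allocation; the
     * names and structures here are hypothetical, not the kernel's. */
    #include <stddef.h>
    #include <stdint.h>

    struct phys_range {
        uintptr_t base;
        size_t    bytes;
        int       mem_class;   /* higher = more precious (more constrained) */
    };

    extern struct phys_range phys_ranges[];
    extern int               n_phys_ranges;
    extern void *carve_from_range(struct phys_range *r, size_t bytes); /* hypothetical */

    /* Allocate from the LEAST precious class that still satisfies the
     * caller's constraint, preserving constrained (e.g. DMA-addressable)
     * memory for callers that actually need it. */
    void *alloc_physical(size_t bytes, int required_class)
    {
        struct phys_range *best = NULL;

        for (int i = 0; i < n_phys_ranges; i++) {
            struct phys_range *r = &phys_ranges[i];
            if (r->mem_class < required_class || r->bytes < bytes)
                continue;
            if (best == NULL || r->mem_class < best->mem_class)
                best = r;
        }
        return best ? carve_from_range(best, bytes) : NULL;
    }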
2. Current (before 5 Jan 1998) Kernel Memory Model :
The EROS kernel uses main memory for several purposes. In
approximate order of allocation, these are:
Kernel Mapping Table(s) :
The kernel mapping tables map all of the following space. The
process ``owns'' the low 3 gigabytes of the virtual address space, while the
kernel owns the upper 1 gigabyte. The kernel map in the current design includes
a redundant map of all of physical memory.
Interrupt Management Structures :
Any structures and tables necessary to handle interrupts.
Driver Buffers :
Drivers are free to allocate kernel pages during startup.
User Thread Table :
The master thread table, containing all of the running or sleeping
threads in the system.
Context Cache :
The cache for active processes, used in context switching and IPC.
Inverted Mapping Table :
Used to invalidate mappings when the capability slots they depend
on are changed. This data structure is referred to in the EROS kernel as the
Depend Table.
Core Table :
The per-object header structures associated with physical pages.
There is one core table entry for every page frame in main memory. In the
current implementation, this includes the page frames used for page directories
and page tables.
Node Space :
The memory frames for in-core nodes; this space must be generally
accessible to the kernel.
Page Space :
All user pages reside here. In the current implementation, page
space is mapped into the kernel's virtual memory map.
3. Proposed Kernel Memory Model :
There are two major problems with the model described above:
- It limits physical memory to slightly less than 1GB.
- It does not reserve any part of the address space for use as ``small spaces.''
The first point may seem insignificant, but at the University of
Pennsylvania we had a number of machines configured with 256 Mbytes of memory
in 1997. The price/density curve suggests that the 1GB boundary cannot be far
away. While the feature isn't widely used, current Pentium Pro processors (and
possibly Pentium II -- I'm not sure, but I am told that they are essentially
repackaged Pentium Pro dies) can support more than 4GB of physical memory
already, which we would someday like to leverage.
3.1. Kernel References to User Pages :
There are a small number of places where the supervisor makes
reference to the content of a data page:
1. From Software : There
are several cases in which the kernel makes a data reference from software:
- Neither Mapped: In the data page clone operation, neither source nor destination is necessarily mapped. Note, however, that this operation returns no data to the user.
- PIO: Programmed I/O moves data to or from a page that is not necessarily related to the current process in any way. This page must be temporarily mapped into the kernel before I/O can occur. Bounce buffer copies are also a form of PIO.
In the EROS kernel design, PIO occurs with interrupts disabled,
and never occurs from a nested kernel interrupt. This means that PIO does not
interfere with other copies that may be in progress.
2. Physical DMA: If
the DMA hardware uses physical addresses, the fact that the page frame is not
mapped into the kernel presents no special difficulties; the kernel merely
needs to know the physical address of the page frame.
3. Virtual DMA: If
the DMA operation uses virtual addresses, then a mapping must somehow be
constructed. In such architectures there is generally an efficient way to shoot
down stale mappings, or if necessary a technique similar to the one used for
programmed I/O can be employed.
3.2. Revised Memory Layout
Given that the page references are cleanly definable, the revised
kernel memory model excludes user pages from the kernel memory map. The new
address space layout is as follows:
+-----------------+------------------+-------------------+
|   Large Space   |   Small Spaces   |  Kernel Virtual   |
|    (unique)     |     (shared)     |     (shared)      |
+-----------------+------------------+-------------------+
Small spaces and kernel space are shared across all virtual maps,
enabling rapid transition to/from kernel and to/from small spaces. While nodes
are directly accessible within kernel space, page data is not unless it is
mapped into the user address space. Note that capability load and store
instructions always access pages that are read/write to the kernel but not to
the user, so this falls under the user-accessible pages scenario.
In this model, page table page frames are pre-reserved, and cannot
be taken from the general memory pool. More precisely: they could be taken from
the general page pool, but the need to translate addresses means that doing so
is not worthwhile, because page table access needs to be fast.
3.3. Kernel Virtual Memory
The following factors were not considered at design time:
- Page table update requires translation from physical addresses to virtual addresses for page tables.
- Depend table invalidation requires translation from physical addresses to core table entries.
- Growing the kernel address map, and adding mappings to it.
3.3.1 Page Table Update
When an address fault is satisfied, a page table entry must be
constructed. This page table entry must be written into the appropriate
physical page, so a kernel virtual address for the page frame must exist or
must already be established.
There are several ways to approach this problem. Listed in order
of performance:
Contrive for all user page tables to come out of a contiguously
allocated range of physical pages, and map these physical pages at a contiguous
range of virtual addresses. If this is done, the interconversion can be
accomplished by adding or subtracting a suitable constant.
This approach imposes a fixed limit on the number of page tables
available for user mappings. This number can be conditionalized on the amount
of available memory, but it is not obvious what a good heuristic would be.
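The constant-offset interconversion then amounts to nothing more than the following; the area symbols and their values are hypothetical placeholders:

    /* Sketch of the constant-offset interconversion for a reserved,
     * contiguous page-table area; the symbols and values are hypothetical. */
    #include <stdint.h>

    #define PT_AREA_PHYS  0x00400000u  /* hypothetical: start of reserved frames */
    #define PT_AREA_VIRT  0xC0400000u  /* hypothetical: where they are mapped    */
    #define PT_AREA_DELTA (PT_AREA_VIRT - PT_AREA_PHYS)

    static inline void *pt_phys_to_virt(uintptr_t pa)
    {
        return (void *)(pa + PT_AREA_DELTA);
    }

    static inline uintptr_t pt_virt_to_phys(const void *va)
    {
        return (uintptr_t)va - PT_AREA_DELTA;
    }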
3.3.2 Depend Table Invalidation
When a node slot is modified, the corresponding depend table
entries must be invalidated. To do this, the PTE entries must be updated and
the TLB must be flushed.
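In outline, and with hypothetical structure and helper names rather than the kernel's actual Depend Table code, the invalidation step looks like this:

    /* Sketch of depend-table invalidation; the structure and helpers are
     * hypothetical stand-ins for the kernel's actual Depend Table code. */
    #include <stdint.h>

    struct depend_entry {
        uint32_t *pte;                 /* PTE that was built from the node slot */
        struct depend_entry *next;
    };

    extern struct depend_entry *depend_lookup(void *node, int slot); /* hypothetical */
    extern void tlb_flush(void);                                     /* hypothetical */

    void depend_invalidate(void *node, int slot)
    {
        /* Every mapping constructed from this slot is now stale. */
        for (struct depend_entry *d = depend_lookup(node, slot); d; d = d->next)
            *d->pte = 0;               /* mark the PTE not-present */

        tlb_flush();                   /* discard any cached translations */
    }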
3.3.3 Kernel Address Map
A number of data structures must be mapped into the kernel virtual
address space. A few of these want to be mapped from contiguous physical
memory, but this doesn't really pose special problems to the kernel mapping
mechanism per se. The question is: how is the kernel map itself to be updated?
One can imagine three designs for the kernel virtual memory map:
Dedicated. A memory map covering the entire physical memory is
constructed at startup time. When the kernel must update its own mapping
structures, it switches temporarily to this map. The dedicated map maps
physical memory in what would otherwise be per-process virtual space, and
kernel memory according to the normal kernel mapping(s).
Reflexive. The kernel map appears in kernel virtual memory, and is
updated by making suitable read and write operations via the mapped virtual
addresses.
Non-reflexive. The kernel map does not appear in kernel virtual
memory.
One solution to the temporary mapping construction problem is to
hand-construct a sufficient mapping table to support building the temporary mapping,
and then do the rest by building temporary mappings.
A Programmer's Introduction to EROS
This document is a work in progress; its current contents
are mostly correct. This document provides an introduction to the EROS system
from a programmer's point of view, and is also relevant if you are building
programs that will run under an operating environment such as Dionysix.
This note does not try to provide an exhaustive description of the
EROS environment. Instead, it tries to provide enough information to get you
started, leaving details to be addressed by other documents.
1. Process
Applications in EROS are typically constructed out of several
independent components, which we call processes.
A process consists of:
- An address space.
- A set of registers, which change as the process executes.
- A set of 16 capabilities, more commonly referred to as keys.
- A keeper, which is another process that is invoked when a process takes an exception. The keeper is established when a process is first created. Most processes are unable to change their keeper.
Each process implements a specific, well-defined collection of
related services. Each process carries exactly the authority that it requires
to perform its function and no more.
While it is common to build several new processes for each application,
the EROS system comes with a toolkit of useful processes. Some of these provide
high-level facilities such as sorting and data management. Others implement
various supporting policies for such things as page fault handling. In addition
to constructing new components, an application designer will typically make use
of several such ``off the shelf'' processes in constructing an application.
Processes are more than just libraries. In addition to code and
data, each process has the ability to remember information from one session to
the next, and runs in its own protected environment. Services implemented
using processes can run in parallel with the application they serve. We refer to
such protected, independent services as active persistent agents. They are one
of the keys to EROS's reliability.
Placing a service in a separate process has several advantages:
- The service runs in its own address space.
- The service can be tested independently.
- The service can be used by non-trusted clients.
The primary disadvantage is
that invoking a process is somewhat more expensive than performing a library
call.
1.1. Persistence
All objects in EROS, including process, are persistent. If the
system is shut down, or for some reason crashes, the state of all running programs
is preserved by the checkpoint mechanism. The EROS system periodically saves a
consistent copy of the entire system state to disk. When the system restarts,
it resumes from the most recently saved system image. Typically, no more than 5
minutes work is lost in such a failure.
A consequence of persistence is that processes exist forever. Until
a process is explicitly dismantled, the services it implements remain available
to clients with suitable keys.
Persistent processes enable a whole new approach to application
design. Instead of building shared libraries, common services in EROS are
placed in separate, persistent processes. Because processes are persistent, there
is no need to recreate them each time an application is started, as would be
necessary in a non-persistent system. In addition, fault isolation and security
are greatly improved by persistence. For example, EROS programs can hold
authorities that are different from the user's authorities.
1.2. Views of Process
From the programmer's perspective, there are (at least) two useful
views of a process:
- A process is a running application program that invokes services.
- A process is the implementer of one or more related services.
1.3. Process States
In addition to providing services, processes provide a mechanism
for synchronization. A process can be occupied by at most one thread. If a
client invokes a process that is currently running, the client blocks until the
process becomes available. More precisely, a process can be in one of three
states:
Available : An available process is currently idle, and can be invoked by any holder of an appropriate start key. This is the state of most of the processes in the system. An available process is not occupied by any thread.
Running : Once it has been invoked, a process moves from the available state to the running state. A running process is busy servicing an invocation, and is occupied by a thread. If a process is invoked while it is running, the invoker will block until the process becomes available. A process becomes available when it performs a return operation.
Waiting : A process that has invoked a key and is waiting for a response moves from the running state to the waiting state. It remains in this state until it receives a reply to its invocation. If a process is invoked while it is waiting, the invoker will block until the process becomes available.
2. Keys and Objects
A key grants the authority to use a particular service or
manipulate an object. Keys can be loosely broken into a small number of
categories:
Object Keys : The primary objects in EROS are pages and nodes. All other objects in the system are constructed out of these primary object types. Each of these objects has an associated key type.
Primary objects are implemented directly by the EROS kernel. When
a program invokes a primary key, the action requested is performed directly by
the EROS supervisor.
Processes and address spaces are constructed out of nodes and pages,
and the keys that manipulate these objects are also implemented by the kernel.
Start Keys : Processes can publish services to their clients. The creator of a new process can fabricate a start key to that process. The start key includes:
- The identity of the process that implements the service
- A type field, known as the data byte, indicating which service is to be performed.
A program that holds a start key to a particular process can use
the service by invoking the start key. The message sent during the invocation
is sent directly to the implementing process.
Device Keys : Access to hardware devices is provided through device keys. Generally speaking, a device key is a thin layer over the raw hardware. The kernel implementation of the device key does only enough checking to make sure that a request will not damage the rest of the kernel.
A program that holds a device key has direct control over the
underlying hardware. Device keys are therefore closely held, which means that
they are given out only to the programs that are supposed to manage the
devices.
Most programs use only object and start keys. A few of the
more commonly used objects and their keys are described below. A discussion of
address spaces and start keys can be found in subsequent sections.
3. Address Spaces
An address space provides a mechanism for assembling pages into a
larger, addressable pool. The current EROS implementation supports address
space trees of up to 2^64 bytes, and 64-bit implementations will probably
increase this limit (the current address space structure can handle 2^128 byte
address spaces, but the number key size allows only 96 bits of offset to be
requested).
4. Invocation Types
EROS implements three types of invocations: call, return, and
send.
4.1 Call Invocations
The call invocation can send only three keys, because the kernel
generates a resume key and places it in the fourth key slot of the message. A
unique return key is generated for each call, and the kernel guarantees that
every call will receive at most one return. Return keys are, in effect,
self-consuming. If any copy of a given return key is invoked all others are
effectively invalidated.
If the recipient is running or waiting for another process, the
caller will block until the recipient becomes available. Along with the
caller's message, the recipient receives a copy of the data byte from the
service key, indicating which service was invoked.
In addition to specifying the location and size of the keys and
data to be sent, the caller specifies where any keys and data returned from the
service should be placed. When the service returns, its returned information
will be placed in the buffer and key slots specified by the caller.
When a call invocation is performed, the calling process ceases to
run, and is placed in the waiting state. It will eventually resume execution
when the resume key is invoked. The recipient, meanwhile, transitions from the
available state to the running state, and begins processing the service
request. We say that the thread which occupied the caller has migrated to the
service process.
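From the caller's side, a call invocation can be sketched roughly as below. The Message layout and the eros_call() entry point are illustrative assumptions, not the real EROS system-call interface:

    /* Illustrative sketch of a call invocation; the Message layout and the
     * eros_call() entry point are assumptions, not the real EROS ABI. */
    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t key_slot_t;       /* index into the process's 16 key slots */

    typedef struct Message {
        key_slot_t invoked_key;        /* the start key being called            */
        key_slot_t snd_keys[3];        /* at most three keys may be sent; the   */
                                       /* kernel fills the fourth slot with a   */
                                       /* resume key it generates               */
        const void *snd_data;
        size_t      snd_len;

        key_slot_t rcv_keys[4];        /* where returned keys should be placed  */
        void       *rcv_data;          /* buffer for returned data              */
        size_t      rcv_limit;
        uint8_t     data_byte;         /* data byte from the start key, seen by */
                                       /* the recipient                         */
        uint32_t    result_code;       /* filled in by the returner             */
    } Message;

    extern void eros_call(Message *msg);  /* hypothetical: places the caller in
                                             the waiting state until the resume
                                             key is invoked */

    /* Usage: call whatever service the start key in key slot 5 names. */
    static void example(void)
    {
        char reply[256];
        Message m = { 0 };

        m.invoked_key = 5;
        m.snd_data    = "request";
        m.snd_len     = 7;
        m.rcv_data    = reply;
        m.rcv_limit   = sizeof(reply);

        eros_call(&m);                 /* thread migrates to the service process */
    }

The kernel's generation of the resume key in the fourth slot is why at most three keys can be named explicitly by the caller.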
4.2. Return Invocations
The return invocation is the mirror to the call invocation. The
returner specifies the message and result code to be returned, and also
specifies the location and slots into which the next message should be
received. The returner then switches from the running state back to the
available state, and the caller (who performed the call that is being returned to)
switches from the waiting state back to the running state. The thread has
migrated back to the caller.
4.3. Send Invocation
It is sometimes necessary to send messages without blocking for a
reply. The send invocation does this. The caller sends a message as with the
call instruction, and blocks until the recipient has received it. Once the
message has been transmitted, both the sender and the receiver are in the
running state, and are able to execute instructions.
As an example, the send invocation is used when the user interface
wishes to initiate a new application.
Note that the send invocation has the effect of creating a new,
independent thread of execution. The invoking thread remains in its original
process, and a new thread is allocated for the recipient.
5. Not Just Client Server
The call and return invocations are named for their most common
usage, which is to handle client/server interactions between two processes, but
this is not the only way in which these invocations can be used. In particular,
a call can be performed on a resume key, and a return can be performed on a
start key. This has some very useful implications.