KGI Acceleration Handling

This is the promised clarification about how KGI intends to handle accleration. I try to explain the concepts behind and not go into much detail.

Introduction

The KGI framework does not a-priori prescribe how applications may alter the display hardware state. Instead, several abstractions are provided to establish 'communication channels' between the display hardware and the application. This is done by a so-called "mapper" that maps KGI abstractions into the applications address space. Currently a mapper using a ioctl() and mmap() interface are worked on, but this is not the only one possible. The abstractions (in the subsequent text called resources) introduced so far include (but are not limited to)

commands
MMIO regions
accelerators
shared memory
...

and will be explained in more detail in the subsequent text. The display drivers have to implement a kind of 'back end' which does the hardware dependent operations to implement the resource. The mapper adds a frontend part, which contains all application specific state (e.g. rendering context, etc.) All resources have access restrictions associated with it (similar to UNIX file access rights) in that they may be exclusive (only the session leader == first process that claimed the hardware may use it) or shared (several processes may use it, but only with permission of the session leader).

Commands

Commands may be thought of as synchronous calls to a driver function, that is after the command invocation returns, the action specified is completed. Example: mode setting, setting of attribute lookup tables, etc. Command implementations may have high performance penalties (e.g. user<->kernel transition).

MMIO regions

Mappings of MMIO region resources give applications direct access to memory mapped I/O resources of the display hardware. Accessing these regions must not cause fatal errors (system lockup, etc.) in any case but may be undefined if other resources are used concurerently (e.g. if the accelerator is currently accessing the framebuffer). Examples are frame buffers, local buffers, texture memory, etc. Accelerator FIFOs or MMIO regions _may_ be exported, but usually are only to the session leader.

Several regions scattered in physcial address space may be presented to the application as a virtually continous region. However, all regions must have the same size and are guarantueed not to be accessed simultaneously.

MMIO regions have no application dependent state except the last/current offset of the region if several discontinous regions are part of this resource.

Accelerators

Accelerators are means to execute command buffers filled by the application. They have a driver/protocol dependent application context associated with each mapping. When establishing the mapping, the mapper allocates a number of buffers and a context buffer meeting the preferences of the driver (e.g. do the buffers/the context buffer have to be suited for DMA access?). It then maps one buffer exclusively into the applications address space and waits for the application to fill this buffer. Once this is done, the application will initiate "execution" of that buffer, whereby the application looses access to that buffer, eventually caches are flushed (all data is written to memory), and the buffer is handed to the driver. The driver may choose to execute the buffer synchronously (buffer is executed if the drivers execute() function returns) or asynchronously (the buffer may still 'belong' to the driver).

In any case, the driver has to be aware that executing a buffer may require a context switch and perform this one as neccessary. Also, if asynchronous execution is implemented, scheduling issues are left to the driver.

There may be up to 16 buffers of arbitrary but same size per mapping. The number of mappings is only limited by process limits.

Shared memory

Shared memory implements access to a common region of memory for both the application and the driver/hardware. The shared memory may be attached to a context of an accelerator mapping.

Current implementation

The current implementation of a mapper providing access to the resources via standard file operations can be found in file:kgi-0.9/kgi/Linux/linux/drivers/kgi/graphic.c.

Command resources are mapped using ioctl()
MMIO and accelerator resources are mapped using mmap() Both are implemented by a _domap() function (establish the mapping, allocate application context) and a _no_page() function.

For MMIO regions, the _no_page() function determines the offset of the physical region to be mapped, asks the hardware to prepare itself and invalidates mappings of another region and validates the mapping for the accessed region.

For accelerator mappings, the _no_page() function determines what part of the the buffer is to be executed (depending on the offset into the next buffer), invalidates the mapping and passes the buffer to the driver for execution. After the buffer is scheduled for execution or is executed, it determines the next buffer from a circular list of buffers, makes sure it is executed (by testing the buffer state, if it is not idle -- thus being or waiting to be executed -- it waits until the buffer is executed). Done that it validates the mapping into the application address space and waits until the application has filled that buffer.