KGI Display Hardware Driver Overview
Abstract
As a part of the GGI (General Graphics Interface) Project, a Kernel Graphics
Interface (KGI) is being developed to provide the necessary hardware 
abstraction to allow efficient sharing and virtualization of graphics
hardware in multi-user/multi-processing environments.
This article is intended to give a detailed overview of the KGI 
portability layer, display hardware abstraction and a basic overview
of the modular display driver.
General Overview
The main design goals for KGI (Kernel Graphics Interface) display drivers 
can be summarized as:
- 	Portability. KGI display drivers should easily be reused in
	different environments - such as in-kernel drivers or drivers that
	are part of a user-space application - without any modifications 
	to the driver sources. Also, the display drivers and display 
	hardware model should not prescribe a certain programming interface.
	
- 	Flexibility. The KGI display driver model should be flexible
	enough to be used for any type of display hardware, as well as
	easily extensible to accommodate new developments.
- 	Performance. The display driver design should allow for 
	efficient use of acceleration features, especially in multi-user
	multi-process(or) environments. This includes means to share
	and virtualize graphics hardware.
In order to meet these goals, KGI-0.9 is divided into the following key
components:
- 	a portability layer that defines basic types and physical
	I/O services used by the drivers to access the hardware.
- 	an abstract display hardware model that allows a hardware
	independent description of operation modes.
- 	a modular display hardware driver design consisting of a 
	'low-level' part that may be run in kernel space and a 'high-level'
	part translating a given application programming interface
	request into hardware-specific low-level requests. These are
	handled either directly by the hardware or passed to the
	low-level driver for execution.
- 	a KGI environment that provides the necessary environment and
	operating system services to share and virtualize the application
	views of the hardware.
Each of the key components except the environment services mentioned 
will be explained in more detail in the following sections.
Portability layer
The KGI portability layer defines some basic data types, some host-specific
macros and definitions to handle endianness, and physical I/O services
in a platform-independent manner.
- Integral (integer) types
 All signed integral types are defined to use 2's complement 
	representation, the most significant bit being the sign bit.
	| kgi_s8_t | 8bit signed |
	| kgi_u8_t | 8bit unsigned |
	| kgi_s16_t | 16bit signed |
	| kgi_u16_t | 16bit unsigned |
	| kgi_s32_t | 32bit signed |
	| kgi_u32_t | 32bit unsigned |
	| kgi_u_t | system native unsigned integral, but at least 32bit wide |
	| kgi_s_t | system native signed integral, but at least 32bit wide |
	| kgi_ascii_t | 8bit character code with 8bit ISO-latin1 encoding |
	| kgi_unicode_t | 16bit character code with 16bit UNICODE encoding |
	| kgi_isochar_t | 32bit character code with 32bit ISO 10646 encoding |
	| kgi_virt_addr_t | virtual address type (byte-offset arithmetic) |
	| kgi_phys_addr_t | physical address type (byte-offset arithmetic) |
	| kgi_bus_addr_t | bus address type (byte-offset arithmetic) |
	| kgi_size_t | type to encode address range sizes |
	| void | indicates no associated type information |
	| void * | same as kgi_virt_addr_t, but no arithmetic defined |
	| kgi_private_t | data type to hold any of the above types |
 The low-level KGI display hardware drivers have to run in different
	environments, e.g. as in-kernel drivers or as library extensions.
	The use of instructions that modify floating point registers is 
	therefore not allowed in low-level drivers, and the corresponding
	floating point types are not defined in KGI. However, high-level
	drivers that translate a given API (e.g. OpenGL) into
	hardware-specific commands are defined to run as part of an
	application and may use the full register/instruction set available.
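
	As an illustration only, the integral types listed above could be
	mapped onto C99 types roughly as follows. The mapping onto
	<stdint.h> and the helper ex_types_selfcheck() are assumptions made
	for this sketch; the definitions in the KGI headers may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed mapping onto C99 exact-width types (two's complement by definition). */
typedef int8_t   kgi_s8_t;
typedef uint8_t  kgi_u8_t;
typedef int16_t  kgi_s16_t;
typedef uint16_t kgi_u16_t;
typedef int32_t  kgi_s32_t;
typedef uint32_t kgi_u32_t;

/* long is at least 32 bits wide on every conforming C implementation. */
typedef unsigned long kgi_u_t;
typedef long          kgi_s_t;

/* Character codes are plain unsigned integers of the stated width. */
typedef kgi_u8_t  kgi_ascii_t;
typedef kgi_u16_t kgi_unicode_t;
typedef kgi_u32_t kgi_isochar_t;

/* Hypothetical helper: aborts if a width promised by the table does not hold. */
static void ex_types_selfcheck(void)
{
	assert(sizeof(kgi_s8_t)      * CHAR_BIT == 8);
	assert(sizeof(kgi_u16_t)     * CHAR_BIT == 16);
	assert(sizeof(kgi_u32_t)     * CHAR_BIT == 32);
	assert(sizeof(kgi_isochar_t) * CHAR_BIT == 32);
	assert(sizeof(kgi_u_t)       * CHAR_BIT >= 32);
}
```

	No floating point type appears in the sketch, in line with the
	restriction on low-level drivers described above.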
- Endianness
 KGI assumes all data stored in driver-accessible virtual
	memory to be either in host-native or explicitly in big or little
	endian encoding. The KGI system layer defines a set of macros
	to convert between host-native endian (HE) and big endian (BE) or
	little endian (LE) encoded data. The macros are named 
	sysencodingtype(arg), where 
	encoding stands for either LE or BE, 
	and type stands for one of the following: 
	isochar, unicode, s16, u16, 
	s32 or u32.
	If the argument is in HE encoding, the result will be in BE or LE
	encoding and vice versa. Note that these are macros and may therefore
	evaluate their argument more than once, so the argument passed
	should either be a constant expression or a plain variable.
	Expressions that contain function calls or assignment operations
	must not be used as arguments for these macros.
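
	The following sketch shows why such conversions are symmetric and
	why side effects in the argument are dangerous. All names (EX_SWAP16,
	EX_BE_U16, ex_store_be16, ...) are hypothetical and chosen for this
	example only; they are not the KGI macro names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte swaps; note that the argument x appears (and is evaluated) several times. */
#define EX_SWAP16(x) ((uint16_t)((((x) & 0x00FFu) << 8) | (((x) & 0xFF00u) >> 8)))
#define EX_SWAP32(x) ((uint32_t)( \
	(((uint32_t)(x) & 0x000000FFul) << 24) | \
	(((uint32_t)(x) & 0x0000FF00ul) <<  8) | \
	(((uint32_t)(x) & 0x00FF0000ul) >>  8) | \
	(((uint32_t)(x) & 0xFF000000ul) >> 24)))

/* Detects the host byte order at run time by inspecting the first byte of 1. */
static int ex_host_is_le(void)
{
	const uint16_t probe = 1;
	return *(const unsigned char *)&probe == 1;
}

/* HE <-> BE and HE <-> LE: the same operation converts in both directions. */
#define EX_BE_U16(x) (ex_host_is_le() ? EX_SWAP16(x) : (uint16_t)(x))
#define EX_LE_U16(x) (ex_host_is_le() ? (uint16_t)(x) : EX_SWAP16(x))

/* Stores a host-native value as two big endian bytes. */
static void ex_store_be16(unsigned char out[2], uint16_t host_value)
{
	uint16_t be = EX_BE_U16(host_value);   /* plain variable as argument */

	assert(out != NULL);
	memcpy(out, &be, sizeof(be));
}
```

	Because EX_SWAP16(x) expands x twice, an argument such as i++ would
	be incremented twice, which is the reason for the restriction stated
	above.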
- Physical I/O
 The KGI low-level driver is the primary authority that coordinates
	graphics hardware access. Some resources of the graphics hardware
	(texture buffers, frame buffer I/O memory, DMA buffers, FIFO registers
	etc.) may be exported to applications, but this is not done without
	approval by the low-level driver. The low-level driver therefore
	has to register _all_ resources (interrupts, I/O memory regions, etc.)
	required to operate the card with the operating system environment.
	KGI uses the concept of I/O regions to handle resources required by
	drivers. Basically, an I/O region is an address space and a set
	of operations defined on this address space.
	For a given I/O type io (a placeholder for the name of the
	binding), the associated metalanguage is defined as follows:
	- io_paddr_t
 physical address - needed to establish mapping to
		virtual addresses
- io_iaddr_t
 i/o address type - addresses the device will respond to
		when applied on the address select lines
- io_baddr_t
 bus address type - the address other devices have
		to access on their bus to access this device
- io_vaddr_t
 virtual addresses - only these may be used with the
		subsequent programmed I/O functions (kind of a handle)
- struct _region_s io_region_t
 a structure that is used to communicate information
		about a given region between the driver and the
		environment. The following fields are defined:
		| device | a handle that uniquely identifies the location in the device tree |
		| base_virt | virtual address that maps to the device's base address |
		| base_io | io base address of the region the device responds to |
		| base_bus | bus address to be used to access this address |
		| base_phys | physical address to be used to establish a virtual mapping |
		| size | size (in bytes) of the region |
		| decode | bitmask of address select lines the decoder evaluates |
		| name | a string that identifies the region |
		 
 
- int io_check_region(io_region_t *)
 This environment function queries whether the given region is
		'free', i.e. not served by another driver. Not all environments
		provide sufficient support for this to be implemented. If
		this function cannot be implemented properly, it should
		always indicate a region is 'free'. The device, base_io, size,
		decode and name fields of the region passed have to be properly
		initialized.
- io_vaddr_t io_claim_region(io_region_t *)
 This environment function registers the region - if possible -
		with a central resource management facility and establishes 
		a virtual mapping of this region. Before claiming a region,
		the driver has to check whether the region is free.
		The region passed needs to have the same fields valid
		as for io_check_region(). After completion, all fields
		are initialized with valid values.
- io_vaddr_t io_free_region(io_region_t *)
 This environment function destroys a virtual mapping
		established by io_claim_region() and unregisters with
		a central resource management facility. This invalidates
		the base_ fields of the region passed, except for
		the base_io field. Note that the driver must not assume
		a valid virtual/bus mapping after freeing a region.
- kgi_usize_t io_insize(const io_vaddr_t vaddr)
 Returns the result of a read operation of size bits width
		at the device address mapped to vaddr (base_virt + offset
		corresponds to base_io + offset for a given region).
		Here size is a placeholder for the access width in bits.
		vaddr has to be naturally aligned on a size bit 
		boundary.
- void io_outsize(const kgi_usize_t val, const io_vaddr_t vaddr)
 Performs a write operation of size bits 
		width at the device address mapped to vaddr. The same 
		alignment restrictions as for io_insize()
		apply.
- void io_inssize(const io_vaddr_t vaddr, void *buf, kgi_size_t count)
- void io_outssize(const io_vaddr_t vaddr, const void *buf, kgi_size_t count)
 Performs count read/write operations of size bits width
		at the device address mapped to vaddr, reading the data
		from the device into buf or writing it from buf to the
		device. vaddr has to be properly aligned and buf must be 
		valid.
- void io_putsize(const io_vaddr_t vaddr, void *buf, kgi_size_t count)
- void io_get(void *buf, const io_vaddr_t src, unsigned long count)
 Performs count write/read operations of size bits width,
		starting at the device address mapped to vaddr.
		The difference to io_ins/outs()
		is that vaddr is incremented according
		to size after each transfer.
	Note that for a particular bus/I/O space binding, any of the I/O 
	operations that are not supported may be missing. Currently the
	following bindings are defined:
	| pcicfg | PCI32 Configuration Registers |
	| io | ISA I/O-Ports |
	| mem | Memory Mapped I/O |
 
In short, the KGI portability layer defines platform-independent data types
and the means to establish a communication channel between the hardware
and the driver.
Detailed data type definitions can be found in 
file:kgi-0.9/kgi/include/kgi/io.h
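
The check/claim/access/free sequence described above might look roughly
like the following sketch for the mem binding. This is not the KGI
implementation: the mem_ prefix is assumed from the binding name, the
region structure is reduced to the fields used here, the return
convention (0 meaning 'free') is an assumption, and a static array
stands in for real device memory so that the example runs in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t mem_vaddr_t;          /* assumed: handle-like virtual address */

typedef struct {
	const char  *name;              /* identifies the region */
	uintptr_t    base_io;           /* address the device responds to */
	mem_vaddr_t  base_virt;         /* valid only while the region is claimed */
	size_t       size;              /* size in bytes */
} mem_region_t;

/* Mock environment: an aligned array plays the role of the device registers. */
static uint32_t mock_device[16];
static int      mock_claimed;

/* Returns 0 if the region is 'free' (assumed convention for this sketch). */
static int mem_check_region(mem_region_t *r)
{
	return (mock_claimed || r->size > sizeof(mock_device)) ? -1 : 0;
}

/* Registers the region and establishes the (mock) virtual mapping. */
static mem_vaddr_t mem_claim_region(mem_region_t *r)
{
	mock_claimed = 1;
	r->base_virt = (mem_vaddr_t)mock_device;
	return r->base_virt;
}

/* Destroys the mapping; base_virt is invalid afterwards, base_io is kept. */
static mem_vaddr_t mem_free_region(mem_region_t *r)
{
	mock_claimed = 0;
	r->base_virt = 0;
	return r->base_virt;
}

static uint32_t mem_in32(const mem_vaddr_t vaddr)
{
	uint32_t val;

	assert(vaddr % sizeof(val) == 0);       /* natural alignment required */
	memcpy(&val, (const void *)vaddr, sizeof(val));
	return val;
}

static void mem_out32(const uint32_t val, const mem_vaddr_t vaddr)
{
	assert(vaddr % sizeof(val) == 0);
	memcpy((void *)vaddr, &val, sizeof(val));
}

/* Check, claim, access, free. Returns the value read back, or 0 on failure. */
static uint32_t example_probe(void)
{
	mem_region_t r = { "mock registers", 0xA0000, 0, sizeof(mock_device) };
	uint32_t readback;

	if (mem_check_region(&r) != 0)
		return 0;
	mem_claim_region(&r);
	mem_out32(0x12345678u, r.base_virt + 4);
	readback = mem_in32(r.base_virt + 4);
	mem_free_region(&r);
	return readback;
}
```

Only addresses derived from base_virt are passed to the access functions,
which mirrors the rule that virtual addresses act as the handle for
programmed I/O.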
Display Hardware Model
KGI employs an operation mode description independent of the underlying
hardware and desired application programming interface. This is used to
specify the operation mode of a given hardware without assumptions
specific to a given API. The concept behind this description is to describe
the data flow from a device-internal frame buffer representation to
the final visible image.
- Attributes
 The KGI display hardware model assumes that graphics hardware is used
	to control certain attributes of a visible rectangular picture. The 
	smallest units for which attributes can be controlled independently
	of each other are picture elements (pixels). However, a change of a
	pixel's attribute (e.g. the character displayed in this pixel) may 
	result in a change of smaller units of the visible image called dots.
	Currently the following attributes are defined:
	| private | driver private data |
	| application | store what you want here, the hardware doesn't care |
	| stencil | stencil mask/window ID values |
	| z | z-buffer value |
	| colorindex | color (the final color is determined by a table lookup) |
	| color1 | direct control of color channel 1 |
	| color2 | direct control of color channel 2 |
	| color3 | direct control of color channel 3 |
	| alpha | alpha value |
	| foreground | foreground color index for text modes |
	| texture index | pixel texture (character shape) index for text modes |
	| blink | blink bit/frequency |
 
	The particular meaning of color1, color2, 
	and color3 depends on the viewing device and is specified 
	by the color-space (YUV, RGB, ...) associated with it.
	 
	Some display hardware allows the attributes of two pictures
	(with identical resolution) to be controlled independently, so 
	that stereo viewing is possible. To allow for smooth animation,
	several versions (frames) of a picture may be stored in the 
	device to allow fast changes between the versions.
	 
	KGI therefore further divides per-pixel attributes into attributes 
	stored per frame and attributes stored common to all frames. 
	If a display hardware is stereo-capable, all per-frame (e.g. 
	color, alpha values) attributes can be controlled independently 
	for the left and right image.
	Common-to-all-frames attributes (e.g. z-values, stencil values)
	are global to all frames, for both the left and right image 
	(if applicable).
	 
	In order to represent precision requirements in the per-attribute 
	control, a bitmask of the attributes present and a zero-terminated
	array of kgi_u8_t values specifying the number of bits required
	per attribute are used.
	This allows for a compact, sufficient and extensible representation 
	of all frame buffer formats.
	For example, a typical 3D application would specify
	KGI_AM_ALPHA|KGI_AM_COLOR_INDEX
	and { 8, 8, 0 } for the per-frame attributes
	and KGI_AM_STENCIL | KGI_AM_Z
	and { 8, 24, 0 } for the common-to-all-frames
	attributes. 
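
	A minimal sketch of how such a mask/array pair could be evaluated
	is given below. The KGI_AM_ names are taken from the example above,
	but their bit values are assumptions (they simply follow the order
	of the attribute table), as is the helper ex_bits_per_pixel() and
	the rule that array entries follow the mask's bit order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, following the order of the attribute table. */
enum {
	KGI_AM_STENCIL     = 1u << 2,
	KGI_AM_Z           = 1u << 3,
	KGI_AM_COLOR_INDEX = 1u << 4,
	KGI_AM_ALPHA       = 1u << 8
};

/*
 * Sums the bit widths of all attributes set in mask. The array must hold
 * exactly one non-zero entry per set bit, followed by a terminating 0.
 * Returns the total number of bits per pixel, or -1 on a mismatch.
 */
static int ex_bits_per_pixel(unsigned int mask, const uint8_t *bits)
{
	int total = 0;

	assert(bits != NULL);
	for (; mask != 0; mask &= mask - 1) {   /* clears the lowest set bit */
		if (*bits == 0)
			return -1;              /* array ends too early */
		total += *bits++;
	}
	return (*bits == 0) ? total : -1;       /* surplus entries are an error */
}
```

	With the values from the example above, the per-frame attributes
	add up to 16 bits and the common-to-all-frames attributes to
	32 bits per pixel.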
- Image Modes, Dot Ports, Dot Streams and Dot Stream Converters
 
	The final, visible picture may be the result of (digital or analog)
	signal processing, e.g. blending, overlaying or chroma-keying of several
	independent images.
	 
	Given a display hardware internal 2D buffer of a particular size (the
	virtual image), only a rectangular subregion of that virtual image
	(the visible image) may be used for the overlay.
	 
	KGI-0.9 uses an abstract representation of the signal sources and
	signal processing devices to describe the hardware operation mode.
	 
	 
	- Image Modes
 describe which attributes are stored per frame and common 
		to all frames, at what precision attributes are stored and
		what size the virtual and visible image are (in pixels), 
		as well as some global properties, e.g. if the 
		virtual/visible image can be resized, if scaling/interpolation
		or table-lookup operations can be applied to per-pixel 
		attribute values before being converted into dots
		and sent to a dot-port.
- Dot Ports
 describe what final screen size (in dots), color space, 
		data format etc. the dot-data transferred from an 
		image read-out unit to another signal processing device has.
		The signal processing device is assumed to process the data 
		at a certain maximum rate, the dot clock.
		For example, a video DAC may change its RGB outputs once
		per dot clock cycle.
		However, data may have to be transferred at a higher or lower 
		rate, which is determined by the load clock ratio, defining 
		the dot-data transferred per transfer cycle.
- Dot Stream Converters
 represent signal processing devices that read image data 
		on one or more dot ports, optionally perform some 
		operations (color space conversion, interpolation, 
		dot-rate conversion, overlaying, etc.)
		and send the result to another dot-port.
 
	This abstraction allows very complex hardware setups to be described
	in a kind of signal-flow-tree, with a dot-port as root representing the
	viewing device, dot-stream-converters as nodes, dot-ports as links and
	image modes as leaves. 
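
	The signal-flow tree can be pictured with a small data structure
	such as the one below. This is purely illustrative: the type and
	function names (ex_node, ex_count_images) are hypothetical, dot
	ports are represented implicitly by the pointers between nodes,
	and the output port of the root node stands for the viewing device.

```c
#include <assert.h>
#include <stddef.h>

enum ex_kind { EX_IMAGE_MODE, EX_CONVERTER };

#define EX_MAX_INPUTS 2

struct ex_node {
	enum ex_kind kind;
	const char  *name;
	/* Each non-NULL entry stands for a dot port feeding this node. */
	const struct ex_node *input[EX_MAX_INPUTS];
};

/* Leaves: two independent images. */
static const struct ex_node ex_desktop =
	{ EX_IMAGE_MODE, "desktop image", { NULL, NULL } };
static const struct ex_node ex_video =
	{ EX_IMAGE_MODE, "video window image", { NULL, NULL } };

/* Inner node: a dot stream converter that overlays both images. */
static const struct ex_node ex_overlay =
	{ EX_CONVERTER, "overlay unit", { &ex_desktop, &ex_video } };

/* Root: the converter whose output dot port drives the viewing device. */
static const struct ex_node ex_dac =
	{ EX_CONVERTER, "video DAC", { &ex_overlay, NULL } };

/* Counts the image modes (leaves) that contribute to the output of n. */
static int ex_count_images(const struct ex_node *n)
{
	int i, count = 0;

	assert(n != NULL);
	if (n->kind == EX_IMAGE_MODE)
		return 1;
	for (i = 0; i < EX_MAX_INPUTS; i++)
		if (n->input[i] != NULL)
			count += ex_count_images(n->input[i]);
	return count;
}
```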
- Resources
 The abstraction described in the last section allows the (static)
	operation mode and frame/common buffer requirements to be described.
	However, it does not specify means to _alter_ (dynamic) properties
	of the operation mode (e.g. the look-up-tables) or the frame/common
	buffer contents.
	 
	This is done through resources, some of which are global and must be 
	shared between processes (e.g. the frame/common buffers, look-up 
	tables) and some of which can be virtualized (e.g. texture buffers,
	2D or 3D graphics processor, etc.)
	 
	Basically resources are data structures used to communicate relevant
	data to an external mapper (a special device file driver), that 
	utilizes the necessary protection/virtualization mechanisms of 
	the environment.
	Depending on the environment some resources (e.g. accelerators - 
	see below) may not be available to the high-level driver(s).
	Currently the following resources are defined:
	 
	 
	| Commands | This resource is used to perform specific requests, e.g. setting a look-up-table entry etc. |
	| MMIO regions (memory mapped I/O regions) | This resource type is used to allow processes to get a virtual mapping of device-local memory, such as frame or local buffers, graphics processor control registers, etc. |
	| Accelerators (DMA buffers)/Streams | This resource type is used to establish access to a circular list of process-local DMA buffers (only one at a time being writeable by the application). The buffers are allocated by the external mapper and are physically contiguous. |
	| Shared (virtual) Memory (AGP texture memory) | This resource type is used to establish access to a memory object shared between the low-level driver, hardware and the application. This is not yet specified in detail. |
 Exact definitions of the various types can be found in 
	file:kgi/include/kgi/kgi.h
	 
Modular Display Driver Implementation
	The most common graphics card architecture on the PC-market 
	utilizes the following principal design:
	
![Hardware Diagram [11kB JPEG]](hardware.jpeg)
	KGI therefore defines a modular driver architecture that allows 
	separate drivers to be written and distributed for each subsystem 
	(except memory).
	A fully operational driver is then obtained by linking the 
	subsystem drivers together.
	Each driver provides some (specified) driver-global information,
	such as maximum resolution, vendor and model, AC limits etc.
	
A meta-language is defined for each subsystem that allows driver 
	initialization, deinitialization, resource export and operation 
	mode negotiation/checking. This way drivers can be passed a partially
	filled-in operation mode description and auto-negotiate the proper
	operation mode.
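
	The negotiation idea can be sketched as follows. The structures and
	functions are hypothetical and far simpler than the real
	meta-language: a mode field of 0 means "not specified", each
	subsystem fills in its default for unspecified fields and rejects
	values beyond its limits.

```c
#include <assert.h>
#include <stddef.h>

struct ex_mode {
	unsigned int width, height;     /* in pixels, 0 = not specified */
};

struct ex_limits {
	unsigned int max_width, max_height;     /* hard limits of the subsystem */
	unsigned int def_width, def_height;     /* used for unspecified fields */
};

/* One subsystem completes and checks the mode. Returns 0 if acceptable. */
static int ex_mode_check(const struct ex_limits *lim, struct ex_mode *m)
{
	assert(lim != NULL && m != NULL);
	if (m->width == 0)
		m->width = lim->def_width;
	if (m->height == 0)
		m->height = lim->def_height;
	return (m->width <= lim->max_width &&
	        m->height <= lim->max_height) ? 0 : -1;
}

/* Passes the mode through all subsystems in turn; stops at the first refusal. */
static int ex_negotiate(const struct ex_limits *subsys, size_t count,
                        struct ex_mode *m)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (ex_mode_check(&subsys[i], m) != 0)
			return -1;
	return 0;
}
```

	In this sketch an empty mode passed through a chipset and a monitor
	description ends up with the defaults of the first subsystem,
	provided the second one accepts them.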
	
This modular display driver internal interface is defined in 
	kgi-0.9/drv/display/kgi/module.h,
	but it is being adapted to allow an easier
	mapping to the UDI driver model and is not yet finalized.
Summary
	This article was intended to give a more detailed view of the 
	KGI display hardware abstraction model.
	It mainly covered (static) operation mode specification, as well 
	as application/driver/hardware interaction.