
io-blk.so

Block I/O support

Syntax:

driver [blk option[,option...]] [fstype [options]]

Options:

The driver is one of the devb-* drivers, such as devb-eide, and option is one of the options described below.

The optional fstype argument is one of the filesystem drivers (fs-*), and you can follow it with options specific to the filesystem.

Suffixes for memory sizes

You can specify the memory sizes used by io-blk.so with any of the following suffixes:

blk options

You can specify the following options only in the blk section:

alloc=initial
Allocate initial cache memory when the driver starts. The initial argument can include the suffixes listed above. If this option isn't specified, the amount given by the cache= option is allocated.
auto=amount
Set the amount of automounting to be performed; amount is one of:

The default is partition.

automount=dev[:mountpoint[:fstype[:options]]]
automount=@filename
Create a mountpoint for dev at mountpoint; for example, automount=hd0t77:/disk mounts /dev/hd0t77 at /disk.

The optional fstype specifies the filesystem type, after which you can set options. The choices of filesystem and the associated shared objects are:

cd
fs-cd.so
dos
fs-dos.so
ext2
fs-ext2.so
qnx4
fs-qnx4.so

If not specified, the library tries to determine the filesystem automatically.

If the @filename version of this option is used, the automounts are as specified in the given file. The file is a list of mounts (using the same syntax as above), separated by newline characters or commas.


Note: The file named by filename can't reside on the filesystem that's being automounted; it must be available on a filesystem that already exists, such as the image filesystem or any devb filesystem that's already running.

To mount multiple filesystems on a (removable) device, specify that the device is shared with a + prefix. For example,

automount=+fd0:/dos/a:dos,automount=+fd0:/fd:qnx4

For a list of common partition types, see the Filesystems chapter of the System Architecture guide.

bufsz=min:max
The size, in bytes, of the smallest and largest physical sector. The default is 512:16K.
cache=total
The total in-memory cache size allowed. Cache memory is allocated as necessary beyond the initial amount specified by the alloc= option until the total size is reached. The memory size uses the suffixes described above; the default is 15% of system RAM.
delwri=delay
Specify the delay time for delayed writes. A dirty disk block remains in the cache without being physically written to the disk for up to delay seconds. The default is 2 seconds. For more information, see "Controlling writing operations," below.
devdir=path
The directory in which io-blk presents the physical devices as block-special files. The default is /dev.
devno=type
Controls how major device numbers are requested; type is one of:
fdinfo=mode
Specify the storing of open file names for the iofdinfo() query. The options for mode are:
ncache
The default. Try to reconstruct the file name from the contents of the directory name cache. Don't rely on this option to supply the names of all open files (a file's name is supplied only if all components of its pathname are in the name cache).
always
Store the name used in each open() call to ensure that this name is always available.
never
Never supply the name of an open file.
hash=size
Set the number of entries in the buffer cache hash list. If this option isn't specified, the default is derived from the value of the cache option.
map=size
Set the number of entries in a cache used to map translations from logical blocks to physical ones. If this option isn't specified, the size is based on the value of the vnode option.
naming=scheme
Set the device/partition naming scheme. The default is 0#.
ncache=size
Specify a name cache of size entries. Using more name cache entries speeds up path/file lookups at the expense of memory. Setting the size to 0 disables name caching. If this option isn't specified, the size is determined from the vnode option.
noaiod
Disable asynchronous iodone processing, doing the processing instead in the context of the driver thread. By default, this is handled by a dedicated thread.
nora
Disable read-ahead. If read-ahead is enabled (the default), sequential file access patterns are noted and cause a preread of the blocks most likely to be requested next.
postpone=time
If a dirty disk block is being continuously modified, keep it in memory for up to time seconds before physically writing it to the disk. Setting the postpone= option ensures that a continuously modified block still gets written to disk periodically (every time seconds). The default is determined from the delwri option.
protect=number
Set the number of protected extra LRU passes. The default is 2.
ramdisk=size
Create an internal ramdisk device (/dev/ramX) of the specified size. The size variable can use the suffixes described above. The initial contents of this memory device are unspecified, so it must be formatted before use as a filesystem (see dinit in the Utilities Reference).
rmvpoll=period
The polling period, in seconds, for removable media (default: 0).
rmvto=delay
Specify a removable media timeout of delay seconds (default: 2 seconds). After delay seconds of inactivity, a disk access prompts validation of the media with the driver; if it reports that the media has been changed, all data blocks and cached information for that device are discarded and relearned.
thread=[max]:[low]:[high]
Set the thread pool parameters (maximum, low water, and high water). The default is 12:2:4.
verbose[=level]
Display verbose output.
vnode=size
Specify the number of vnode entries (default: 1280 entries). Up to size vnodes may be active. Vnodes remain in this cache when the corresponding file is closed, making subsequent opens faster.
wipe=size
Set the amount of cache that may be occupied by a single file. This option is used to prevent the "cache wiping" phenomenon, where reading a large file may flush a large proportion of buffer cache. The size can use the suffixes described above; the default is 100% (i.e. no limit is enforced).

Filesystem options

You can apply the following options globally (in the blk section) or to a specific filesystem (for example, in the qnx4 section for a QNX 4 filesystem):

commit=level
Set the committing level of the filesystem, which controls how dirty system/user blocks are written to disk. The level is one of none, low, medium (the default), and high. If it's none, all writes are time-delayed (as specified by the delwri option); at high, all writes are performed synchronously. For more information, see "Controlling writing operations," below.
error=action
Set the action to perform when a fs-* filesystem module detects an internal error. The action is one of:

The default is ebadfsys.

[no]atime
Update/don't update the file's directory entry if the only change is the access time. The noatime option isn't strict POSIX 1003.2 behavior, but it's faster.
[no]creat
Allow/don't allow files to be created on this filesystem.
[no]exec
Allow/don't allow file execution from this filesystem.
[no]lock
Lock/don't lock removable media. If locked, the medium is treated as fixed.
[no]suid
Honor/ignore the set-user ID bit on files in this filesystem (nosuid causes the bit to be ignored).
ro
Mount all drives/filesystems as read-only.
rw
Mount all drives/filesystems as read-write (if the physical media permit).

Description:

The io-blk.so library provides block I/O support, as used by the devb-* drivers, and loads filesystem drivers (fs-*) as necessary.

The default values of the map and ncache options are based on the value of the vnode option, and the default value of the hash option depends on the cache option. This arrangement lets you configure a system by specifying the cache size and the number of files, and letting the library set the other options.

Controlling writing operations

There are various types of writing operations:

Synchronous
Start immediately and wait for completion.
Asynchronous
Start immediately but don't wait for completion.
Delayed
Don't start until after a timeout period and then perform as asynchronous.
As required
Write only if you have to.

The blk delwri= option controls the timeout for delayed writes.

The types of data include:

User
What you read() and write().
Metadata
Things associated with stat(), such as times and IDs.
Filesystem
Things such as bitmaps, extents, etc.

If a file has no links, the "as required" type of write operation is used: its data never goes to disk unless the buffer or cache is needed (since the file has no links, the data isn't expected to be accessible after a power failure). If you open a file with O_SYNC, the synchronous type is always used.

Otherwise, the blk commit= level controls the type of write to use for each type of data. For none, everything is delayed; for high, everything is written synchronously. The low and medium levels are very similar, differing only in whether metadata is written asynchronously or synchronously.


Caution: If you specify commit=none, you lose all write ordering (both for single multiblock updates and multiple-user operations). Hence, your chances of a useful recovery following a power failure are poor. We recommend that you use this option only if you have an uninterruptible power supply (UPS), or if you don't mind using dinit on your filesystem as a recovery tool.

Calling close() might force a metadata update, but does nothing to the user data. Calling fsync() always forces out any delayed-write blocks for the file, and so is useful only when commit isn't high.

See also:

"Block-oriented drivers (devb-*)" and "Filesystem drivers (fs-*)" in the Utilities Summary

Filesystems chapter of the System Architecture guide

Working with Filesystems and Fine-Tuning Your System chapters of the User's Guide

