Home > cirrostratus

cirrostratus

Cirrostratus is a project mainly written in C, based on the GPL-2.0 license.

Cirrostratus storage

GG's AoE target

ggaoed is an AoE target implementation for Linux, distributed under the terms of the GNU GPL, version 2 or later. ggaoed has the following features:

  • A single process can handle any number of devices and any number of network interfaces
  • Uses kernel AIO to avoid blocking on I/O
  • Request merging: read/write requests for adjacent data blocks can be submitted as a single I/O request
  • Request batching: multiple I/O requests can be submitted with a single system call
  • Supports hotplugging/unplugging of network interfaces
  • Uses eventfd for receiving notifications about I/O completion
  • Uses epoll for handling event notifications
  • Uses memory mapped packets to lower system call overhead when receiving and sending data
  • Devices to export can be identified either by path or by UUID (using the libblkid library)
  • Delayed I/O submission utilizing timerfd (experimental)

Motivation

I wanted to play with Linux AIO, eventfd, memory mapped packets etc., and I needed a project that can utilize all of these. The de facto standard solution ("write yet another small web server") seemed boring, so ggaoed was born.

This also means ggaoed is not meant to be portable.

Requirements

The following software is needed in order to build ggaoed:

  • glibc 2.8 (built on Linux kernel 2.6.27 or later)
  • libaio 0.3.107
  • libatomic_ops 1.2
  • glib 2.12
  • libblkid-dev
  • docbook2x 0.8 (for building the man pages)

If you are using a recent enough major Linux distribution chances are high that you can find the above already packaged, just install the suitable devel packages.

Running ggaoed requires Linux kernel 2.6.27 or later. Using the memory mapped packet interface for sending data requires kernel 2.6.31 or later.

Building and installing

Just run

./configure
make
make install

You have to copy ggaoed.conf.dist to $(sysconfdir) and edit it according to your needs.

Performance issues

io_submit() can block if you're not using direct I/O and the required data is not in the page cache. That means that if you have one device using buffered I/O, that device may block the processing of requests of other devices. Using jumbo frames can reduce the impact if the client generally submits page aligned I/O requests.

Even when using direct I/O, mapping the offset in the request to physical location on the disk still happens synchronously. Example: if you're exporting a file rather than a block device, then the location of the file must be read from the disk synchronously before the requested I/O operation can be queued to execute asynchronously. LVM also has a negative impact on I/O submission latency but much smaller than a file system.

Performance wise the best is to export raw disk devices or partitions. The ease of administration provided by LVM may well justify the slight performance impact it brings. Export regular files only if you do not care about performance.

io_submit() can block if the kernel runs out of block layer request slots. You can set the limit in /sys/block//queue/nr_requests.

There is a system-wide limit on the number of AIO requests. You can set the limit in /proc/sys/fs/aio-max-nr.

Use jumbo frames if you want performance. The recommended MTU size is 9000 as this is the most common size supported by most gigabit network equipment. You can use larger MTU sizes but make sure all components (initiator, target, and any switches between them) support the MTU size you want to use.

Use ggaoectl to monitor the performance of the daemon. If there are dropped packets, increase the ring buffer size. A ring buffer that is too small can cause severe performance drop because the client will have to re-transmit often.

Use manageable switches and monitor packets dropped by the switch. If the target ggaoed has higher bandwidth than the initiators (i.e. the target is on a 10GigE link while the initiators has only 1GigE), then large queue sizes can cause excessive packet drops which in turn kill performance. Avoid ethernet flow control, you can get better performance by reducing the queue lengths.

About the memory mapped ring buffers

ggaoed uses a ring buffer mapped in memory to transfer packets to and from the kernel to reduce the number of system calls. There are two rings: one for receiving and one for sending data (the latter requires kernel version 2.6.31 or later). Each ring consists of a number of blocks, and every block contains one or more full frames (packets). ring-buffer-size in the config file tells the combined size of the two ring buffers.

The memory allocated for a single block must be physically contiguous. Allocating a large physycally contiguous memory block can be hard especially when the machine is running for some time and the memory gets fragmented. You can try to increase /proc/sys/vm/min_free_kbytes if fragmentation is a problem, but that is not a real solution either.

ggaoed uses a block size of 64 KiB by default, which means 7 jumbo frames fit inside a single block. If allocating the ring buffer fails, ggaoed tries smaller block sizes, but that means more memory will be wasted (since a block can contain only full frames) and less frames (packets) can be placed inside the ring buffer.

Here is a single table showing the relation between MTU, block size and the proportion of memory wasted:

MTU Block size  Frames/block    Wasted
----------------------------------------------
9000    64k     7        3.3%
9000    32k     3       17.1%
9000    16k     1       44.7%
4608    64k     14       0.6%
4608    32k     7        0.6%
4608    16k     3       14.8%
4608     8k     1       43.2%

Due to kernel internals, the number of blocks must not exceed the number of pointers that a single page can hold, that is, getpagesize() / sizeof(void *). Putting it together, the maximum size of a single ring buffer is 32 MiB on 64-bit architectures, and 64 MiB on 32-bit architectures, provided that ggaoed can use the default 64 KiB block size. If the block size has to be reduced to 32 KiB due to memory fregmentation, then the maximum ring buffer size is halved as well etc.

Acknowledgements

Some ideas have been incorporated from vblade (http://aoetools.sf.net) and qaoed (http://code.google.com/p/qaoed).

License

GPL v2 or later. See COPYING for the details.

Gábor Gombás [email protected]