CHAPTER 24

Striping and I/O Options

This chapter describes file striping and I/O options.


24.1 File Striping

Lustre stores a file as one or more objects on OSTs. When a file comprises more than one object, Lustre stripes the file data across the objects in a round-robin fashion. Users can configure the number of stripes, the size of each stripe, and the servers that are used.

One of the most frequently asked Lustre questions is “How should I stripe my files, and what is a good default?” The short answer is that it depends on your needs. A good rule of thumb is to stripe over as few objects as will meet those needs and no more.

24.1.1 Advantages of Striping

There are two reasons to create files of multiple stripes: bandwidth and size.

24.1.1.1 Bandwidth

Many applications require high-bandwidth access to a single file, more bandwidth than a single OSS can provide. Examples include scientific applications that write to a single file from hundreds of nodes, and a binary executable that is loaded by many nodes when an application starts.

In cases like these, stripe your file over as many OSSs as it takes to achieve the required peak aggregate bandwidth for that file. In our experience, the requirement is “as quickly as possible,” which usually means all OSSs.



Note - This assumes that your application is using enough client nodes, and can read/write data fast enough to take advantage of this much OSS bandwidth. The largest useful stripe count is bounded by the I/O rate of your clients/jobs divided by the performance per OSS.
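The bound described in the note above is simple arithmetic. The sketch below is illustrative only: the function name and the sample bandwidth figures are assumptions, not measurements from a real cluster.

```python
import math


def max_useful_stripe_count(client_rate_mb_s, per_oss_rate_mb_s, oss_count):
    """Largest stripe count the job's aggregate I/O rate can keep busy.

    client_rate_mb_s  -- aggregate I/O rate of the clients/job (hypothetical figure)
    per_oss_rate_mb_s -- sustained throughput of a single OSS (hypothetical figure)
    oss_count         -- number of OSSs in the file system
    """
    if client_rate_mb_s <= 0 or per_oss_rate_mb_s <= 0 or oss_count < 1:
        raise ValueError("rates must be positive and oss_count at least 1")
    # Client rate divided by per-OSS performance, rounded down.
    bound = math.floor(client_rate_mb_s / per_oss_rate_mb_s)
    # A file always has at least one stripe and cannot use more OSSs than exist.
    return max(1, min(bound, oss_count))


if __name__ == "__main__":
    # A job that can drive 3000 MB/s against OSSs that each sustain 400 MB/s.
    print(max_useful_stripe_count(3000, 400, 100))
```

With these assumed figures, striping wider than seven OSSs would add overhead without adding usable bandwidth.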


24.1.1.2 Size

The second reason to stripe is when a single OST does not have enough free space to hold the entire file.

There is never an exact, one-to-one mapping between clients and OSTs. Lustre uses a round-robin algorithm for OST stripe selection until the free space on the OSTs differs by more than 20%. However, depending on actual file sizes, some stripes may be mostly empty, while others are more full. For a more detailed description of stripe assignments, see Managing Free Space.

After every ostcount+1 objects, Lustre skips an OST. This causes Lustre’s "starting point" to precess around the OSTs, eliminating some degenerate cases in which applications that create very regular file layouts (striping patterns) would preferentially use a particular OST in the sequence.

24.1.2 Disadvantages of Striping

There are two disadvantages to striping which should deter you from choosing a default policy that stripes over all OSTs unless you really need it: increased overhead and increased risk.

24.1.2.1 Increased Overhead

Increased overhead comes in the form of extra network operations during common operations such as stat and unlink, and more locks. Even when these operations are performed in parallel, there is a big difference between doing 1 network operation and 100 operations.

Increased overhead also comes in the form of server contention. Consider a cluster with 100 clients and 100 OSSs, each with one OST. If each file has exactly one object and the load is distributed evenly, there is no contention and the disks on each server can manage sequential I/O. If each file has 100 objects, then the clients all compete with one another for the attention of the servers, and the disks on each node seek in 100 different directions. In this case, there is needless contention.

24.1.2.2 Increased Risk

Increased risk is evident when you consider the example of striping each file across all servers. In this case, if any one OSS catches fire, a small part of every file is lost. By comparison, if each file has exactly one stripe, you lose fewer files, but you lose them in their entirety. Most users would rather lose some of their files entirely than all of their files partially.

24.1.3 Stripe Size

Choosing a stripe size is a small balancing act, but there are reasonable defaults. The stripe size must be a multiple of the page size. For safety, Lustre’s tools enforce a multiple of 64 KB (the maximum page size on ia64 and PPC64 nodes), so users on platforms with smaller pages do not accidentally create files which might cause problems for ia64 clients.

Although you can create files with a stripe size of 64 KB, this is a poor choice. Practically, the smallest recommended stripe size is 512 KB because Lustre sends 1 MB chunks over the network. This is a good amount of data to transfer at one time. Choosing a smaller stripe size may hinder the batching.

Generally, a good stripe size for sequential I/O using high-speed networks is between 1 MB and 4 MB. Stripe sizes larger than 4 MB do not parallelize as effectively because Lustre tries to keep the amount of dirty cached data below 32 MB per server (with the default configuration).

Writes which cross an object boundary are slightly less efficient than writes which go entirely to one server. Depending on your application's write patterns, you can assist it by choosing a stripe size with that in mind. If the file is written in a very consistent and aligned way, make the stripe size a multiple of the write() size.

The choice of stripe size has no effect on a single-stripe file.
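To make the effect of stripe size and stripe count concrete, the following sketch maps a file offset to the object that holds it, assuming the plain round-robin (RAID 0) layout described in this chapter. The function names are hypothetical and this is not Lustre code.

```python
def locate(offset, stripe_size, stripe_count):
    """Return (object number within the file, offset inside that object)."""
    if offset < 0 or stripe_size <= 0 or stripe_count < 1:
        raise ValueError("invalid offset or layout")
    chunk = offset // stripe_size        # which stripe-sized chunk of the file
    obj = chunk % stripe_count           # chunks go round-robin over the objects
    obj_offset = (chunk // stripe_count) * stripe_size + offset % stripe_size
    return obj, obj_offset


def crosses_boundary(offset, length, stripe_size):
    """True if a write of `length` bytes at `offset` touches more than one chunk."""
    if length <= 0:
        return False
    return offset // stripe_size != (offset + length - 1) // stripe_size


if __name__ == "__main__":
    MB = 1 << 20
    print(locate(5 * MB + 10, MB, 4))          # sixth chunk lands on object 1
    print(crosses_boundary(MB - 512, 1024, MB))  # unaligned write spans two objects
    print(locate(12345, 64 * 1024, 1), locate(12345, MB, 1))
```

The last line shows why stripe size does not matter for a single-stripe file: every offset maps to the same position in the one object, whatever the stripe size.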


24.2 Displaying Files and Directories with lfs getstripe

Use lfs getstripe to print the index and UUID for each OST in the file system, along with the OST index and object ID for each stripe in the file. For directories, the default settings for files created in that directory are printed.

lfs getstripe <filename>

Use lfs find to inspect an entire tree of files.

lfs find [--recursive | -r] <file or directory> ...

If a process creates a file, use the lfs getstripe command to determine which OST(s) the file resides on.

Using ‘cat’ as an example, run:

$ cat > foo

In another terminal, run:

$ lfs getstripe /barn/users/jacob/tmp/foo
OBDS

You can also use ls -l /proc/<pid>/fd/ to find open files that are using Lustre. For example, run:

$ lfs getstripe $(readlink /proc/$(pidof cat)/fd/1)

OBDS:

0: databarn-ost1_UUID ACTIVE
1: databarn-ost2_UUID ACTIVE
2: databarn-ost3_UUID ACTIVE
3: databarn-ost4_UUID ACTIVE
/barn/users/jacob/tmp/foo
	obdidx     objid      objid     group
	     2    835487    0xcbf9f         0

This shows that the file lives on obdidx 2, which is databarn-ost3. To see which node is serving that OST, run:

$ cat /proc/fs/lustre/osc/*databarn-ost3*/ost_conn_uuid
NID_oss1.databarn.87k.net_UUID

The same approach also works for connections to the MDS. To use it, replace osc with mdc and ost with mds in the command above.
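The lookup from obdidx to OST name can also be scripted. The sketch below parses getstripe output in the format shown in this section; the parser and its function name are hypothetical, and the output format of other Lustre versions may differ.

```python
SAMPLE = """\
OBDS:
0: databarn-ost1_UUID ACTIVE
1: databarn-ost2_UUID ACTIVE
2: databarn-ost3_UUID ACTIVE
3: databarn-ost4_UUID ACTIVE
/barn/users/jacob/tmp/foo
    obdidx     objid      objid     group
         2    835487    0xcbf9f         0
"""


def parse_getstripe(text):
    """Return (osts, obdidx): OST index -> UUID, and the obdidx of each stripe."""
    osts, obdidx = {}, []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        first = fields[0]
        if first.endswith(":") and first[:-1].isdigit() and len(fields) >= 2:
            osts[int(first[:-1])] = fields[1]     # "2: databarn-ost3_UUID ACTIVE"
        elif first == "obdidx":
            in_table = True                        # header of the object table
        elif in_table and first.isdigit():
            obdidx.append(int(first))              # one row per stripe
    return osts, obdidx


if __name__ == "__main__":
    osts, stripes = parse_getstripe(SAMPLE)
    for idx in stripes:
        print(idx, osts[idx])
```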


24.3 lfs setstripe - Setting File Layouts

Use the lfs setstripe command to create new files with a specific file layout (stripe pattern) configuration.

lfs setstripe [--size|-s stripe-size] [--count|-c stripe-count] 
[--index|-i start-ost] <filename|dirname>

stripe-size

Stripe size is how much data to write to one OST before moving to the next OST. The default stripe-size is 1 MB, and passing a stripe-size of 0 causes the default stripe size to be used. Otherwise, the stripe-size must be a multiple of 64 KB.

stripe-count

Stripe count is how many OSTs to use. The default stripe-count is 1, and passing a stripe-count of 0 causes the default stripe count to be used. A stripe-count of -1 means always stripe over all OSTs.

start-ost

The start-ost is the index of the first OST to which the file is written. The default start-ost is -1, which allows the MDS to choose the starting index. This setting is strongly recommended, as it allows space and load balancing to be done by the MDS as needed. Otherwise, the file starts on the specified OST index; indexes start at zero (0).



Note - If you pass a start-ost of 0 and a stripe-count of 1, all files are written to OST #0, until space is exhausted. This is probably not what you meant to do. If you only want to adjust the stripe-count and keep the other parameters at their default settings, do not specify any of the other parameters:

lfs setstripe -c <stripe-count> <file>
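The parameter rules above can be checked before the command is run. The sketch below builds an lfs setstripe command line using only the options in the synopsis. The helper name is hypothetical, and passing the stripe size as a plain byte count is an assumption.

```python
STRIPE_ALIGN = 64 * 1024  # stripe-size must be a multiple of 64 KB


def setstripe_args(path, size=None, count=None, index=None):
    """Build an `lfs setstripe` argument list, leaving unspecified options out."""
    args = ["lfs", "setstripe"]
    if size is not None:
        if size < 0 or size % STRIPE_ALIGN != 0:
            raise ValueError("stripe-size must be 0 or a multiple of 64 KB")
        args += ["--size", str(size)]
    if count is not None:
        if count < -1:
            raise ValueError("stripe-count must be -1 (all OSTs), 0 (default) or positive")
        args += ["--count", str(count)]
    if index is not None:
        if index < -1:
            raise ValueError("start-ost must be -1 (MDS chooses) or an OST index")
        args += ["--index", str(index)]
    args.append(path)
    return args


if __name__ == "__main__":
    # Only the stripe count is changed; size and start OST keep their defaults.
    print(" ".join(setstripe_args("/mnt/lustre/newfile", count=4)))
```

Leaving out the options that are not being changed follows the advice in the note: the start OST stays at its default, so the MDS can still balance space and load.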


24.3.1 Changing Striping for a Subdirectory

In a directory, the lfs setstripe command sets a default striping configuration for files created in the directory. The usage is the same as lfs setstripe for a regular file, except that the directory must exist prior to setting the default striping configuration. If a file is created in a directory with a default stripe configuration (without otherwise specifying striping), Lustre uses those striping parameters instead of the file system default for the new file.

To change the striping pattern (file layout) for a sub-directory, create a directory with desired file layout as described above. Sub-directories inherit the file layout of the root/parent directory.



Note - Striping of new files and sub-directories is done per the striping parameter settings of the root directory. Once you set striping on the root directory, then, by default, it applies to any new child directories created in that root directory (unless they have their own striping settings).


24.3.2 Using a Specific Striping Pattern/File Layout for a Single File

To use a specific striping pattern (file layout) for a single file, create the file with lfs setstripe. The command creates the file with the given stripe pattern (file layout) and fails if the file already exists.

24.3.3 Creating a File on a Specific OST

You can use lfs setstripe to create a file on a specific OST. In the following example, the file "bob" will be created on the first OST (id 0).

$ lfs setstripe --count 1 --index 0 bob
$ dd if=/dev/zero of=bob count=1 bs=100M
1+0 records in
1+0 records out
$ lfs getstripe bob

OBDS:

0: home-OST0000_UUID ACTIVE
[...]
bob
	obdidx       objid        objid     group
	     0    33459243    0x1fe8c2b         0

 


24.4 Managing Free Space

In Lustre 1.6, the MDT assigns file stripes to OSTs based on location (which OSS) and size considerations (free space) to optimize file system performance. Emptier OSTs are preferentially selected for stripes, and stripes are preferentially spread out between OSSs to increase network bandwidth utilization. The weighting factor between these two optimizations is user-adjustable.

24.4.1 Checking File System Free Space

Free space is an important consideration in assigning file stripes. The lfs df command shows available disk space on the mounted Lustre file system and space consumption per OST. If multiple Lustre file systems are mounted, a path may be specified, but is not required.


Option          Description
-h              Prints sizes in human-readable format (for example: 1K, 234M, 5G).
-i, --inodes    Lists inodes instead of block usage.




Note - The df -i and lfs df -i commands show the minimum number of inodes that can be created in the file system. Depending on the configuration, it may be possible to create more inodes than initially reported by df -i. Later, df -i operations will show the current, estimated free inode count.

If the underlying file system has fewer free blocks than inodes, then the total inode count for the file system reports only as many inodes as there are free blocks. This is done because Lustre may need to store an external attribute for each new inode, and it is better to report a free inode count that is the guaranteed, minimum number of inodes that can be created.


Examples

[lin-cli1] $ lfs df
UUID                 1K-blocks       Used  Available  Use%  Mounted on
mds-lustre-0_UUID      9174328    1020024    8154304   11%  /mnt/lustre[MDT:0]
ost-lustre-0_UUID     94181368   56330708   37850660   59%  /mnt/lustre[OST:0]
ost-lustre-1_UUID     94181368   56385748   37795620   59%  /mnt/lustre[OST:1]
ost-lustre-2_UUID     94181368   54352012   39829356   57%  /mnt/lustre[OST:2]
filesystem summary:  282544104  167068468   39829356   57%  /mnt/lustre
 
[lin-cli1] $ lfs df -h
UUID                  bytes    Used  Available  Use%  Mounted on
mds-lustre-0_UUID      8.7G  996.1M       7.8G   11%  /mnt/lustre[MDT:0]
ost-lustre-0_UUID     89.8G   53.7G      36.1G   59%  /mnt/lustre[OST:0]
ost-lustre-1_UUID     89.8G   53.8G      36.0G   59%  /mnt/lustre[OST:1]
ost-lustre-2_UUID     89.8G   51.8G      38.0G   57%  /mnt/lustre[OST:2]
filesystem summary:  269.5G  159.3G     110.1G   59%  /mnt/lustre
 
[lin-cli1] $ lfs df -i 
UUID                  Inodes  IUsed    IFree  IUse%  Mounted on
mds-lustre-0_UUID    2211572  41924  2169648     1%  /mnt/lustre[MDT:0]
ost-lustre-0_UUID     737280  12183   725097     1%  /mnt/lustre[OST:0]
ost-lustre-1_UUID     737280  12232   725048     1%  /mnt/lustre[OST:1]
ost-lustre-2_UUID     737280  12214   725066     1%  /mnt/lustre[OST:2]
filesystem summary:  2211572  41924  2169648     1%  /mnt/lustre[OST:2]

24.4.2 Using Stripe Allocations

There are two stripe allocation methods, round-robin and weighted. The allocation method is determined by the amount of free-space imbalance on the OSTs. The weighted allocator is used when any two OSTs are imbalanced by more than 20%. Until then, a faster round-robin allocator is used. (The round-robin order maximizes network balancing.)
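The choice between the two allocators can be illustrated with a few lines of arithmetic. This sketch assumes that "imbalanced by more than 20%" means the free space of the fullest OST is more than 20% below that of the emptiest one; the exact comparison Lustre performs may differ, and the function name is hypothetical.

```python
def pick_allocator(free_space, threshold=0.20):
    """Return 'round-robin' or 'weighted' for a list of per-OST free space values."""
    if not free_space:
        raise ValueError("at least one OST is required")
    most, least = max(free_space), min(free_space)
    if most == 0:
        return "round-robin"              # every OST is equally (un)available
    imbalance = (most - least) / most     # assumed definition of imbalance
    return "weighted" if imbalance > threshold else "round-robin"


if __name__ == "__main__":
    # Available 1K-blocks per OST, in the style of `lfs df` output.
    print(pick_allocator([37850660, 37795620, 39829356]))
    # Rough free space in MB for a file system with one nearly full OST.
    print(pick_allocator([1126, 1126, 155, 1126, 1126, 1126]))
```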

24.4.3 Round-Robin Allocator

When OSTs have approximately the same amount of free space (within 20%), an efficient round-robin allocator is used. The round-robin allocator alternates stripes between OSTs on different OSSs, so the OST used for stripe 0 of each file is evenly distributed among OSTs, regardless of the stripe count. Here are several sample round-robin stripe orders (the same letter represents the different OSTs on a single OSS):


3: AAA                   one 3-OST OSS
3x3: ABABAB              two 3-OST OSSs
3x4: BBABABA             one 3-OST OSS (A) and one 4-OST OSS (B)
3x5: BBABBABA
3x5x1: BBABABABC
3x5x2: BABABCBABC
4x6x2: BABABCBABABC


24.4.4 Weighted Allocator

When the free space difference between the OSTs is significant, a weighting algorithm is used to influence OST ordering based on size and location. Note that these are weightings for a random algorithm, so the "emptiest" OST is not necessarily chosen every time. On average, the weighted allocator fills the emptier OSTs faster.

24.4.5 Adjusting the Weighting Between Free Space and Location

The weighting priority between free space and location can be adjusted via the /proc/fs/lustre/lov/lustre-mdtlov/qos_prio_free proc file. The default is 90%. Use the following command on the MGS to permanently change this weighting:

lctl conf_param <fsname>-MDT0000.lov.qos_prio_free=90

Increasing the value puts more weighting on free space. When the free space priority is set to 100%, then location is no longer used in stripe-ordering calculations, and weighting is based entirely on free space.

Note that setting the priority to 100% means that OSS distribution does not count in the weighting, but the stripe assignment is still done via a weighting--if OST2 has twice as much free space as OST1, then OST2 is twice as likely to be used, but it is not guaranteed to be used.
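The "twice as likely, but not guaranteed" behavior can be modelled as a weighted random choice. The sketch below covers only the case where the free space priority is 100%, so location is ignored. It is not the allocator Lustre implements, and the function names are hypothetical.

```python
import random


def selection_weights(free_space):
    """Probability of each OST being picked, proportional to its free space."""
    total = sum(free_space.values())
    if total <= 0:
        raise ValueError("no free space on any OST")
    return {ost: free / total for ost, free in free_space.items()}


def pick_ost(free_space, rng=random):
    """Draw one OST at random, weighted by free space."""
    osts = list(free_space)
    return rng.choices(osts, weights=[free_space[o] for o in osts], k=1)[0]


if __name__ == "__main__":
    free = {"OST1": 100, "OST2": 200}   # OST2 has twice as much free space
    print(selection_weights(free))
    rng = random.Random(0)
    picks = [pick_ost(free, rng) for _ in range(10000)]
    print(picks.count("OST1"), picks.count("OST2"))
```

Over many allocations OST2 receives about twice as many stripes as OST1, yet any single allocation can still land on OST1.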


24.5 Handling Full OSTs

Sometimes a Lustre file system becomes unbalanced, often due to changed stripe settings. If an OST is full and an attempt is made to write more information to the file system, an error occurs. The procedures below describe how to handle a full OST.

24.5.1 Checking File System Usage

The example below shows an unbalanced file system:

[root@LustreClient01 ~]# lfs df -h
UUID                 bytes   Used  Available Use%  Mounted on
lustre-MDT0000_UUID  4.4G   214.5M   3.9G     4%   /mnt/lustre[MDT:0]
lustre-OST0000_UUID  2.0G   751.3M   1.1G    37%   /mnt/lustre[OST:0]
lustre-OST0001_UUID  2.0G   755.3M   1.1G    37%   /mnt/lustre[OST:1]
lustre-OST0002_UUID  2.0G     1.7G 155.1M    86%   /mnt/lustre[OST:2] <-
lustre-OST0003_UUID  2.0G   751.3M   1.1G    37%   /mnt/lustre[OST:3]
lustre-OST0004_UUID  2.0G   747.3M   1.1G    37%   /mnt/lustre[OST:4]
lustre-OST0005_UUID  2.0G   743.3M   1.1G    36%   /mnt/lustre[OST:5]
 
filesystem summary: 11.8G     5.4G    5.8G    45%  /mnt/lustre

In this case, OST:2 is almost full and when an attempt is made to write additional information to the file system (even with uniform striping over all the OSTs), the write command fails as follows:

[root@LustreClient01 ~]# lfs setstripe /mnt/lustre 4M 0 -1
[root@LustreClient01 ~]# dd if=/dev/zero of=/mnt/lustre/test_3 bs=10M count=100
dd: writing `/mnt/lustre/test_3': No space left on device
98+0 records in
97+0 records out
1017192448 bytes (1.0 GB) copied, 23.2411 seconds, 43.8 MB/s

24.5.2 Taking a Full OST Offline

To enable continued use of the file system, the full OST has to be taken offline or, more specifically, rendered read-only using the lctl command. This is done on the MDS, since the MDS allocates space for writing.

1. Log into the MDS server:

[root@LustreClient01 ~]# ssh root@192.168.0.10 
root@192.168.0.10's password: 
Last login: Wed Nov 26 13:35:12 2008 from 192.168.0.6

2. Use the lctl dl command to show the status of all file system components:

[root@mds ~]# lctl dl 
0 UP mgs MGS MGS 9 
1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
2 UP mdt MDS MDS_uuid 3
3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5

3. Use lctl deactivate to take the full OST offline:

[root@mds ~]# lctl --device 7 deactivate

4. Display the status of the file system components:

[root@mds ~]# lctl dl 
0 UP mgs MGS MGS 9
1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
2 UP mdt MDS MDS_uuid 3
3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
7 IN osc lustre-OST0002-osc lustre-mdtlov_UUID 5
8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5

The device list shows that OST2 is now inactive. If a new file is now written to the file system, the write will be successful as the stripes are allocated across the remaining active OSTs.
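Finding the device number to pass to lctl --device can be automated by parsing the lctl dl listing. The sketch below assumes the column layout shown in the listings above (index, status, type, name, UUID, reference count); the function name is hypothetical.

```python
SAMPLE = """\
3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
7 IN osc lustre-OST0002-osc lustre-mdtlov_UUID 5
8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
"""


def find_device(lctl_dl_output, name):
    """Return (device index, status) for the named device, or None if absent."""
    for line in lctl_dl_output.splitlines():
        fields = line.split()
        # Columns: index, status, type, name, uuid, refcount
        if len(fields) >= 4 and fields[0].isdigit() and fields[3] == name:
            return int(fields[0]), fields[1]
    return None


if __name__ == "__main__":
    print(find_device(SAMPLE, "lustre-OST0002-osc"))
```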

24.5.3 Migrating Data within a File System

Because stripes cannot be moved within the file system, data must be migrated manually: copy the file to a temporary name, remove the original file, and rename the copy to the original file name.

1. Identify the file(s) to be moved. In the example below, output from the getstripe command indicates that the file test_2 is located entirely on OST2:

[root@LustreClient01 ~]# lfs getstripe /mnt/lustre/test_2
OBDS:
0: lustre-OST0000_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
2: lustre-OST0002_UUID ACTIVE
3: lustre-OST0003_UUID ACTIVE
4: lustre-OST0004_UUID ACTIVE
5: lustre-OST0005_UUID ACTIVE
/mnt/lustre/test_2
obdidx      objid     objid     group
     2          8       0x8         0

2. Move the file(s).

[root@LustreClient01 ~]# cp /mnt/lustre/test_2 /mnt/lustre/test_2.tmp
[root@LustreClient01 ~]# rm /mnt/lustre/test_2
rm: remove regular file `/mnt/lustre/test_2'? Y

3. Check the file system balance. The df output in the example below shows a more balanced system compared to the df output in the example in Handling Full OSTs.

[root@LustreClient01 ~]# lfs df -h
UUID                  bytes   Used Available Use% Mounted on
lustre-MDT0000_UUID   4.4G  214.5M      3.9G   4% /mnt/lustre[MDT:0]
lustre-OST0000_UUID   2.0G    1.3G    598.1M  65% /mnt/lustre[OST:0]
lustre-OST0001_UUID   2.0G    1.3G    594.1M  65% /mnt/lustre[OST:1]
lustre-OST0002_UUID   2.0G  913.4M   1000.0M  45% /mnt/lustre[OST:2]
lustre-OST0003_UUID   2.0G    1.3G    602.1M  65% /mnt/lustre[OST:3]
lustre-OST0004_UUID   2.0G    1.3G    606.1M  64% /mnt/lustre[OST:4]
lustre-OST0005_UUID   2.0G    1.3G    610.1M  64% /mnt/lustre[OST:5]
 
filesystem summary:  11.8G    7.3G      3.9G  61% /mnt/lustre

4. Change the name of the file back to the original filename so it can be found by clients.

[root@LustreClient01 ~]# mv /mnt/lustre/test_2.tmp /mnt/lustre/test_2
[root@LustreClient01 ~]# ls /mnt/lustre
test1 test_2 test3 test_3 test4 test_4 test_x

5. Reactivate the OST from the MDS for further writes:

[root@mds ~]# lctl --device 7 activate
[root@mds ~]# lctl dl
  0 UP mgs MGS MGS 9
  1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
  5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
  6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
  8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
  9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
 10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
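The copy, remove and rename steps of this procedure can be scripted. The following is a minimal sketch under the assumption that nothing else has the file open. The function name and the .tmp suffix are illustrative. It does not preserve hard links, and it should be run only while the full OST is deactivated, so that the copy is allocated on other OSTs.

```python
import os
import shutil
import tempfile


def migrate_by_copy(path, suffix=".tmp"):
    """Rewrite `path` through a temporary copy so its data lands in new objects."""
    tmp = path + suffix
    if os.path.exists(tmp):
        raise FileExistsError(tmp)
    shutil.copy2(path, tmp)                 # step 2: copy (new objects are allocated)
    if os.path.getsize(tmp) != os.path.getsize(path):
        os.remove(tmp)
        raise OSError("copy of %s is incomplete" % path)
    os.remove(path)                         # step 2: remove the original
    os.rename(tmp, path)                    # step 4: restore the original name
    return path


if __name__ == "__main__":
    # Demonstration on a scratch directory rather than a real Lustre mount.
    scratch = tempfile.mkdtemp()
    target = os.path.join(scratch, "test_2")
    with open(target, "wb") as f:
        f.write(b"x" * 4096)
    migrate_by_copy(target)
    print(os.listdir(scratch))
```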


24.6 Performing Direct I/O

Starting with 1.4.7, Lustre supports the O_DIRECT flag to open.

Applications using the read() and write() calls must supply buffers aligned on a page boundary (usually 4 KB). If the alignment is not correct, the call fails with EINVAL. Direct I/O may help performance in cases where the client is doing a large amount of I/O and is CPU-bound (CPU utilization 100%).
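The alignment requirement can be met by using a page-aligned buffer whose length is a whole number of pages. The sketch below is one way to do this, using an anonymous memory map as the buffer. The helper names are hypothetical, os.O_DIRECT is available on Linux only, and some file systems (for example tmpfs) reject O_DIRECT.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def round_up(n, align=PAGE):
    """Round n up to the next multiple of align."""
    return -(-n // align) * align


def padded_buffer(data, align=PAGE):
    """Copy data into a page-aligned, zero-padded buffer."""
    if not data:
        raise ValueError("data must not be empty")
    buf = mmap.mmap(-1, round_up(len(data), align))  # anonymous maps start on a page boundary
    buf[:len(data)] = data
    return buf


def direct_write(path, data):
    """Write data to path with O_DIRECT, then trim the padding."""
    buf = padded_buffer(data)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        written = os.write(fd, buf)        # length is a multiple of the page size
        os.ftruncate(fd, len(data))        # drop the zero padding
        return min(written, len(data))
    finally:
        os.close(fd)
        buf.close()
```

A call such as direct_write("/mnt/lustre/ftest/direct_demo", b"payload") would exercise it; the file name direct_demo is a placeholder.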

24.6.1 Making File System Objects Immutable

An immutable file or directory is one that cannot be modified, renamed or removed. To make a file or directory immutable, run:

chattr +i <file>

To remove this flag, use chattr -i.


24.7 Other I/O Options

This section describes other I/O options, including checksums.

24.7.1 Lustre Checksums

To guard against network data corruption, a Lustre client can perform two types of data checksums: in-memory (for data in client memory) and wire (for data sent over the network). For each checksum type, a 32-bit checksum of the data read or written on both the client and server is computed, to ensure that the data has not been corrupted in transit over the network. The ldiskfs backing file system does NOT do any persistent checksumming, so it does not detect corruption of data in the OST file system.

In Lustre 1.6.5 and later, the checksumming feature is enabled, by default, on individual client nodes. If the client or OST detects a checksum mismatch, then an error is logged in the syslog of the form:

LustreError: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.1.1@tcp inum 8991479/2386814769 object 1127239/0 extent [102400-106495]

If this happens, the client will re-read or re-write the affected data up to five times to get a good copy of the data over the network. If it is still not possible, then an I/O error is returned to the application.
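The verify-and-resend behavior described above can be sketched as follows. This is an illustration of the idea, not Lustre's implementation: zlib.adler32 stands in for the 32-bit checksum, the transmit callback is hypothetical, and "up to five times" is read as one initial attempt plus at most five resends.

```python
import zlib

MAX_RESENDS = 5  # resend the data up to five times after a mismatch


def checksummed_transfer(data, transmit, max_resends=MAX_RESENDS):
    """Send data through transmit(); return (received data, attempts used).

    transmit is a hypothetical callable that returns the bytes as they
    arrived at the other end, possibly corrupted.
    """
    expected = zlib.adler32(data)
    for attempt in range(1 + max_resends):
        received = transmit(data)
        if zlib.adler32(received) == expected:
            return received, attempt + 1
    # Still corrupt after all resends: report an I/O error to the caller.
    raise IOError("checksum mismatch after %d resends" % max_resends)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky(payload):
        # Corrupt the first byte on the first two transmissions only.
        state["calls"] += 1
        return payload if state["calls"] > 2 else b"X" + payload[1:]

    print(checksummed_transfer(b"hello world", flaky))
```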

To enable both types of checksums (in-memory and wire), run:

echo 1 > /proc/fs/lustre/llite/<fsname>/checksum_pages

To disable both types of checksums (in-memory and wire), run:

echo 0 > /proc/fs/lustre/llite/<fsname>/checksum_pages

To check the status of a wire checksum, run:

lctl get_param osc.*.checksums

 

24.7.1.1 Changing Checksum Algorithms

By default, Lustre uses the adler32 checksum algorithm, because it is robust and has a lower impact on performance than crc32. The Lustre administrator can change the checksum algorithm via /proc, depending on what is supported in the kernel.

To check which checksum algorithm is being used by Lustre, run:

$ cat /proc/fs/lustre/osc/<fsname>-OST<index>-osc-*/checksum_type

To change the wire checksum algorithm used by Lustre, run:

$ echo <algorithm name> > /proc/fs/lustre/osc/<fsname>-OST<index>-osc-*/checksum_type


Note - The in-memory checksum always uses the adler32 algorithm, if available, and only falls back to crc32 if adler32 cannot be used.


In the following example, the cat command is used to determine that Lustre is using the adler32 checksum algorithm. Then the echo command is used to change the checksum algorithm to crc32. A second cat command confirms that the crc32 checksum algorithm is now in use.

$ cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff81012b2c48e0/checksum_type
crc32 [adler]
$ echo crc32 > /proc/fs/lustre/osc/lustre-OST0000-osc-ffff81012b2c48e0/checksum_type
$ cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff81012b2c48e0/checksum_type
[crc32] adler

 

 


24.8 Striping Using llapi

Use llapi_file_create to set Lustre properties for a new file. For a synopsis and description of llapi_file_create and examples of how to use it, see Setting Lustre Properties (man3).

You can also set striping from inside a program, for example with an ioctl call. To compile the sample program, you need to download the libtest.c and liblustreapi.c files from the Lustre source tree.

A simple C program to demonstrate striping API - libtest.c

/* -*- mode: c; c-basic-offset: 8; indent-tabs-mode: nil; -*-
 * vim:expandtab:shiftwidth=8:tabstop=8:
 *
 * lustredemo - simple code examples of liblustreapi functions
 */
 
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
 
#include <lustre/liblustreapi.h>
#include <lustre/lustre_user.h>
#define MAX_OSTS 1024
#define LOV_EA_SIZE(lum, num) (sizeof(*lum) + num * sizeof(*lum->lmm_objects))
#define LOV_EA_MAX(lum) LOV_EA_SIZE(lum, MAX_OSTS)
 
/* 
This program provides crude examples of using the liblustre API functions 
*/
 
/* Change these definitions to suit */
 
#define TESTDIR "/tmp"			/* Results directory */
#define TESTFILE "lustre_dummy"		/* Name for the file we create/destroy */
#define FILESIZE 262144			/* Size of the file in words */
#define DUMWORD "DEADBEEF"		/* Dummy word used to fill files */
#define MY_STRIPE_WIDTH 2		/* Set this to the number of OSTs required */
#define MY_LUSTRE_DIR "/mnt/lustre/ftest"
 
 
int close_file(int fd)
{      
	if (close(fd) < 0) {
		fprintf(stderr, "File close failed: %d (%s)\n", errno, strerror(errno));
		return -1;
	}
	return 0;
}
 
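int write_file(int fd)
{
	char *stng = DUMWORD;
	int cnt = 0;
 
	for (cnt = 0; cnt < FILESIZE; cnt++) {
		/* strlen(), not sizeof(): stng is a pointer, not an array */
		if (write(fd, stng, strlen(stng)) < 0) {
			fprintf(stderr, "File write failed: %d (%s)\n", errno, strerror(errno));
			return -1;
		}
	}
	return 0;
}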
/* Open a file, set a specific stripe count, size and starting OST
   Adjust the parameters to suit */
  
int open_stripe_file()
{
	char *tfile = TESTFILE;
	int stripe_size = 65536;		/* 64 KB; must be a multiple of 64 KB */
	int stripe_offset = -1;			/* Start at default */
	int stripe_count = MY_STRIPE_WIDTH;	/* Number of OSTs to stripe over */
	int stripe_pattern = 0;			/* only RAID 0 at this time */
	int rc, fd;

	rc = llapi_file_create(tfile, stripe_size, stripe_offset, stripe_count,
			       stripe_pattern);
	/* result code is inverted, we may return -EINVAL or an ioctl error.
	We borrow an error message from sanity.c 
	*/
	if (rc) {
                fprintf(stderr,"llapi_file_create failed: %d (%s) \n", rc, strerror(-rc));
                return -1;
        }
        /* llapi_file_create closes the file descriptor, we must re-open */
        fd = open(tfile, O_CREAT | O_RDWR | O_LOV_DELAY_CREATE, 0644);
        if (fd < 0) {
                fprintf(stderr, "Can't open %s file: %d (%s)\n", tfile, errno, strerror(errno));
		return -1;
        }
        return fd;
}
 
/* output a list of uuids for this file */
int get_my_uuids(int fd)
{
	struct obd_uuid uuids[1024], *uuidp;											/* Output var */
	int obdcount = 1024;    
	int rc,i;
 
	rc = llapi_lov_get_uuids(fd, uuids, &obdcount);
	if (rc != 0) {
		fprintf(stderr, "get uuids failed: %d (%s)\n",errno, strerror(errno));
        }
        printf("This file system has %d obds\n", obdcount);
        for (i = 0, uuidp = uuids; i < obdcount; i++, uuidp++) {
		printf("UUID %d is %s\n",i, uuidp->uuid);
        }
        return 0;
}
 
/* Print out some LOV attributes. List our objects */
int get_file_info(char *path)
{
 
	struct lov_user_md *lump;
	int rc;
	int i;
     
	lump = malloc(LOV_EA_MAX(lump));
	if (lump == NULL) {
		return -1;
        }
 
        rc = llapi_file_get_stripe(path, lump);
        
        if (rc != 0) {
		fprintf(stderr, "get_stripe failed: %d (%s)\n",errno, strerror(errno));
		free(lump);	/* do not leak the buffer on the error path */
		return -1;
        }
 
	printf("Lov magic %u\n", lump->lmm_magic);
	printf("Lov pattern %u\n", lump->lmm_pattern);
	printf("Lov object id %llu\n", lump->lmm_object_id);
	printf("Lov object group %llu\n", lump->lmm_object_gr);
	printf("Lov stripe size %u\n", lump->lmm_stripe_size);
	printf("Lov stripe count %hu\n", lump->lmm_stripe_count);
	printf("Lov stripe offset %u\n", lump->lmm_stripe_offset);
	for (i = 0; i < lump->lmm_stripe_count; i++) {
		printf("Object index %d Objid %llu\n", lump->lmm_objects[i].l_ost_idx, lump->lmm_objects[i].l_object_id);
        }
    
 
	free(lump);
	return rc;
   
}
/* Ping all OSTs that belong to this filesystem */
 
int ping_osts()
{
	DIR *dir;
	struct dirent *d;
	char osc_dir[100];
	int rc;
 
	sprintf(osc_dir, "/proc/fs/lustre/osc");
	dir = opendir(osc_dir);
	if (dir == NULL) {
		printf("Can't open dir\n");
		return -1;
	}
	while((d = readdir(dir)) != NULL) {
		if ( d->d_type == DT_DIR ) {
			if (! strncmp(d->d_name, "OSC", 3)) {
				printf("Pinging OSC %s ", d->d_name);
				rc = llapi_ping("osc", d->d_name);
				if (rc) {
					printf("  bad\n");
				} else {
					printf("  good\n");
				}
			}
		}
	}
	closedir(dir);	/* release the directory stream opened above */
	return 0;
 
}
 
int main()
{
	int file;
	int rc;
	char filename[100];
	char sys_cmd[100];
 
	sprintf(filename, "%s/%s",MY_LUSTRE_DIR, TESTFILE);
    
	printf("Open a file with striping\n");
	file = open_stripe_file();
	if ( file < 0 ) {
		printf("Exiting\n");
		exit(1);
	}
	printf("Getting uuid list\n");
	rc = get_my_uuids(file);
	printf("Write to the file\n");
	rc = write_file(file);
	rc = close_file(file);
	printf("Listing LOV data\n");
	rc = get_file_info(filename);
	printf("Ping our OSTs\n");
	rc = ping_osts();
 
	/* the results should match lfs getstripe */
	printf("Confirming our results with lfs getstripe\n");
	sprintf(sys_cmd, "/usr/bin/lfs getstripe %s/%s", MY_LUSTRE_DIR, TESTFILE);
	system(sys_cmd);
 
	printf("All done\n");
	exit(rc);
}	

Makefile for sample application:

 
lustredemo: libtest.c
	gcc -g -O2 -Wall -o lustredemo libtest.c -llustreapi
clean:
	rm -f core lustredemo *.o
run:
	make
	rm -f /mnt/lustre/ftest/lustredemo
	rm -f /mnt/lustre/ftest/lustre_dummy
	cp lustredemo /mnt/lustre/ftest/