CHAPTER 24
Striping and I/O Options
This chapter describes file striping and I/O options.
Lustre stores each file as one or more objects on OSTs. When a file comprises more than one object, Lustre stripes the file data across the objects in a round-robin fashion. Users can configure the number of stripes, the size of each stripe, and the servers that are used.
One of the most frequently asked Lustre questions is “How should I stripe my files, and what is a good default?” The short answer is that it depends on your needs. A good rule of thumb is to stripe over as few objects as will meet those needs and no more.
There are two reasons to create files of multiple stripes: bandwidth and size.
Many applications require high-bandwidth access to a single file, more bandwidth than a single OSS can provide. Examples include scientific applications that write to a single file from hundreds of nodes, and a binary executable that is loaded by many nodes when an application starts.
In cases like these, stripe your file over as many OSSs as it takes to achieve the required peak aggregate bandwidth for that file. In our experience, the requirement is “as quickly as possible,” which usually means all OSSs.
The second reason to stripe is when a single OST does not have enough free space to hold the entire file.
There is never an exact, one-to-one mapping between clients and OSTs. Lustre uses a round-robin algorithm for OST stripe selection until free space on the OSTs differs by more than 20%. However, depending on actual file sizes, some stripes may be mostly empty, while others are more full. For a more detailed description of stripe assignments, see Managing Free Space.
After every ostcount+1 objects, Lustre skips an OST. This causes Lustre’s "starting point" to precess around, eliminating some degenerate cases where applications that create very regular file layouts (striping patterns) would have preferentially used a particular OST in the sequence.
There are two disadvantages to striping which should deter you from choosing a default policy that stripes over all OSTs unless you really need it: increased overhead and increased risk.
Increased overhead comes in the form of extra network operations during common operations such as stat and unlink, and more locks. Even when these operations are performed in parallel, there is a big difference between doing 1 network operation and 100 operations.
Increased overhead also comes in the form of server contention. Consider a cluster with 100 clients and 100 OSSs, each with one OST. If each file has exactly one object and the load is distributed evenly, there is no contention and the disks on each server can manage sequential I/O. If each file has 100 objects, then the clients all compete with one another for the attention of the servers, and the disks on each node seek in 100 different directions. In this case, there is needless contention.
Increased risk is evident when you consider the example of striping each file across all servers. In this case, if any one OSS catches fire, a small part of every file is lost. By comparison, if each file has exactly one stripe, you lose fewer files, but you lose them in their entirety. Most users would rather lose some of their files entirely than all of their files partially.
Choosing a stripe size is a small balancing act, but there are reasonable defaults. The stripe size must be a multiple of the page size. For safety, Lustre’s tools enforce a multiple of 64 KB (the maximum page size on ia64 and PPC64 nodes), so users on platforms with smaller pages do not accidentally create files which might cause problems for ia64 clients.
Although you can create files with a stripe size of 64 KB, this is a poor choice. Practically, the smallest recommended stripe size is 512 KB because Lustre sends 1 MB chunks over the network. This is a good amount of data to transfer at one time. Choosing a smaller stripe size may hinder the batching.
Generally, a good stripe size for sequential I/O using high-speed networks is between 1 MB and 4 MB. Stripe sizes larger than 4 MB do not parallelize as effectively because Lustre tries to keep the amount of dirty cached data below 32 MB per server (with the default configuration).
Writes which cross an object boundary are slightly less efficient than writes which go entirely to one server. Depending on your application's write patterns, you can assist it by choosing a stripe size with that in mind. If the file is written in a very consistent and aligned way, make the stripe size a multiple of the write() size.
The choice of stripe size has no effect on a single-stripe file.
Use lfs getstripe to print the index and UUID for each OST in the file system, along with the OST index and object ID for each stripe in the file. For directories, the default settings for files created in that directory are printed.
lfs getstripe <filename>
Use lfs find to inspect an entire tree of files.
lfs find [--recursive | -r] <file or directory> ...
If a process creates a file, use the lfs getstripe command to determine which OST(s) the file resides on.
Using cat as an example, run:
$ cat > foo
While cat is still running, run the following from another terminal:
$ lfs getstripe /barn/users/jacob/tmp/foo
OBDS
You can also use ls -l /proc/<pid>/fd/ to find open files using Lustre. For example, run:
$ lfs getstripe $(readlink /proc/$(pidof cat)/fd/1)
0: databarn-ost1_UUID ACTIVE
1: databarn-ost2_UUID ACTIVE
2: databarn-ost3_UUID ACTIVE
3: databarn-ost4_UUID ACTIVE
/barn/users/jacob/tmp/foo
obdidx objid objid group
2 835487 0xcbf9f 0
This shows that the file lives on obdidx 2, which is databarn-ost3. To see which node is serving that OST, run:
$ cat /proc/fs/lustre/osc/*databarn-ost3*/ost_conn_uuid
NID_oss1.databarn.87k.net_UUID
The same approach also works for connections to the MDS: replace osc with mdc and ost with mds in the commands above.
Use the lfs setstripe command to create new files with a specific file layout (stripe pattern) configuration.
lfs setstripe [--size|-s stripe-size] [--count|-c stripe-cnt] [--index|-i start-ost] <filename|dirname>
Stripe size is how much data to write to one OST before moving to the next OST. The default stripe-size is 1 MB, and passing a stripe-size of 0 causes the default stripe size to be used. Otherwise, the stripe-size must be a multiple of 64 KB.
Stripe count is how many OSTs to use. The default stripe-count is 1, and passing a stripe-count of 0 causes the default stripe count to be used. A stripe-count of -1 means always stripe over all OSTs.
Start ost is the first OST to which files are written. The default start-ost is -1, and passing a start-ost of -1 allows the MDS to choose the starting index. This setting is strongly recommended, as it allows space and load balancing to be done by the MDS as needed. Otherwise, the file starts on the specified OST index, starting at zero (0).
In a directory, the lfs setstripe command sets a default striping configuration for files created in the directory. The usage is the same as lfs setstripe for a regular file, except that the directory must exist prior to setting the default striping configuration. If a file is created in a directory with a default stripe configuration (without otherwise specifying striping), Lustre uses those striping parameters instead of the file system default for the new file.
To change the striping pattern (file layout) for a sub-directory, create a directory with the desired file layout as described above. Sub-directories inherit the file layout of the root/parent directory.
To use a specific striping pattern (file layout) for a specific file:
lfs setstripe creates a file with a given stripe pattern (file layout)
lfs setstripe fails if the file already exists
You can use lfs setstripe to create a file on a specific OST. In the following example, the file "bob" will be created on the first OST (id 0).
$ lfs setstripe --count 1 --index 0 bob
$ dd if=/dev/zero of=bob count=1 bs=100M
1+0 records in
1+0 records out
$ lfs getstripe bob
0: home-OST0000_UUID ACTIVE
[...]
bob
obdidx objid objid group
0 33459243 0x1fe8c2b 0
In Lustre 1.6, the MDT assigns file stripes to OSTs based on location (which OSS) and size considerations (free space) to optimize file system performance. Emptier OSTs are preferentially selected for stripes, and stripes are preferentially spread out between OSSs to increase network bandwidth utilization. The weighting factor between these two optimizations is user-adjustable.
Free space is an important consideration in assigning file stripes. The lfs df command shows available disk space on the mounted Lustre file system and space consumption per OST. If multiple Lustre file systems are mounted, a path may be specified, but is not required.
Use the -h option to print sizes in human-readable format (for example: 1K, 234M, 5G). Use the -i option to list inode usage instead of block usage.
[lin-cli1] $ lfs df
UUID                 1K-blocks      Used  Available  Use% Mounted on
mds-lustre-0_UUID      9174328   1020024    8154304   11% /mnt/lustre[MDT:0]
ost-lustre-0_UUID     94181368  56330708   37850660   59% /mnt/lustre[OST:0]
ost-lustre-1_UUID     94181368  56385748   37795620   59% /mnt/lustre[OST:1]
ost-lustre-2_UUID     94181368  54352012   39829356   57% /mnt/lustre[OST:2]
filesystem summary:  282544104 167068468   39829356   57% /mnt/lustre

[lin-cli1] $ lfs df -h
UUID                     bytes      Used  Available  Use% Mounted on
mds-lustre-0_UUID         8.7G    996.1M       7.8G   11% /mnt/lustre[MDT:0]
ost-lustre-0_UUID        89.8G     53.7G      36.1G   59% /mnt/lustre[OST:0]
ost-lustre-1_UUID        89.8G     53.8G      36.0G   59% /mnt/lustre[OST:1]
ost-lustre-2_UUID        89.8G     51.8G      38.0G   57% /mnt/lustre[OST:2]
filesystem summary:     269.5G    159.3G     110.1G   59% /mnt/lustre

[lin-cli1] $ lfs df -i
UUID                    Inodes     IUsed      IFree IUse% Mounted on
mds-lustre-0_UUID      2211572     41924    2169648    1% /mnt/lustre[MDT:0]
ost-lustre-0_UUID       737280     12183     725097    1% /mnt/lustre[OST:0]
ost-lustre-1_UUID       737280     12232     725048    1% /mnt/lustre[OST:1]
ost-lustre-2_UUID       737280     12214     725066    1% /mnt/lustre[OST:2]
filesystem summary:    2211572     41924    2169648    1% /mnt/lustre[OST:2]
There are two stripe allocation methods, round-robin and weighted. The allocation method is determined by the amount of free-space imbalance on the OSTs. The weighted allocator is used when any two OSTs are imbalanced by more than 20%. Until then, a faster round-robin allocator is used. (The round-robin order maximizes network balancing.)
When OSTs have approximately the same amount of free space (within 20%), an efficient round-robin allocator is used. The round-robin allocator alternates stripes between OSTs on different OSSs, so the OST used for stripe 0 of each file is evenly distributed among OSTs, regardless of the stripe count. Here are several sample round-robin stripe orders (the same letter represents the different OSTs on a single OSS):
When the free space difference between the OSTs is significant, a weighting algorithm is used to influence OST ordering based on size and location. Note that these are weightings for a random algorithm, so the "emptiest" OST is not necessarily chosen every time. On average, the weighted allocator fills the emptier OSTs faster.
The weighting priority between free space and location can be adjusted via the /proc/fs/lustre/lov/lustre-mdtlov/qos_prio_free proc file. The default is 90%. Use the following command on the MGS to change this weighting permanently:
lctl conf_param <fsname>-MDT0000.lov.qos_prio_free=90
Increasing the value puts more weighting on free space. When the free space priority is set to 100%, then location is no longer used in stripe-ordering calculations, and weighting is based entirely on free space.
Note that setting the priority to 100% means that OSS distribution does not count in the weighting, but the stripe assignment is still done via a weighting: if OST2 has twice as much free space as OST1, then OST2 is twice as likely to be used, but it is not guaranteed to be used.
Sometimes a Lustre file system becomes unbalanced, often due to changed stripe settings. If an OST is full and an attempt is made to write more information to the file system, an error occurs. The procedures below describe how to handle a full OST.
The example below shows an unbalanced file system:
[root@LustreClient01 ~]# lfs df -h
UUID                     bytes      Used  Available  Use% Mounted on
lustre-MDT0000_UUID       4.4G    214.5M       3.9G    4% /mnt/lustre[MDT:0]
lustre-OST0000_UUID       2.0G    751.3M       1.1G   37% /mnt/lustre[OST:0]
lustre-OST0001_UUID       2.0G    755.3M       1.1G   37% /mnt/lustre[OST:1]
lustre-OST0002_UUID       2.0G      1.7G     155.1M   86% /mnt/lustre[OST:2] <-
lustre-OST0003_UUID       2.0G    751.3M       1.1G   37% /mnt/lustre[OST:3]
lustre-OST0004_UUID       2.0G    747.3M       1.1G   37% /mnt/lustre[OST:4]
lustre-OST0005_UUID       2.0G    743.3M       1.1G   36% /mnt/lustre[OST:5]
filesystem summary:      11.8G      5.4G       5.8G   45% /mnt/lustre
In this case, OST:2 is almost full and when an attempt is made to write additional information to the file system (even with uniform striping over all the OSTs), the write command fails as follows:
[root@LustreClient01 ~]# lfs setstripe /mnt/lustre 4M 0 -1
[root@LustreClient01 ~]# dd if=/dev/zero of=/mnt/lustre/test_3 \
bs=10M count=100
dd: writing `/mnt/lustre/test_3': No space left on device
98+0 records in
97+0 records out
1017192448 bytes (1.0 GB) copied, 23.2411 seconds, 43.8 MB/s
To enable continued use of the file system, the full OST has to be taken offline or, more specifically, rendered read-only using the lctl command. This is done on the MDS, since the MDS allocates space for writing.
1. Log in to the MDS:
[root@LustreClient01 ~]# ssh root@192.168.0.10
root@192.168.0.10's password:
Last login: Wed Nov 26 13:35:12 2008 from 192.168.0.6
2. Use the lctl dl command to show the status of all file system components:
[root@mds ~]# lctl dl
0 UP mgs MGS MGS 9
1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
2 UP mdt MDS MDS_uuid 3
3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
3. Use lctl deactivate to take the full OST offline:
[root@mds ~]# lctl --device 7 deactivate
4. Display the status of the file system components:
[root@mds ~]# lctl dl
0 UP mgs MGS MGS 9
1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
2 UP mdt MDS MDS_uuid 3
3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
7 IN osc lustre-OST0002-osc lustre-mdtlov_UUID 5
8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
The device list shows that OST2 is now inactive. If a new file is now written to the file system, the write will be successful as the stripes are allocated across the remaining active OSTs.
Because stripes cannot be moved within the file system, data must be migrated manually: copy the file to a new name, remove the original file, and rename the copy to the original file name.
1. Identify the file(s) to be moved. In the example below, output from the getstripe command indicates that the file test_2 is located entirely on OST2:
[root@LustreClient01 ~]# lfs getstripe /mnt/lustre/test_2
OBDS:
0: lustre-OST0000_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
2: lustre-OST0002_UUID ACTIVE
3: lustre-OST0003_UUID ACTIVE
4: lustre-OST0004_UUID ACTIVE
5: lustre-OST0005_UUID ACTIVE
/mnt/lustre/test_2
obdidx objid objid group
2 8 0x8 0
2. Copy the file to a temporary name and remove the original:
[root@LustreClient01 ~]# cp /mnt/lustre/test_2 /mnt/lustre/test_2.tmp
[root@LustreClient01 ~]# rm /mnt/lustre/test_2
rm: remove regular file `/mnt/lustre/test_2'? Y
3. Check the file system balance. The df output in the example below shows a more balanced system compared to the df output in the example in Handling Full OSTs.
[root@LustreClient01 ~]# lfs df -h
UUID                     bytes      Used  Available  Use% Mounted on
lustre-MDT0000_UUID       4.4G    214.5M       3.9G    4% /mnt/lustre[MDT:0]
lustre-OST0000_UUID       2.0G      1.3G     598.1M   65% /mnt/lustre[OST:0]
lustre-OST0001_UUID       2.0G      1.3G     594.1M   65% /mnt/lustre[OST:1]
lustre-OST0002_UUID       2.0G    913.4M    1000.0M   45% /mnt/lustre[OST:2]
lustre-OST0003_UUID       2.0G      1.3G     602.1M   65% /mnt/lustre[OST:3]
lustre-OST0004_UUID       2.0G      1.3G     606.1M   64% /mnt/lustre[OST:4]
lustre-OST0005_UUID       2.0G      1.3G     610.1M   64% /mnt/lustre[OST:5]
filesystem summary:      11.8G      7.3G       3.9G   61% /mnt/lustre
4. Change the name of the file back to the original filename so it can be found by clients.
[root@LustreClient01 ~]# mv /mnt/lustre/test_2.tmp /mnt/lustre/test_2
[root@LustreClient01 ~]# ls /mnt/lustre
test1 test_2 test3 test_3 test4 test_4 test_x
5. Reactivate the OST from the MDS for further writes:
[root@mds ~]# lctl --device 7 activate
[root@mds ~]# lctl dl
0 UP mgs MGS MGS 9
1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-816dd1e813 5
2 UP mdt MDS MDS_uuid 3
3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID
Starting with 1.4.7, Lustre supports the O_DIRECT flag to open.
Applications using the read() and write() calls must supply buffers aligned on a page boundary (usually 4 K). If the alignment is not correct, the call returns -EINVAL. Direct I/O may help performance in cases where the client is doing a large amount of I/O and is CPU-bound (CPU utilization 100%).
An immutable file or directory is one that cannot be modified, renamed or removed. To make a file immutable, run:
chattr +i <file>
To remove this flag, run chattr -i <file>.
This section describes other I/O options, including checksums.
To guard against network data corruption, a Lustre client can perform two types of data checksums: in-memory (for data in client memory) and wire (for data sent over the network). For each checksum type, a 32-bit checksum of the data read or written on both the client and server is computed, to ensure that the data has not been corrupted in transit over the network. The ldiskfs backing file system does NOT do any persistent checksumming, so it does not detect corruption of data in the OST file system.
In Lustre 1.6.5 and later, the checksumming feature is enabled, by default, on individual client nodes. If the client or OST detects a checksum mismatch, then an error is logged in the syslog of the form:
LustreError: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.1.1@tcp inum 8991479/2386814769 object 1127239/0 extent [102400-106495]
If this happens, the client will re-read or re-write the affected data up to five times to get a good copy of the data over the network. If it is still not possible, then an I/O error is returned to the application.
To enable both types of checksums (in-memory and wire), run:
echo 1 > /proc/fs/lustre/llite/<fsname>/checksum_pages
To disable both types of checksums (in-memory and wire), run:
echo 0 > /proc/fs/lustre/llite/<fsname>/checksum_pages
To check the status of a wire checksum, run:
lctl get_param osc.*.checksums
By default, Lustre uses the adler32 checksum algorithm, because it is robust and has a lower impact on performance than crc32. The Lustre administrator can change the checksum algorithm via /proc, depending on what is supported in the kernel.
To check which checksum algorithm is being used by Lustre, run:
$ cat /proc/fs/lustre/osc/<fsname>-OST<index>-osc-*/checksum_type
To change the wire checksum algorithm used by Lustre, run:
$ echo <algorithm name> > /proc/fs/lustre/osc/<fsname>-OST<index>-osc-*/checksum_type
Note - The in-memory checksum always uses the adler32 algorithm, if available, and only falls back to crc32 if adler32 cannot be used.
In the following example, the cat command is used to determine that Lustre is using the adler32 checksum algorithm. Then the echo command is used to change the checksum algorithm to crc32. A second cat command confirms that the crc32 checksum algorithm is now in use.
$ cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff81012b2c48e0/checksum_type
crc32 [adler]
$ echo crc32 > /proc/fs/lustre/osc/lustre-OST0000-osc-ffff81012b2c48e0/checksum_type
$ cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff81012b2c48e0/checksum_type
[crc32] adler
Use llapi_file_create to set Lustre properties for a new file. For a synopsis and description of llapi_file_create and examples of how to use it, see Setting Lustre Properties (man3).
You can also set striping from within a program, using the ioctl-based calls in liblustreapi. To compile the sample program, you need to download the libtest.c and liblustreapi.c files from the Lustre source tree.
A simple C program to demonstrate striping API - libtest.c
/* -*- mode: c; c-basic-offset: 8; indent-tabs-mode: nil; -*-
* vim:expandtab:shiftwidth=8:tabstop=8:
*
* lustredemo - simple code examples of liblustreapi functions
*/
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <lustre/liblustreapi.h>
#include <lustre/lustre_user.h>
#define MAX_OSTS 1024
#define LOV_EA_SIZE(lum, num) (sizeof(*lum) + num * sizeof(*lum->lmm_objects))
#define LOV_EA_MAX(lum) LOV_EA_SIZE(lum, MAX_OSTS)
/*
This program provides crude examples of using the liblustre API functions
*/
/* Change these definitions to suit */
#define TESTDIR "/tmp" /* Results directory */
#define TESTFILE "lustre_dummy" /* Name for the file we create/destroy */
#define FILESIZE 262144 /* Size of the file in words */
#define DUMWORD "DEADBEEF" /* Dummy word used to fill files */
#define MY_STRIPE_WIDTH 2 /* Set this to the number of OST required */
#define MY_LUSTRE_DIR "/mnt/lustre/ftest"
int close_file(int fd)
{
        if (close(fd) < 0) {
                fprintf(stderr, "File close failed: %d (%s)\n", errno, strerror(errno));
                return -1;
        }
        return 0;
}
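int write_file(int fd)
{
        char *stng = DUMWORD;
        size_t len = strlen(stng); /* sizeof(stng) would give the pointer size */
        int cnt = 0;

        for (cnt = 0; cnt < FILESIZE; cnt++) {
                if (write(fd, stng, len) != (ssize_t)len) {
                        fprintf(stderr, "File write failed: %d (%s)\n", errno, strerror(errno));
                        return -1;
                }
        }
        return 0;
}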
/* Open a file, set a specific stripe count, size and starting OST
   Adjust the parameters to suit */
int open_stripe_file()
{
        char *tfile = TESTFILE;
        int stripe_size = 65536;            /* 64 KB; must be a multiple of 64 KB */
        int stripe_offset = -1;             /* Start at default */
        int stripe_count = MY_STRIPE_WIDTH; /* Number of OSTs to stripe over */
        int stripe_pattern = 0;             /* only RAID 0 at this time */
        int rc, fd;

        rc = llapi_file_create(tfile,
                               stripe_size, stripe_offset, stripe_count, stripe_pattern);
        /* result code is inverted, we may return -EINVAL or an ioctl error.
           We borrow an error message from sanity.c
        */
        if (rc) {
                fprintf(stderr, "llapi_file_create failed: %d (%s) \n", rc, strerror(-rc));
                return -1;
        }
        /* llapi_file_create closes the file descriptor, we must re-open */
        fd = open(tfile, O_CREAT | O_RDWR | O_LOV_DELAY_CREATE, 0644);
        if (fd < 0) {
                fprintf(stderr, "Can't open %s file: %d (%s)\n", tfile, errno, strerror(errno));
                return -1;
        }
        return fd;
}
/* output a list of uuids for this file */
int get_my_uuids(int fd)
{
        struct obd_uuid uuids[1024], *uuidp; /* Output var */
        int obdcount = 1024;
        int rc, i;

        rc = llapi_lov_get_uuids(fd, uuids, &obdcount);
        if (rc != 0) {
                fprintf(stderr, "get uuids failed: %d (%s)\n", errno, strerror(errno));
                return -1; /* obdcount and uuids are not valid on failure */
        }
        printf("This file system has %d obds\n", obdcount);
        for (i = 0, uuidp = uuids; i < obdcount; i++, uuidp++) {
                printf("UUID %d is %s\n", i, uuidp->uuid);
        }
        return 0;
}
/* Print out some LOV attributes. List our objects */
int get_file_info(char *path)
{
        struct lov_user_md *lump;
        int rc;
        int i;

        lump = malloc(LOV_EA_MAX(lump));
        if (lump == NULL) {
                return -1;
        }
        rc = llapi_file_get_stripe(path, lump);
        if (rc != 0) {
                fprintf(stderr, "get_stripe failed: %d (%s)\n", errno, strerror(errno));
                free(lump); /* do not leak the buffer on the error path */
                return -1;
        }
        printf("Lov magic %u\n", lump->lmm_magic);
        printf("Lov pattern %u\n", lump->lmm_pattern);
        printf("Lov object id %llu\n", (unsigned long long)lump->lmm_object_id);
        printf("Lov object group %llu\n", (unsigned long long)lump->lmm_object_gr);
        printf("Lov stripe size %u\n", lump->lmm_stripe_size);
        printf("Lov stripe count %hu\n", lump->lmm_stripe_count);
        printf("Lov stripe offset %u\n", lump->lmm_stripe_offset);
        for (i = 0; i < lump->lmm_stripe_count; i++) {
                printf("Object index %d Objid %llu\n", lump->lmm_objects[i].l_ost_idx,
                       (unsigned long long)lump->lmm_objects[i].l_object_id);
        }
        free(lump);
        return rc;
}
/* Ping all OSTs that belong to this filesystem */
int ping_osts()
{
        DIR *dir;
        struct dirent *d;
        char osc_dir[100];
        int rc;

        snprintf(osc_dir, sizeof(osc_dir), "/proc/fs/lustre/osc");
        dir = opendir(osc_dir);
        if (dir == NULL) {
                printf("Can't open dir\n");
                return -1;
        }
        while ((d = readdir(dir)) != NULL) {
                if (d->d_type == DT_DIR) {
                        if (!strncmp(d->d_name, "OSC", 3)) {
                                printf("Pinging OSC %s ", d->d_name);
                                rc = llapi_ping("osc", d->d_name);
                                if (rc) {
                                        printf(" bad\n");
                                } else {
                                        printf(" good\n");
                                }
                        }
                }
        }
        closedir(dir); /* release the directory stream */
        return 0;
}
int main()
{
        int file;
        int rc;
        char filename[100];
        char sys_cmd[100];

        snprintf(filename, sizeof(filename), "%s/%s", MY_LUSTRE_DIR, TESTFILE);
        printf("Open a file with striping\n");
        file = open_stripe_file();
        if (file < 0) {
                printf("Exiting\n");
                exit(1);
        }
        printf("Getting uuid list\n");
        rc = get_my_uuids(file);
        printf("Write to the file\n");
        rc = write_file(file);
        rc = close_file(file);
        printf("Listing LOV data\n");
        rc = get_file_info(filename);
        printf("Ping our OSTs\n");
        rc = ping_osts();
        /* the results should match lfs getstripe */
        printf("Confirming our results with lfs getstripe\n");
        snprintf(sys_cmd, sizeof(sys_cmd), "/usr/bin/lfs getstripe %s/%s", MY_LUSTRE_DIR, TESTFILE);
        system(sys_cmd);
        printf("All done\n");
        exit(rc);
}
Makefile for sample application:
lustredemo: libtest.c
	gcc -g -O2 -Wall -o lustredemo libtest.c -llustreapi
clean:
	rm -f core lustredemo *.o
run:
	make
	rm -f /mnt/lustre/ftest/lustredemo
	rm -f /mnt/lustre/ftest/lustre_dummy
	cp lustredemo /mnt/lustre/ftest/
Copyright © 2010, Oracle and/or its affiliates. All rights reserved.