CHAPTER 4

Lustre Installation

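This chapter describes how to install Lustre and includes the following sections: Installing Lustre, Quick Configuration of Lustre, Building from Source, and Building a Lustre Source Tarball.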



Note - Currently, all Lustre installations run the ext3 filesystem internally on service nodes. Lustre servers use the ext3 filesystem to store user-data objects and system data. The ext3 filesystem creates a journal for efficient recovery after a system crash or power outage.

By default, Lustre reserves 400MB on the OST for journal use[1]. This reserved space is unavailable for general storage. For this reason, you will see 400MB of space used on the OST before any file object data is saved to it.



4.1 Installing Lustre

Use this procedure to install Lustre.

1. Install the Linux base OS per your requirements and installation prerequisites like GCC and Perl, discussed in Prerequisites.

2. Install the RPMs (described in Using a Pre-Packaged Lustre Release). The preferred installation order is:

3. Verify that all cluster networking is correct. This may include /etc/hosts or DNS.

LNET self-test helps you confirm that Lustre Networking (LNET) has been properly installed and configured, and that underlying network software and hardware are performing as expected. For more information, see Lustre I/O Kit.

4. Set the correct networking options for Lustre in /etc/modprobe.conf. See Modprobe.conf.
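The name-resolution check in step 3 can be sketched as a small shell helper. This is a minimal sketch, assuming that node names are resolved through a hosts-format file; it does not query DNS. The function name missing_hosts and the node names in the usage line are illustrative and are not part of Lustre.

```shell
# Report which node names have no entry in a hosts-format file.
# Usage: missing_hosts <hosts-file> <name>...
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any column after the address.
        if ! sed 's/#.*//' "$hosts_file" | awk -v n="$name" '
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit (found ? 0 : 1) }'; then
            echo "$name"
        fi
    done
}
```

For example, running missing_hosts /etc/hosts mds16 oss1 prints each name that has no entry. No output means every name was found.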



Tip - When installing Lustre with InfiniBand, keep the ibhost, kernel and Lustre all on the same revision. To do this:
1. Install the kernel source (Lustre-patched)
2. Install the Lustre source and the ibhost source.
3. Compile the ibhost against your kernel.
4. Compile the Linux kernel.
5. Compile Lustre against the ibhost source by passing --with-vib=<path to ibhost> to the configure script.

Now you can use the RPMs created by the above steps.


4.1.1 MountConf

MountConf is shorthand for Mount Configuration. The Lustre cluster is configured only by the mkfs.lustre and mount commands. The MountConf system is one of the important features of Lustre 1.6.x.

MountConf involves userspace utilities (mkfs.lustre, tunefs.lustre, mount.lustre, lctl) and two new OBD types, the MGC and MGS. The MGS is a configuration management server, which compiles configuration information about all Lustre filesystems running at a site. There should be one MGS per site, not one MGS per filesystem. The MGS requires its own disk for storage. However, there is a provision to allow the MGS to share a disk ("co-locate") with an MDT of one filesystem.

You must start the MGS first because it manages the configurations. Beyond this, there are no ordering requirements on when a target (MDT or OST) can be added to a filesystem. (However, there should be no client I/O at addition time; this is also known as "quiescent OST addition.")

For example, consider the following order of starting the servers.

1. Start the MGS (start mgs)

2. Mount OST 1 (mkfs, mount ost #1)

3. Mount the MDT (mkfs, mount mdt)

4. Mount OST 2 (mkfs, mount ost #2)

5. Mount the client (mount client)

6. Mount OST 3 (mkfs, mount ost #3)

The clients and the MDT are notified that there is a new OST on-line and they can use it immediately.
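The single ordering rule above (MGS first, everything else in any order) can be expressed as a quick check on a planned start-up sequence. This is only a sketch: check_start_order and its step keywords (mgs, mdt, ost, client) are hypothetical names chosen for illustration, and the function starts nothing.

```shell
# Check a planned start-up sequence. The only hard rule stated above is
# that the MGS comes first; the remaining steps may appear in any order.
# Usage: check_start_order <step>...   (steps: mgs, mdt, ost, client)
check_start_order() {
    if [ "$1" != "mgs" ]; then
        echo "error: the MGS must be started first" >&2
        return 1
    fi
    for step in "$@"; do
        case $step in
            mgs|mdt|ost|client) ;;
            *) echo "error: unknown step '$step'" >&2; return 1 ;;
        esac
    done
    echo "ok: $# steps, MGS first"
}
```

The example order shown earlier corresponds to check_start_order mgs ost mdt ost client ost.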



Note - The MGS must be running before any new servers are added to the filesystem. After the servers start the first time, they cache a local copy of their startup logs so they can restart with or without the MGS.

Currently, there is nothing actually visible on a server mount point (but df will show free space). Eventually, the mount point will probably look like a Lustre client.



4.2 Quick Configuration of Lustre

As already discussed, Lustre consists of four types of subsystems: a Management Server (MGS), a Metadata Target (MDT), Object Storage Targets (OSTs) and clients. All of these can co-exist on a single system or can run on different systems. Together, the OSSs and MDS present a Logical Object Volume (LOV), which is an abstraction that appears in the configuration.

It is possible to set up the Lustre system with many different configurations by using the administrative utilities provided with Lustre. Some sample scripts are included in the directory where Lustre is installed. If you have installed the source code, the scripts are located in the lustre/tests sub-directory. These scripts enable quick setup of some simple, standard configurations.

The next section describes how to use these scripts to install a simple Lustre setup.

4.2.1 Simple Configurations

The procedures in this section describe how to set up simple Lustre configurations.

4.2.1.1 Module Setup

Make sure the modules (like LNET) are installed in the appropriate /lib/modules directory. The mkfs.lustre and mount.lustre utilities automatically load the correct modules.

1. Set up module options for networking by adding the following line to /etc/modprobe.conf or modprobe.conf.local:

# Networking options, see /sys/module/lnet/parameters

2. Add the following lines:

options lnet networks=tcp
# alias lustre llite -- remove this line from existing modprobe.conf
#(the llite module has been renamed to lustre) 
# end Lustre modules
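To confirm that the networking option is actually present and not commented out, a helper along the following lines can read it back. This is a minimal sketch, assuming one "options lnet" line with a networks= setting; lnet_networks is a hypothetical helper name, not a Lustre utility.

```shell
# Print the value of the "networks" option from the first active
# "options lnet" line in a modprobe configuration file.
# Exits non-zero if no such line is found.
# Usage: lnet_networks <modprobe-conf-file>
lnet_networks() {
    awk '$1 == "options" && $2 == "lnet" {
             for (i = 3; i <= NF; i++)
                 if ($i ~ /^networks=/) {
                     sub(/^networks=/, "", $i)
                     print $i
                     found = 1
                     exit
                 }
         }
         END { exit (found ? 0 : 1) }' "$1"
}
```

With the configuration shown above, lnet_networks /etc/modprobe.conf would print tcp.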




Note - For detailed information on formatting an MDS or OST, see Options to Format MDT and OST Filesystems.


4.2.1.2 Making and Starting a Filesystem

Starting Lustre on MGS and MDT Node “mds16”

First, create an MDT for the ‘spfs’ filesystem that uses the /dev/sda disk. This MDT also acts as the MGS for the site.

$ mkfs.lustre --fsname spfs --mdt --mgs /dev/sda
 
Permanent disk data:
Target: spfs-MDTffff
Index: unassigned
Lustre FS: spfs
Mount type: ldiskfs
Flags: 0x75
(MDT MGS needs_index first_time update)
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:
checking for existing Lustre data: not found
device size = 4096MB
formatting backing filesystem ldiskfs on /dev/sda
target name spfs-MDTffff
4k blocks 0
options -J size=160 -i 4096 -I 512 -q -O dir_index -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-MDTffff -J size=160 -i 4096 -I 512 -q -O dir_index -F /dev/sda
Writing CONFIGS/mountdata
$ mkdir -p /mnt/test/mdt
$ mount -t lustre /dev/sda /mnt/test/mdt
$ cat /proc/fs/lustre/devices
0 UP mgs MGS MGS 5
1 UP mgc MGC192.168.16.21@tcp bf0619d6-57e9-865c-551c-06cc28f3806c 5
2 UP mdt MDS MDS_uuid 3
3 UP lov spfs-mdtlov spfs-mdtlov_UUID 4
4 UP mds spfs-MDT0000 spfs-MDT0000_UUID 3
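The device listing can also be checked mechanically. The following sketch reads text in the same layout as /proc/fs/lustre/devices (index, state, type, name, UUID, reference count) and prints any device whose state is not UP. The function name devices_not_up is hypothetical, and the column layout is assumed from the output shown above.

```shell
# List devices that are not in the UP state, reading text in the
# format of /proc/fs/lustre/devices from standard input.
# Prints "<name> (<state>)" for each such device; prints nothing if all are UP.
devices_not_up() {
    awk 'NF >= 4 && $2 != "UP" { print $4 " (" $2 ")" }'
}
```

On a server it would be used as devices_not_up < /proc/fs/lustre/devices.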

Starting Lustre on any OST Node

Give OSTs the location of the MGS with the --mgsnode parameter.

$ mkfs.lustre --fsname spfs --ost --mgsnode=mds16@tcp0 /dev/sda
Permanent disk data:
Target: spfs-OSTffff
Index: unassigned
Lustre FS: spfs
Mount type: ldiskfs
Flags: 0x72
(OST needs_index first_time update)
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.16.21@tcp

device size = 4096MB
formatting backing filesystem ldiskfs on /dev/sda
target name spfs-OSTffff
4k blocks 0
options -J size=160 -i 16384 -I 256 -q -O dir_index -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-OSTffff -J size=160 -i 16384 -I 256 -q -O dir_index -F /dev/sda
Writing CONFIGS/mountdata
$ mkdir -p /mnt/test/ost0
$ mount -t lustre /dev/sda /mnt/test/ost0
$ cat /proc/fs/lustre/devices
0 UP mgc MGC192.168.16.21@tcp 7ed113fe-dd48-8518-a387-5c34eec6fbf4 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter spfs-OST0000 spfs-OST0000_UUID 5

Mounting Lustre on a client node

$ mkdir -p /mnt/spfs
$ mount -t lustre mds16@tcp0:/spfs /mnt/spfs

The MGS and the MDT can be run on separate devices. With the MGS on node ’mgs16’:

$ mkfs.lustre --mgs /dev/sda1
$ mkdir -p /mnt/mgs
$ mount -t lustre /dev/sda1 /mnt/mgs
$ mkfs.lustre --fsname=spfs --mdt --mgsnode=mgs16@tcp0 /dev/sda2
$ mkdir -p /mnt/test/mdt
$ mount -t lustre /dev/sda2 /mnt/test/mdt

If the MGS node has multiple interfaces (for example, mgs16 and 1@elan), only the client mount command has to change. The MGS NID specifier must use a network type appropriate for the client (for instance, a TCP client could use mgs16@tcp0 and an Elan client could use 1@elan). Alternatively, a list of all MGS NIDs can be provided and the client chooses the correct one.

$ mount -t lustre mgs16@tcp0,1@elan:/spfs /mnt/spfs

To reformat a device that has already been formatted with mkfs.lustre, add the --reformat option:

$ mkfs.lustre --fsname=spfs --mdt --mgs --reformat /dev/sda1


Note - Error -19 (-ENODEV, defined in /usr/include/asm/errno.h) occurs when a client tries to connect to a server that is not exporting the Lustre filesystem. Possible causes are a service that has been shut down or is still starting up, a failover to another server node (if you have failover configured), or a configuration error.


4.2.1.3 Filesystem Name

The filesystem name is limited to 8 characters. We have encoded the filesystem and target information in the disk label, so that you can mount by label. This allows system administrators to move disks around without worrying about issues such as SCSI disk reordering or getting the /dev/device wrong for a shared target. Soon, filesystem naming will be made as fail-safe as possible. Currently, Linux disk labels are limited to 16 characters. To identify the target within the filesystem, 8 characters are reserved, leaving 8 characters for the filesystem name:

myfsname-MDT0000 or myfsname-OST0a19
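The label layout can be illustrated with a short shell sketch. It assumes that the target index is supplied in decimal and rendered as four lowercase hexadecimal digits, as in the labels above. The helper name target_label is hypothetical; it is not a Lustre utility.

```shell
# Build a Lustre target label: <fsname>-<MDT|OST><4-digit hex index>.
# Rejects filesystem names longer than 8 characters so that the label
# fits in the 16-character disk label.
# Usage: target_label <fsname> <MDT|OST> <decimal-index>
target_label() {
    fsname=$1
    kind=$2
    index=$3
    if [ "${#fsname}" -gt 8 ]; then
        echo "error: filesystem name '$fsname' is longer than 8 characters" >&2
        return 1
    fi
    case $kind in
        MDT|OST) ;;
        *) echo "error: target type must be MDT or OST" >&2; return 1 ;;
    esac
    printf '%s-%s%04x\n' "$fsname" "$kind" "$index"
}
```

For example, target_label myfsname OST 2585 prints myfsname-OST0a19, because 2585 is 0a19 in hexadecimal.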

An example of mount-by-label:

$ mount -t lustre -L testfs-MDT0000 /mnt/mdt


Caution - Mount-by-label should NOT be used in a multi-path environment.


Although the filesystem name is internally limited to 8 characters, you can mount the clients at any mount point, so filesystem users are not subjected to short names:

mount -t lustre uml1@tcp0:/shortfs /mnt/my-long-filesystem-name

4.2.1.4 Starting a Server Automatically

Starting Lustre only involves the mount command. Lustre servers can be added to /etc/fstab:

$ mount -l -t lustre
/dev/sda1 on /mnt/test/mdt type lustre (rw) [testfs-MDT0000]
/dev/sda2 on /mnt/test/ost0 type lustre (rw) [testfs-OST0000]
192.168.0.21@tcp:/testfs on /mnt/testfs type lustre (rw)

Add to /etc/fstab:

LABEL=testfs-MDT0000 /mnt/test/mdt lustre defaults,_netdev,noauto 0 0
LABEL=testfs-OST0000 /mnt/test/ost0 lustre defaults,_netdev,noauto 0 0

In general, it is wise to specify noauto and let your high-availability (HA) package manage when to mount the device. If you are not using failover, make sure that networking has been started before mounting a Lustre server. Red Hat, SUSE, Debian (and possibly others) use the _netdev flag to ensure that these disks are mounted after the network is up.

We are mounting by disk label here; the label of a device can be read with e2label. The label of a newly-formatted Lustre server ends in ffff, meaning that its index has yet to be assigned. The assignment takes place when the server is first started, and the disk label is updated.
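The /etc/fstab entries above follow a fixed pattern, which the following sketch generates from a label and a mount point. It refuses labels that still end in ffff, on the assumption that such a label belongs to a target that has not yet been started and will therefore change. fstab_entry is a hypothetical helper name; the mount options are the ones shown in the example above.

```shell
# Print an /etc/fstab entry that mounts a Lustre server target by label.
# Usage: fstab_entry <label> <mount-point>
fstab_entry() {
    case $1 in
        *ffff|*FFFF)
            # An unassigned label is rewritten when the server first starts.
            echo "error: label '$1' has no index assigned yet" >&2
            return 1
            ;;
    esac
    printf 'LABEL=%s %s lustre defaults,_netdev,noauto 0 0\n' "$1" "$2"
}
```

The label argument can be read from the device with e2label, for example fstab_entry "$(e2label /dev/sda1)" /mnt/test/mdt.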



Caution - Do not do this when the client and OSS are on the same node, as memory pressure between the client and OSS can lead to deadlocks.


Caution - Mount-by-label should NOT be used in a multi-path environment.

4.2.1.5 Stopping a Server

To stop a server:

$ umount /mnt/test/ost0

This preserves the state of the connected clients. The next time the server is started, it waits for clients to reconnect, and then goes through the recovery procedure.

If the -f (“force”) flag is given, the server will evict all clients and stop WITHOUT RECOVERY. The server will not wait for recovery upon restart. Any currently connected clients will get I/O errors until they reconnect.



Note - If you are using loopback devices, use the -d flag. This flag cleans up loop devices and can always be safely specified.


4.2.2 More Complex Configurations

A node (a server box) may have multiple NIDs if it has multiple network interfaces. When a node is specified, generally all of its NIDs must be listed, delimited by commas (,), so that other nodes can choose the NID appropriate to their own network interfaces. When multiple nodes are specified, they are delimited by a colon (:) or by repeating a keyword (--mgsnode= or --failnode=). To obtain all NIDs from a node (while LNET is running), run:

lctl list_nids
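Because a node's NIDs must be given as a comma-delimited list, it can be convenient to convert the one-per-line output of lctl list_nids directly. The sketch below assumes that output format; join_nids is a hypothetical helper name.

```shell
# Join NIDs (one per line on standard input, as printed by
# "lctl list_nids") into the comma-separated form used for one node.
join_nids() {
    paste -s -d, -
}
```

On a running node this would be used as lctl list_nids | join_nids.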

4.2.2.1 Failover

This example has a combined MGS/MDT failover pair on uml1 and uml2, and an OST failover pair on uml3 and uml4. There are corresponding Elan addresses on uml1 and uml2.

uml1> mkfs.lustre --fsname=testfs --mdt --mgs \
--failnode=uml2,2@elan /dev/sda1
uml1> mount -t lustre /dev/sda1 /mnt/test/mdt
uml3> mkfs.lustre --fsname=testfs --ost --failnode=uml4 \
--mgsnode=uml1,1@elan --mgsnode=uml2,2@elan /dev/sdb
uml3> mount -t lustre /dev/sdb /mnt/test/ost0
client> mount -t lustre uml1,1@elan:uml2,2@elan:/testfs /mnt/testfs
uml1> umount /mnt/test/mdt
uml2> mount -t lustre /dev/sda1 /mnt/test/mdt
uml2> cat /proc/fs/lustre/mds/testfs-MDT0000/recovery_status

Where multiple NIDs are specified, comma-separation (for example, uml2,2@elan) means that the two NIDs refer to the same host, and that Lustre needs to choose the "best" one for communication. Colon-separation (for example, uml1:uml2) means that the two NIDs refer to two different hosts, and should be treated as failover locations (Lustre tries the first one, and if that fails, it tries the second one.)
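The comma and colon rules can be combined in a small helper that assembles the client mount source. This is a sketch under the conventions just described: each argument after the filesystem name is one host, written as a comma-separated list of that host's NIDs, and hosts are joined with colons in failover order. mount_source is a hypothetical helper name.

```shell
# Build a client mount source from failover nodes.
# Each argument after the filesystem name is one node, given as a
# comma-separated list of that node's NIDs; nodes are joined with colons.
# Usage: mount_source <fsname> <node-nids>...
mount_source() {
    fsname=$1
    shift
    spec=
    for node in "$@"; do
        # Append with a colon separator, except before the first node.
        spec="${spec:+$spec:}$node"
    done
    printf '%s:/%s\n' "$spec" "$fsname"
}
```

For the failover example above, mount_source testfs uml1,1@elan uml2,2@elan prints uml1,1@elan:uml2,2@elan:/testfs, which is the source used in the client mount command.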



Note - If you have an MGS or MDT configured for failover, perform these steps:

1. On the OST, list the NIDs of all MGS nodes at mkfs time.

OST# mkfs.lustre --fsname sunfs --ost --mgsnode=10.0.0.1 \
--mgsnode=10.0.0.2 /dev/{device}

2. On the client, mount the filesystem.

client# mount -t lustre 10.0.0.1:10.0.0.2:/sunfs /cfs/client/


4.2.2.2 Mount with Inactive OSTs

To mount a client or MDT when some OSTs are known to be down, list them with the exclude option. The specified targets are treated as "inactive":

client> mount -o exclude=testfs-OST0000 -t lustre uml1:/testfs /mnt/testfs
client> cat /proc/fs/lustre/lov/testfs-clilov-*/target_obd

To reactivate an inactive OST on a live client or MDT, use lctl activate on the OSC device, for example:

lctl --device 7 activate


Note - A colon-separated list can also be specified. For example,
exclude=testfs-OST0000:testfs-OST0001.


4.2.2.3 Finding Nodes in the Lustre Filesystem

You can get the list of servers participating in each filesystem by using the following command.

cfs21:/tmp# cat /proc/fs/lustre/mgs/MGS/live/* 
fsname: lustre 
flags: 0x0     gen: 26 
lustre-MDT0000 
lustre-OST0000 
lustre-OST0001 

You can get the names of the OSTs from the MDT by using the following command.

cfs21:/tmp# cat /proc/fs/lustre/lov/lustre-mdtlov/target_obd 
0: lustre-OST0000_UUID ACTIVE 
1: lustre-OST0001_UUID ACTIVE 
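The target_obd listing lends itself to a simple filter that reports only the OSTs needing attention. The sketch below assumes the three-column layout shown above (index, UUID, state) and treats any state other than ACTIVE as a problem; inactive_osts is a hypothetical helper name.

```shell
# Print the UUIDs of OSTs that are not ACTIVE, reading text in the
# format of a target_obd file ("<index>: <uuid> <state>") from standard input.
inactive_osts() {
    awk 'NF >= 3 && $3 != "ACTIVE" { print $2 }'
}
```

On the MDT it would be used as inactive_osts < /proc/fs/lustre/lov/lustre-mdtlov/target_obd.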

4.2.2.4 Without Lustre Service

To start only the MGS or MGC without starting the target server (for example, if you do not want to start the MDT of a combined MGS/MDT), mount with the nosvc option:

$ mount -t lustre -L testfs-MDT0000 -o nosvc /mnt/test/mdt

4.2.2.5 Failout

Designate an OST as a "failout", so clients receive errors after a timeout instead of waiting for recovery:

$ mkfs.lustre --fsname=testfs --ost --mgsnode=uml1 --param="failover.mode=failout" /dev/sdb

4.2.2.6 Running Multiple Lustre Filesystems

The default filesystem name created by mkfs.lustre is lustre. For a different filesystem name, specify mkfs.lustre --fsname=foo. The MDT, OSTs and clients that comprise a single filesystem must share the same name. For example:

foo-MDT0000
foo-OST0000
foo-OST0001
client mount command: mount -t lustre mgsnode:/foo /mnt/mountpoint

The maximum length of the filesystem name is 8 characters.

The MGS is universal; there is only one MGS per installation, not per filesystem.



Note - There is only one filesystem per MDT. Therefore, specify --mdt --mgs on one, and --mdt --mgsnode=<mgsnodenid> on the others.


An installation with two filesystems could look like this:

mgsnode# mkfs.lustre --mgs /dev/sda
mdtfoonode# mkfs.lustre --fsname=foo --mdt --mgsnode=mgsnode@tcp0 /dev/sda
ossfoonode# mkfs.lustre --fsname=foo --ost --mgsnode=mgsnode@tcp0 /dev/sda
ossfoonode# mkfs.lustre --fsname=foo --ost --mgsnode=mgsnode@tcp0 /dev/sdb
mdtbarnode# mkfs.lustre --fsname=bar --mdt --mgsnode=mgsnode@tcp0 /dev/sda
ossbarnode# mkfs.lustre --fsname=bar --ost --mgsnode=mgsnode@tcp0 /dev/sda
ossbarnode# mkfs.lustre --fsname=bar --ost --mgsnode=mgsnode@tcp0 /dev/sdb

Client mount for foo:

mount -t lustre mgsnode@tcp0:/foo /mnt/work

Client mount for bar:

mount -t lustre mgsnode@tcp0:/bar /mnt/scratch

4.2.3 Other Configuration Tasks

This section describes other Lustre configuration tasks.

4.2.3.1 Removing an OST

In Lustre 1.6, an OST can be permanently removed from a filesystem. Any files that have stripes on the removed OST will, in the future, return EIO.

$ mgs> lctl --device <devno> conf_param testfs-OST0001.osc.active=0

This tells any clients of the OST that it should not be contacted; the OST's current state is irrelevant.

To remove an OST:

1. Deactivate the OST (make it read-only) on the MDS so no new objects are allocated to it.

2. Use lfs find to discover all files that have objects residing on the deactivated OST.

3. Copy these files to a new location and then move them back to their original location to force their object re-creation on the active OSTs and the object deletion on the OST to be removed.
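Step 3 can be sketched as a per-file helper. This is a minimal sketch, assuming that the file is not being written while it is copied and that hard links do not need to be preserved. The function name rewrite_file and the temporary-file suffix are illustrative choices. The list of files to process would come from step 2.

```shell
# Rewrite a file by copying it and moving the copy back over the original,
# so that its data is written again from scratch.
# Usage: rewrite_file <path>
rewrite_file() {
    src=$1
    tmp="$src.migrate.$$"
    # -p preserves mode and timestamps (and ownership where permitted).
    cp -p "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    # Replace the original with the freshly written copy.
    mv "$tmp" "$src"
}
```

Because the copy is created in the same directory as the original, the final mv is a rename within one filesystem and does not copy the data a second time.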

To restore the OST:

1. Make sure the OST is running.

2. Run this command:

$ mgs> lctl --device <devno> conf_param testfs-OST0001.osc.active=1

4.2.3.2 Running the Writeconf Command

To run writeconf, first remove all existing configuration files for a filesystem. Use the writeconf command on an MDT to erase all the configuration logs for the filesystem. The logs are regenerated only as servers restart; therefore all servers must be restarted before clients can access filesystem data. The logs are regenerated as in a new filesystem; old settings from lctl conf_param are lost, and current server NIDs are used. Only use the writeconf command if:

To run the writeconf command:

1. Unmount all the clients and servers.

2. With every server disk, run:

$ mdt> tunefs.lustre --writeconf /dev/sda1

3. Remount all servers. You must mount the MDT first.



Caution - Running the writeconf command on the MDS will erase all pools information (as well as any other parameters set via lctl conf_param). We recommend that the pools definitions (and conf_param settings) be executed via a script, so they can be reproduced easily after a writeconf is performed.


4.2.3.3 Changing a Server NID

To change a server NID:

1. Update the LNET configuration in /etc/modprobe.conf so that lctl list_nids reports the correct NIDs.

2. Regenerate the configuration logs for every affected filesystem using the --writeconf flag to tunefs.lustre, as shown in step 2 of Running the Writeconf Command.

3. If the MGS NID is also changing, communicate the new MGS location to each server. Type:

tunefs.lustre --erase-param --mgsnode=<new_nid(s)> --writeconf /dev/...

4.2.3.4 Aborting Recovery

To abort the recovery process when starting a target, type:

$ mount -t lustre -L testfs-MDT0000 -o abort_recov /mnt/test/mdt


Note - The recovery process is blocked until all OSTs are available.



4.3 Building from Source

This section describes how to build Lustre from source code.

4.3.1 Building Your Own Kernel

If you are using non-standard hardware or Lustre Support has asked you to apply a patch, you need to build your own kernel. Lustre requires some changes to the core Linux kernel. These changes are organized in a set of patches in the kernel_patches directory of the Lustre repository. If you are building your kernel from the source code, then you need to apply the appropriate patches.

Managing patches for the kernels is a very involved process, because most patches are intended to work with several kernels. We recommend that you use the Quilt package developed by Andreas Gruenbacher, as it simplifies the process considerably. Patch management with Quilt works as follows:

1. A series file lists a collection of patches.

2. The patches in a series form a stack.

3. Using Quilt, you push and pop the patches.

4. You then edit and refresh (update) the patches in the stack that is being managed with Quilt.

5. You can then revert inadvertent changes and fork or clone the patches and conveniently show the difference in work (before and after).

4.3.1.1 Selecting a Patch Series

Depending on the kernel being used, a different series of patches needs to be applied. A collection of patch series files is maintained for the various supported kernels in this directory: lustre/kernel_patches/series/.[2]

For example, the lustre/kernel_patches/series/rh-2.4.20 file lists all patches that should be applied to the Red Hat 2.4.20 kernel to build a Lustre-compatible kernel.

The current set of all the supported kernels and their corresponding patch series can be found in the lustre/kernel_patches/which_patch file.

4.3.1.2 Installing Quilt

A variety of Quilt packages (RPMs, SRPMs and tarballs) are available from various sources. We recommend that you use a recent version of Quilt. If possible, use a Quilt package from your distribution vendor.

If you cannot find an appropriate Quilt package or fulfill its dependencies, we suggest that you build Quilt from the tarball. You can download the tarball from the main Quilt website:

http://savannah.nongnu.org/projects/quilt



Note - The latest Lustre release works with the latest Linux kernel distribution. If you use the pre-patched kernel from an older Lustre release or the kernel patches against a different kernel, you can build a more recent Lustre release against it.


4.3.1.3 Preparing the Kernel Tree Using Quilt

To prepare the kernel tree to use Quilt:

1. After acquiring the Lustre source (CVS or tarball) and choosing a series file to match your kernel sources, choose a kernel config file.

The lustre/kernel_patches/kernel_configs folder contains supported .config files, which are named to indicate the kernel and architecture with which they are associated. For example, the configuration file for the 2.6.9 kernel shipped with RHEL 4 (suitable for x86_64 SMP systems) is:
kernel-2.6.9-2.6-rhel4-x86_64-smp.config

2. Unpack the appropriate kernel source tree.

This manual assumes that the resulting source tree (referred to as the destination tree) is in /tmp/kernels/linux-2.6.9.

You are ready to use Quilt to manage the patching process for your kernel.

3. Run the following commands to set up the necessary symlinks between the Lustre kernel patches and your kernel sources (assuming the Lustre sources are unpacked under /tmp/lustre-1.5.97 and you have chosen the 2.6-rhel4 series):

$ cd /tmp/kernels/linux-2.6.9
$ rm -f patches series
$ ln -s /tmp/lustre-1.5.97/lustre/kernel_patches/series/2.6-rhel4.series ./series
$ ln -s /tmp/lustre-1.5.97/lustre/kernel_patches/patches .

4. You can now use Quilt to apply all patches in the chosen series to your kernel sources. Run:

$ cd /tmp/kernels/linux-2.6.9
$ quilt push -av

If the right series files are chosen, and the patches and the kernel sources are up-to-date, the patched destination Linux tree should be able to act as a base Linux source tree for Lustre.

You do not need to compile the patched Linux source in order to build Lustre from it. However, you must compile the same Lustre-patched kernel and then boot it on any node on which you intend to run the version of Lustre being built using this patched kernel source.
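Before running quilt push, it can be useful to confirm that every patch named in the chosen series file is present in the patches directory. The sketch below assumes the usual series layout of one patch file name per line, optionally followed by options, with # starting a comment. missing_patches is a hypothetical helper name and is not part of Quilt or Lustre.

```shell
# List patches named in a series file that are missing from a patches directory.
# Usage: missing_patches <series-file> <patches-dir>
missing_patches() {
    series=$1
    dir=$2
    # Drop comments and blank lines; the first field is the patch file name.
    sed 's/#.*//' "$series" | awk 'NF { print $1 }' | while read -r patch; do
        [ -f "$dir/$patch" ] || echo "$patch"
    done
}
```

With the symlinks set up as described above, this would be run from the kernel tree as missing_patches series patches. No output means every listed patch was found.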

4.3.2 Building Lustre

The most recent versions of Lustre are available for download:

The following set of packages are available for each supported Linux distribution and architecture. The files use the following naming convention:

kernel-smp-<kernel version>_lustre.<lustre version>.<arch>.rpm

This is an example of binary packages for version 1.5.97:

Use standard RPM commands to install the binary packages:

$ rpm -ivh kernel-lustre-smp-2.6.9-42.0.3.EL_lustre.1.5.97.i686.rpm
$ rpm -ivh lustre-1.5.97-2.6.9_42.0.3.EL_lustre.1.5.97smp.i686.rpm
$ rpm -ivh lustre-modules-1.5.97-2.6.9_42.0.3.EL_lustre.1.5.97smp.i686.rpm

This is an example of Source packages:



Note - Kernel-source and Lustre-source packages are provided in case you need to build external kernel modules or use additional network types. They are not required to run Lustre.


Once you have your Lustre source tree, run these commands to build Lustre.

$ cd <path to kernel tree>
$ cp /boot/config-$(uname -r) .config
$ make oldconfig || make menuconfig
$ make include/asm
$ make include/linux/version.h
$ make SUBDIRS=scripts
$ make include/linux/utsrelease.h
$ make dep

To configure Lustre and to build Lustre RPMs, go to the Lustre source directory and run:

$ ./configure --with-linux=<path to kernel tree>
$ make rpms

This creates a set of .rpms in /usr/src/redhat/RPMS/<arch> with a date-stamp appended (the SUSE path is /usr/src/packages).

Example:

lustre-1.5.97-2.6.9_42.xx.xx.EL_lustre.1.5.97.custom_200609072009.i686.rpm
lustre-debuginfo-1.5.97-2.6.9_42.xx.xx.EL_lustre.1.5.97.custom_200609072009.i686.rpm
lustre-modules-1.5.97-2.6.9_42.xx.xx.EL_lustre.1.5.97.custom_200609072009.i686.rpm
lustre-source-1.5.97-2.6.9_42.xx.xx.EL_lustre.1.5.97.custom_200609072009.i686.rpm

Change directory (cd) into the kernel source directory and run:

$ make rpm

This creates a kernel RPM suitable for the installation.

Example:

kernel-2.6.95.0.3.EL_lustre.1.5.97custom-1.i386.rpm

4.3.2.1 Configuration Options

Lustre supports several different features and packages that extend the core functionality of Lustre. These features and packages can be enabled at build time by passing the appropriate arguments to the configure command. A complete list of supported features and packages can be obtained by running ./configure --help in the Lustre source directory. The config files matching the kernel version are in the configs/ directory of the kernel source. Copy one to .config at the root of the kernel tree.

4.3.2.2 Liblustre

The Lustre library client, liblustre, relies on libsysio, which is a library that provides POSIX-like file and name space support for remote filesystems from the application program address space. Libsysio can be obtained at the SourceForge website:

http://sourceforge.net/projects/libsysio/



Caution - Remember that liblustre is not for general use. It was created to work with specific hardware (Cray), and should NEVER be used with other hardware.


Development of libsysio has continued ever since it was first targeted for use with Lustre. First, check out the b_lustre branch from the libsysio repository. This gives the version of libsysio compatible with Lustre.

To build libsysio, run:

$ sh autogen.sh 
$ ./configure --with-sockets 
$ make 

To build liblustre, run:

$ ./configure --with-lib --with-sysio=/path/to/libsysio/source
$ make

Compiler Choice

The compiler must be GCC version 3.3.4 or later. Currently, GCC v4.0 is not supported. GCC v3.3.4 has been used to successfully compile available pre-packaged releases, and it is the only officially-supported compiler. You may have mixed results with other compilers or even with other GCC versions.
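The compiler constraint can be checked before starting a build. The sketch below accepts GCC 3.x versions from 3.3.4 upward and rejects everything else. Treating 3.3.4 itself as acceptable (it is the officially supported compiler) and rejecting all 4.x releases, not only 4.0, are assumptions made for this example. gcc_version_ok is a hypothetical helper name, and the version string is expected to be purely numeric.

```shell
# Succeed if a GCC version string is at least 3.3.4 and below 4.0.
# Usage: gcc_version_ok <major.minor[.patch]>
gcc_version_ok() {
    # Split the version on dots into the positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    patch=${3:-0}
    [ "$major" -eq 3 ] || return 1
    [ "$minor" -gt 3 ] || { [ "$minor" -eq 3 ] && [ "$patch" -ge 4 ]; }
}
```

A typical call is gcc_version_ok "$(gcc -dumpversion)", which returns a zero exit status only when the installed compiler falls in the accepted range.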



Note - GCC v3.3.4 was used to build 2.6 series kernels.


4.3.3 Building from Source

Currently, the distributed kernels do not include third-party InfiniBand modules. Lustre packages cannot include IB network drivers for Lustre; however, Lustre does distribute the source code. Build your InfiniBand software stack against the provided kernel, and then build new Lustre packages. This involves the following procedures.

InfiniBand

To build Lustre with Voltaire InfiniBand sources, add:

--with-vib=<path-to-voltaire-sources>

as an argument to the configure script.

To configure Lustre, use:

--nettype vib --nid <IPoIB address>

OpenIB generation 1 / Mellanox Gold

To build Lustre with OpenIB InfiniBand sources, add:

--with-openib=<path_to_openib sources>

as an argument to the configure script.

To configure Lustre, use:

--nettype openib --nid <IPoIB address>

Silverstorm

To build Lustre with Silverstorm sources, configure Lustre with:

--with-iib=<path to silverstorm sources>

OpenIB 1.0


Note - Currently (v1.4.5), the Voltaire IB module (kvibnal) does not work on the Altix system. This is due to hardware differences in the Altix system.



4.4 Building a Lustre Source Tarball

This section describes how to build tarballs from RPMs.

4.4.1 Lustre Source Tarball from Lustre Source RPM

To build a proper Lustre source tarball from the Lustre source RPM:

1. Install the RPM.

2. Configure the resulting Lustre tree.

3. Run 'make dist'

This produces a proper Lustre tarball. Untar it and name the resulting directory: lustre-<extraversion>.

The lbuild script requires a working directory. This directory must be empty prior to starting lbuild. If the build fails, clean out the working directory before attempting to restart.

The following example shows a local build with no downloading. The name of the 'target' (kernel version) must match a file in lustre/kernel_patches/targets. If you do not specify --target-archs (the hardware platform), then all architectures will be built.

This example is for the RHEL 2.6 kernel.

$ lustre/build/lbuild --extraversion=1.4.9 \
                      --target=2.6-rhel4 \
                      --target-archs=i686 \
                      --release \
                      --kerneldir=/path_to_tarball/ \
                      --stage=/path_to_working_dir/ \
                      --lustre=/path_to_lustre_tarball/ \
                      --nodownload
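Because lbuild requires an empty working directory, a guard like the following can be run before each attempt. This is a minimal sketch; stage_dir_is_empty is a hypothetical helper name, and the path in the usage line is the placeholder from the example above.

```shell
# Succeed only if the directory exists and contains no entries
# (hidden files count as entries).
# Usage: stage_dir_is_empty <dir>
stage_dir_is_empty() {
    [ -d "$1" ] || return 1
    [ -z "$(ls -A "$1")" ]
}
```

For example, stage_dir_is_empty /path_to_working_dir/ || echo "clean out the working directory first" warns before a build is restarted on top of leftovers from a failed run.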

4.4.2 Lustre Source Tarball from CVS

The following example shows how to build a tarball from CVS[3], and includes the building of additional network drivers (gm and vib).

You must properly configure the network driver tree prior to starting lbuild. The network drivers are compiled as part of the Lustre build. Options after the '--' separator are passed directly to the Lustre ./configure script.

In the example:

Replace CVSROOT with the proper CVS string.

--tag is the CVS tag for the version you are building.

$ lustre/build/lbuild --extraversion=1.4.7.1 \
                      --target=2.6-rhel4 \
                      --target-archs=i686 \
                      --release \
                      --kerneldir=/path_to_tarball/ \
                      --stage=/path_to_working_dir/ \
                      --d:ext:$CVSROOT \
                      --tag=b_release_1_4_7 \
                      --disable-datestamp \
                      -- \
                      --with-gm=/path_to_gm/gm-2.1.23_Linux \
                      --with-vib=/path_to_vib/ibhost-3.5.0_13

 



Tip - After installing Lustre, you can download and install service tags, which enable automatic discovery and tracking of the system components and better management of your Lustre environment. For more information, see Service Tags.



1 (Footnote) Additionally, a few bytes outside the journal are used to create accounting data for Lustre.
2 (Footnote) This directory is in the Lustre tarball.
3 (Footnote) CVS is not generally available. If you need CVS access or additional information, contact the Lustre Group.