

Lustre 1.6 Operations Manual

820-3681-10



Contents

Part I Lustre Architecture

1. Introduction to Lustre

1.1 Lustre File System

1.2 Lustre Components

1.2.1 MGS

1.2.2 MDT

1.2.3 MDS

1.2.4 OSTs

1.2.5 OSS

1.2.6 Lustre Clients

1.3 Files in the Lustre File System

1.3.1 Lustre File System and Striping

1.3.2 Lustre Storage

1.3.3 Lustre System Capacity

1.4 Lustre Configurations

1.5 Lustre Networking

1.6 Lustre Failover and Rolling Upgrades

1.7 Additional Lustre Features

2. Understanding Lustre Networking

2.1 Introduction to LNET

2.2 Supported Network Types

2.3 Important Terms

Part II Lustre Administration

3. Prerequisites

3.1 Preparing to Install Lustre

3.1.1 How to Get Lustre

3.1.2 Supported Configurations

3.2 Using a Pre-Packaged Lustre Release

3.2.1 Choosing a Pre-Packaged Kernel

3.2.2 Lustre Tools

3.2.3 Other Required Software

3.3 Environmental Requirements

3.3.1 SSH Access

3.3.2 Consistent Clocks

3.3.3 Universal UID/GID

3.3.4 Choosing a Proper Kernel I/O Scheduler

3.3.5 Changing the I/O Scheduler

3.4 Memory Requirements

3.4.1 Determining the MDS’s Memory

3.4.2 OSS Memory Requirements

4. Lustre Installation

4.1 Installing Lustre

4.1.1 MountConf

4.2 Quick Configuration of Lustre

4.2.1 Simple Configurations

4.2.2 More Complex Configurations

4.2.3 Other Configuration Tasks

4.3 Building from Source

4.3.1 Building Your Own Kernel

4.3.2 Building Lustre

4.3.3 Building from Source

4.4 Building a Lustre Source Tarball

4.4.1 Lustre Source Tarball from Lustre Source RPM

4.4.2 Lustre Source Tarball from CVS

5. Configuring the Lustre Network

5.1 Designing Your Lustre Network

5.1.1 Identify All Lustre Networks

5.1.2 Identify Nodes to Route Between Networks

5.1.3 Identify Network Interfaces to Include/Exclude from LNET

5.1.4 Determine Cluster-wide Module Configuration

5.1.5 Determine Appropriate Mount Parameters for Clients

5.2 Configuring Your Lustre Network

5.2.1 Module Parameters

5.2.2 Module Parameters - Routing

5.2.3 Downed Routers

5.3 Starting and Stopping LNET

5.3.1 Starting LNET

5.3.2 Stopping LNET

6. Configuring Lustre - Examples

6.1 Simple TCP Network

6.1.1 Lustre with Combined MGS/MDT

6.1.2 Lustre with Separate MGS and MDT

7. More Complicated Configurations

7.1 Multihomed Servers

7.1.1 Modprobe.conf

7.1.2 Start Servers

7.1.3 Start Clients

7.2 Elan to TCP Routing

7.2.1 Modprobe.conf

7.2.2 Start servers

7.2.3 Start clients

7.3 Load Balancing with Infiniband

7.3.1 Modprobe.conf

7.3.2 Start servers

7.3.3 Start clients

7.4 Multi-Rail Configurations with LNET

8. Failover

8.1 What is Failover?

8.1.1 The Power Management Software

8.1.2 Power Equipment

8.1.3 Heartbeat

8.1.4 Connection Handling During Failover

8.1.5 Roles of Nodes in a Failover

8.2 OST Failover

8.3 MDS Failover

8.4 Configuring MDS and OSTs for Failover

8.4.1 Starting/Stopping a Resource

8.4.2 Active/Active Failover Configuration

8.4.3 Hardware Requirements for Failover

8.5 Setting Up Failover with Heartbeat V1

8.5.1 Installing the Software

8.6 Using MMP

8.7 Setting Up Failover with Heartbeat V2

8.7.1 Installing the Software

8.7.2 Configuring the Hardware

8.7.3 Operation

8.8 Considerations with Failover Software and Solutions

9. Configuring Quotas

9.1 Working with Quotas

9.1.1 Enabling Disk Quotas

9.1.2 Creating Quota Files and Quota Administration

9.1.3 Resetting the Quota

9.1.4 Quota Allocation

9.1.5 Known Issues with Quotas

10. RAID

10.1 Considerations for Backend Storage

10.1.1 Reliability

10.1.2 Selecting Storage for the MDS and OSS

10.1.3 Understanding Double Failures with Software and Hardware RAID5

10.1.4 Performance Considerations

10.1.5 Formatting

10.2 Insights into Disk Performance Measurement

10.2.1 Sample Graphs

10.3 Creating an External Journal

11. Kerberos

11.1 What is Kerberos?

11.2 Lustre Setup with Kerberos

11.2.1 Configuring Kerberos for Lustre

11.2.2 Types of Lustre-Kerberos Flavors

12. Bonding

12.1 Network Bonding

12.2 Requirements

12.3 Using Lustre with Multiple NICs versus Bonding NICs

12.4 Bonding Module Parameters

12.5 Setting Up Bonding

12.5.1 Examples

12.6 Configuring Lustre with Bonding

12.6.1 Bonding References

13. Upgrading Lustre

13.1 Lustre Interoperability

13.2 Upgrading Lustre from 1.4.12 to 1.6.4

13.2.1 Upgrade Requirements

13.2.2 Supported Upgrade Paths

13.2.3 Starting Clients

13.2.4 Upgrading a Single File System

13.2.5 Upgrading Multiple File Systems with a Shared MGS

13.3 Upgrading Lustre from 1.6.3 to 1.6.4

13.4 Downgrading Lustre from 1.6.4 to 1.4.12

13.4.1 Downgrade Requirements

13.4.2 Downgrading a File System

14. Lustre SNMP Module

14.1 Installing the Lustre SNMP Module

14.2 Building the Lustre SNMP Module

14.3 Using the Lustre SNMP Module

15. Backup and Restore

15.1 Lustre Backups

15.1.1 Client File System-level Backups

15.1.2 Performing Device-level Backups

15.1.3 Performing File-level Backups

15.2 Restoring from a File-level Backup

16. POSIX

16.1 Installing POSIX

16.2 Running POSIX Tests Against Lustre

16.3 Isolating and Debugging Failures

17. Benchmarking

17.1 Bonnie++ Benchmark

17.2 IOR Benchmark

17.3 IOzone Benchmark

18. Lustre Recovery

18.1 Recovering Lustre

18.2 Types of Failure

18.2.1 Client Failure

18.2.2 MDS Failure (and Failover)

18.2.3 OST Failure

18.2.4 Network Partition

Part III Lustre Tuning, Monitoring and Troubleshooting

19. Lustre I/O Kit

19.1 Lustre I/O Kit Description and Prerequisites

19.1.1 Downloading an I/O Kit

19.1.2 Prerequisites to Using an I/O Kit

19.2 Running I/O Kit Tests

19.2.1 sgpdd_survey

19.2.2 obdfilter_survey

19.2.3 ost_survey

19.3 PIOS Test Tool

19.3.1 Synopsis

19.3.2 PIOS I/O Modes

19.3.3 PIOS Parameters

19.3.4 PIOS Examples

19.4 LNET Self-Test

19.4.1 Introduction to LNET Self-Test

19.4.2 LNET Self-Test Concepts

19.4.3 LNET Self-Test Commands

20. LustreProc

20.1 /proc Entries for Lustre

20.1.1 Finding Lustre

20.1.2 Lustre Timeouts/Debugging

20.1.3 Adaptive Timeouts in Lustre

20.1.4 LNET Information

20.1.5 Free Space Distribution

20.2 Lustre I/O Tunables

20.2.1 Client I/O RPC Stream Tunables

20.2.2 Watching the Client RPC Stream

20.2.3 Client Read-Write Offset Survey

20.2.4 Client Read-Write Extents Survey

20.2.5 Watching the OST Block I/O Stream

20.2.6 Mechanics of Lustre Readahead

20.2.7 mballoc History

20.2.8 mballoc3 Tunables

20.2.9 Locking

20.3 Debug Support

20.3.1 RPC Information for Other OBD Devices

21. Lustre Tuning

21.1 Module Options

21.1.1 MDS Threads

21.2 LNET Tunables

21.3 Options to Format MDT and OST Filesystems

21.3.1 Planning for Inodes

21.3.2 Calculating MDT Size

21.3.3 Overriding Default Formatting Options

21.4 Network Tuning

21.5 DDN Tuning

21.5.1 Setting Readahead and MF

21.5.2 Setting Segment Size

21.5.3 Setting Write-Back Cache

21.5.4 Setting maxcmds

21.5.5 Further Tuning Tips

21.6 Large-Scale Tuning for Cray XT and Equivalents

21.6.1 Network Tunables

21.7 Lockless I/O Tunables

22. Lustre Troubleshooting Tips

22.1 Lustre Error Messages and Logs

22.1.1 Lustre Error Messages

22.1.2 Lustre Logs

22.2 Lustre Performance Tips

22.2.1 Setting SCSI I/O Sizes

22.2.2 Write Performance Better Than Read Performance

22.2.3 OST Object is Missing or Damaged

22.2.4 OSTs Become Read-Only

22.2.5 Identifying a Missing OST

22.2.6 Changing Parameters

22.2.7 Viewing Parameters

22.2.8 Default Striping

22.2.9 Erasing a Filesystem

22.2.10 Reclaiming Reserved Disk Space

22.2.11 Considerations in Connecting a SAN with Lustre

22.2.12 Handling/Debugging "Bind: Address already in use" Error

22.2.13 Replacing an Existing OST or MDS

22.2.14 Handling/Debugging Error "-28"

22.2.15 Triggering Watchdog for PID NNN

22.2.16 Handling Timeouts on Initial Lustre Setup

22.2.17 Handling/Debugging "LustreError: xxx went back in time"

22.2.18 Lustre Error: "Slow Start_Page_Write"

22.2.19 Drawbacks in Doing Multi-client O_APPEND Writes

22.2.20 Slowdown Occurs During Lustre Startup

22.2.21 Log Message ‘Out of Memory’ on OST

22.2.22 Number of OSTs Needed for Sustained Throughput

23. Lustre Debugging

23.1 Lustre Debug Messages

23.1.1 Format of Lustre Debug Messages

23.2 Tools for Lustre Debugging

23.2.1 Debug Daemon Option to lctl

23.2.2 Controlling the Kernel Debug Log

23.2.3 The lctl Tool

23.2.4 Finding Memory Leaks

23.2.5 Printing to /var/log/messages

23.2.6 Tracing Lock Traffic

23.2.7 Sample lctl Run

23.2.8 Adding Debugging to the Lustre Source Code

23.2.9 Debugging in UML

23.3 Using Strace for Troubleshooting

23.4 Looking at Disk Content

23.4.1 Determine the Lustre UUID of an OST

23.4.2 Tcpdump

23.5 Ptlrpc Request History

Part IV Lustre for Users

24. Free Space and Quotas

24.1 Querying Filesystem Space

24.2 Using Quota

25. Striping and I/O Options

25.1 File Striping

25.1.1 Advantages of Striping

25.1.2 Disadvantages of Striping

25.1.3 Stripe Size

25.2 Displaying Files and Directories with lfs getstripe

25.3 lfs setstripe - Setting File Layouts

25.3.1 Changing Striping for a Subdirectory

25.3.2 Using a Specific Striping Pattern/File Layout for a Single File

25.4 Free Space Management

25.4.1 Round-Robin Allocator

25.4.2 Weighted Allocator

25.4.3 Adjusting the Weighting Between Free Space and Location

25.5 Performing Direct I/O

25.5.1 Making Filesystem Objects Immutable

25.6 Other I/O Options

25.6.1 End-to-End Client Checksums

25.7 Striping Using llapi

26. Lustre Security

26.1 Using ACLs

26.1.1 How ACLs Work

26.1.2 Lustre ACLs

26.1.3 Examples

27. Lustre Operating Tips

27.1 Expanding the Filesystem by Adding OSTs

27.2 A Simple Data Migration Script

27.3 Adding Multiple SCSI LUNs on Single HBA

27.4 Failures While Running a Client and an OST on the Same Machine

27.5 Improving Lustre Metadata Performance While Using Large Directories

Part V Reference

28. User Utilities (man1)

28.1 lfs

28.2 lfsck

28.3 Mount

28.4 Handling Timeouts

29. Lustre Programming Interfaces (man2)

29.1 User/Group Cache Upcall

29.1.1 Name

29.1.2 Description

29.1.3 Parameters

29.1.4 Data structures

30. Setting Lustre Properties (man3)

30.1 Using llapi

30.1.1 llapi_file_create

30.1.2 llapi_file_get_stripe

30.1.3 llapi_file_open

31. Configuration Files and Module Parameters (man5)

31.1 Introduction

31.2 Module Options

31.2.1 LNET Options

31.2.2 SOCKLND Kernel TCP/IP LND

31.2.3 QSW LND

31.2.4 RapidArray LND

31.2.5 VIB LND

31.2.6 OpenIB LND

31.2.7 Portals LND (Linux)

31.2.8 Portals LND (Catamount)

31.2.9 MX LND

32. System Configuration Utilities (man8)

32.1 mkfs.lustre

32.2 tunefs.lustre

32.3 lctl

32.4 mount.lustre

32.5 New Utilities in Lustre 1.6

32.5.1 lustre_rmmod.sh

32.5.2 e2scan

32.5.3 Utilities to Manage Large Clusters

32.5.4 Application Profiling Utilities

32.5.5 More /proc Statistics for Application Profiling

32.5.6 Testing/Debugging Utilities

32.5.7 Flock Feature

33. System Limits

33.1 Maximum Stripe Count

33.2 Maximum Stripe Size

33.3 Minimum Stripe Size

33.4 Maximum Number of OSTs and MDTs

33.5 Maximum Number of Clients

33.6 Maximum Size of a Filesystem

33.7 Maximum File Size

33.8 Maximum Number of Files or Subdirectories in a Single Directory

33.9 MDS Space Consumption

33.10 Maximum Length of a Filename and Pathname

33.11 Maximum Number of Open Files for Lustre Filesystems

33.12 OSS RAM Size for a Single OST

A. Feature List

B. Task List

C. Version Log

D. Lustre Knowledge Base

Glossary

Index