Lustre 1.6 Operations Manual

820-3681-10



Contents

Part I Lustre Architecture

1. Introduction to Lustre

1.1 Introducing the Lustre File System

1.1.1 Lustre Key Features

1.2 Lustre Components

1.2.1 MDS

1.2.2 MDT

1.2.3 OSS

1.2.4 OST

1.2.5 Lustre Clients

1.2.6 LNET

1.2.7 MGS

1.3 Lustre Systems

1.4 Files in the Lustre File System

1.4.1 Lustre File System and Striping

1.4.2 Lustre Storage

1.4.3 Lustre System Capacity

1.5 Lustre Configurations

1.6 Lustre Networking

1.7 Lustre Failover and Rolling Upgrades

1.8 Additional Lustre Features

2. Understanding Lustre Networking

2.1 Introduction to LNET

2.2 Supported Network Types

2.3 Designing Your Lustre Network

2.3.1 Identify All Lustre Networks

2.3.2 Identify Nodes to Route Between Networks

2.3.3 Identify Network Interfaces to Include/Exclude from LNET

2.3.4 Determine Cluster-wide Module Configuration

2.3.5 Determine Appropriate Mount Parameters for Clients

2.4 Configuring LNET

2.4.1 Module Parameters

2.4.2 Module Parameters - Routing

2.4.3 Downed Routers

2.5 Starting and Stopping LNET

2.5.1 Starting LNET

2.5.2 Stopping LNET

Part II Lustre Administration

3. Lustre Installation

3.1 Preparing to Install Lustre

3.1.1 Supported Operating System, Platform and Interconnect

3.1.2 Required Tools and Utilities

3.1.3 High-Availability Software

3.1.4 Debugging Tools

3.1.5 Environmental Requirements

3.1.6 Memory Requirements

3.2 Installing Lustre from RPMs

3.3 Installing Lustre from Source Code

3.3.1 Patching the Kernel

3.3.2 Create and Install the Lustre Packages

3.3.3 Installing Lustre with a Third-Party Network Stack

4. Configuring Lustre

4.1 Configuring Lustre

4.2 Basic Lustre Administration

4.2.1 Filesystem Name

4.2.2 Starting a Server

4.2.3 Stopping a Server

4.2.4 Working with Inactive OSTs

4.2.5 Finding Nodes in the Lustre Filesystem

4.2.6 Start a Server Without Lustre Service

4.2.7 Specifying Failout Mode for an OST

4.2.8 Running Multiple Lustre Filesystems

4.2.9 Running the Writeconf Command

4.2.10 Removing an OST

4.2.11 Changing a Server NID

4.2.12 Aborting Recovery

4.3 More Complex Configurations

4.3.1 Failover

4.4 Operational Scenarios

4.4.1 Stopping a Service

4.4.2 Forcing Failover

4.4.3 Re-addressing a Failover Node

4.4.4 Local Mounts through Network

4.4.5 Start Lustre on a Client or Server Node, Ignoring the Management Network

4.4.6 Mounting a Lustre Server but No Service

4.4.7 Adding an OSS/MDS to an Existing Filesystem

4.4.8 Security Policies on Some Interface for the MDS

4.4.9 Configure a Client with a Persistent Write-Back Cache

4.4.10 Configure a Replicating Proxy Cluster

4.4.11 OSS Pools

4.4.12 Target Configuration

4.4.13 Echo Client and Single OST Server

4.4.14 Striped OST Echo Server

4.4.15 Long Term Failure and Removal of OSS Targets

4.4.16 Migration

4.5 Configuration State Management

4.6 Lustre Configuration Utilities

4.6.1 mkfs.lustre

4.6.2 mount.lustre

4.6.3 lfs listtargets

4.6.4 lfs export

4.6.5 lfs migrate

5. Service Tags

5.1 Introduction to Service Tags

5.2 Using Service Tags

5.2.1 Installing Service Tags

5.2.2 Discovering and Registering Lustre Components

5.2.3 Information Registered with Sun

6. Configuring Lustre - Examples

6.1 Simple TCP Network

6.1.1 Lustre with Combined MGS/MDT

6.1.2 Lustre with Separate MGS and MDT

7. More Complicated Configurations

7.1 Multihomed Servers

7.1.1 Modprobe.conf

7.1.2 Start Servers

7.1.3 Start Clients

7.2 Elan to TCP Routing

7.2.1 Modprobe.conf

7.2.2 Start Servers

7.2.3 Start Clients

7.3 Load Balancing with InfiniBand

7.3.1 Modprobe.conf

7.3.2 Start Servers

7.3.3 Start Clients

7.4 Multi-Rail Configurations with LNET

8. Failover

8.1 What is Failover?

8.1.1 The Power Management Software

8.1.2 Power Equipment

8.1.3 Heartbeat

8.1.4 Connection Handling During Failover

8.1.5 Roles of Nodes in a Failover

8.2 OST Failover

8.3 MDS Failover

8.4 Configuring MDS and OSTs for Failover

8.4.1 Configuring Lustre for Failover

8.4.2 Starting/Stopping a Resource

8.4.3 Active/Active Failover Configuration

8.4.4 Hardware Requirements for Failover

8.5 Setting Up Failover with Heartbeat V1

8.5.1 Installing the Software

8.6 Using MMP

8.7 Setting Up Failover with Heartbeat V2

8.7.1 Installing the Software

8.7.2 Configuring the Hardware

8.7.3 Operation

8.8 Considerations with Failover Software and Solutions

9. Configuring Quotas

9.1 Working with Quotas

9.1.1 Enabling Disk Quotas

9.1.2 Creating Quota Files and Quota Administration

9.1.3 Resetting the Quota

9.1.4 Quota Allocation

9.1.5 Known Issues with Quotas

9.1.6 Lustre Quota Statistics

10. RAID

10.1 Considerations for Backend Storage

10.1.1 Selecting Storage for the MDS and OSS

10.1.2 Reliability Best Practices

10.1.3 Understanding Double Failures with Hardware and Software RAID5

10.1.4 Performance Tradeoffs

10.1.5 Formatting

10.2 Insights into Disk Performance Measurement

10.3 Lustre Software RAID Support

11. Kerberos

11.1 What is Kerberos?

11.2 Lustre Setup with Kerberos

11.2.1 Configuring Kerberos for Lustre

11.2.2 Types of Lustre-Kerberos Flavors

12. Bonding

12.1 Network Bonding

12.2 Requirements

12.3 Using Lustre with Multiple NICs versus Bonding NICs

12.4 Bonding Module Parameters

12.5 Setting Up Bonding

12.5.1 Examples

12.6 Configuring Lustre with Bonding

12.6.1 Bonding References

13. Upgrading Lustre

13.1 Lustre Interoperability

13.2 Upgrading from Lustre 1.4.12 to Latest 1.6.x Version

13.2.1 Prerequisites to Upgrading Lustre

13.2.2 Supported Upgrade Paths

13.2.3 Starting Clients

13.2.4 Upgrading a Single File System

13.2.5 Upgrading Multiple File Systems with a Shared MGS

13.3 Upgrading Lustre 1.6.x to the Next Minor Version

13.4 Downgrading from Latest 1.6.x Version to Lustre 1.4.12

13.4.1 Downgrade Requirements

13.4.2 Downgrading a File System

14. Lustre SNMP Module

14.1 Installing the Lustre SNMP Module

14.2 Building the Lustre SNMP Module

14.3 Using the Lustre SNMP Module

15. Backup and Restore

15.1 Lustre Backups

15.1.1 Filesystem-level Backups

15.1.2 Device-level Backups

15.1.3 Performing File-level Backups

15.2 Restoring from a File-level Backup

15.3 LVM Snapshots on Lustre Target Disks

15.3.1 Creating LVM-based Lustre Filesystem As a Backup

15.3.2 Backing Up New Files to the Backup Filesystem

15.3.3 Creating LVM Snapshot Volumes

15.3.4 Restoring From Old Snapshot

15.3.5 Delete Old Snapshots

16. POSIX

16.1 Installing POSIX

16.2 Running POSIX Tests Against Lustre

16.3 Isolating and Debugging Failures

17. Benchmarking

17.1 Bonnie++ Benchmark

17.2 IOR Benchmark

17.3 IOzone Benchmark

18. Lustre I/O Kit

18.1 Lustre I/O Kit Description and Prerequisites

18.1.1 Downloading an I/O Kit

18.1.2 Prerequisites to Using an I/O Kit

18.2 Running I/O Kit Tests

18.2.1 sgpdd_survey

18.2.2 obdfilter_survey

18.2.3 ost_survey

18.3 PIOS Test Tool

18.3.1 Synopsis

18.3.2 PIOS I/O Modes

18.3.3 PIOS Parameters

18.3.4 PIOS Examples

18.4 LNET Self-Test

18.4.1 Basic Concepts of LNET Self-Test

18.4.2 LNET Self-Test Concepts

18.4.3 LNET Self-Test Commands

19. Lustre Recovery

19.1 Recovering Lustre

19.2 Types of Failure

19.2.1 Client Failure

19.2.2 MDS Failure (and Failover)

19.2.3 OST Failure

19.2.4 Network Partition

Part III Lustre Tuning, Monitoring and Troubleshooting

20. Lustre Tuning

20.1 Module Options

20.1.1 MDS Threads

20.2 LNET Tunables

20.3 Options to Format MDT and OST Filesystems

20.3.1 Planning for Inodes

20.3.2 Calculating MDT Size

20.3.3 Overriding Default Formatting Options

20.4 Network Tuning

20.5 DDN Tuning

20.5.1 Setting Readahead and MF

20.5.2 Setting Segment Size

20.5.3 Setting Write-Back Cache

20.5.4 Setting maxcmds

20.5.5 Further Tuning Tips

20.6 Large-Scale Tuning for Cray XT and Equivalents

20.6.1 Network Tunables

20.7 Lockless I/O Tunables

20.8 Data Checksums

21. Lustre Monitoring and Troubleshooting

21.1 Monitoring Lustre

21.2 Troubleshooting Lustre

21.2.1 Error Numbers

21.2.2 Error Messages

21.2.3 Lustre Logs

21.3 Submitting a Lustre Bug

21.4 Common Lustre Problems and Performance Tips

21.4.1 Recovering from an Unavailable OST

21.4.2 Write Performance Better Than Read Performance

21.4.3 OST Object is Missing or Damaged

21.4.4 OSTs Become Read-Only

21.4.5 Identifying a Missing OST

21.4.6 Changing Parameters

21.4.7 Viewing Parameters

21.4.8 Default Striping

21.4.9 Erasing a Filesystem

21.4.10 Reclaiming Reserved Disk Space

21.4.11 Considerations in Connecting a SAN with Lustre

21.4.12 Handling/Debugging "Bind: Address already in use" Error

21.4.13 Replacing An Existing OST or MDS

21.4.14 Handling/Debugging Error "-28"

21.4.15 Triggering Watchdog for PID NNN

21.4.16 Handling Timeouts on Initial Lustre Setup

21.4.17 Handling/Debugging "LustreError: xxx went back in time"

21.4.18 Lustre Error: "Slow Start_Page_Write"

21.4.19 Drawbacks in Doing Multi-client O_APPEND Writes

21.4.20 Slowdown Occurs During Lustre Startup

21.4.21 Log Message ‘Out of Memory’ on OST

21.4.22 Number of OSTs Needed for Sustained Throughput

21.4.23 Setting SCSI I/O Sizes

22. LustreProc

22.1 /proc Entries for Lustre

22.1.1 Finding Lustre

22.1.2 Lustre Timeouts

22.1.3 Adaptive Timeouts in Lustre

22.1.4 LNET Information

22.1.5 Free Space Distribution

22.2 Lustre I/O Tunables

22.2.1 Client I/O RPC Stream Tunables

22.2.2 Watching the Client RPC Stream

22.2.3 Client Read-Write Offset Survey

22.2.4 Client Read-Write Extents Survey

22.2.5 Watching the OST Block I/O Stream

22.2.6 Using File Readahead and Directory Statahead

22.2.7 mballoc History

22.2.8 mballoc3 Tunables

22.2.9 Locking

22.3 Debug Support

22.3.1 RPC Information for Other OBD Devices

23. Lustre Debugging

23.1 Lustre Debug Messages

23.1.1 Format of Lustre Debug Messages

23.2 Tools for Lustre Debugging

23.2.1 Debug Daemon Option to lctl

23.2.2 Controlling the Kernel Debug Log

23.2.3 The lctl Tool

23.2.4 Finding Memory Leaks

23.2.5 Printing to /var/log/messages

23.2.6 Tracing Lock Traffic

23.2.7 Sample lctl Run

23.2.8 Adding Debugging to the Lustre Source Code

23.2.9 Debugging in UML

23.3 Troubleshooting with strace

23.4 Looking at Disk Content

23.4.1 Determine the Lustre UUID of an OST

23.4.2 Tcpdump

23.5 Ptlrpc Request History

23.6 Using LWT Tracing

Part IV Lustre for Users

24. Free Space and Quotas

24.1 Querying Filesystem Space

24.2 Using Quotas

25. Striping and I/O Options

25.1 File Striping

25.1.1 Advantages of Striping

25.1.2 Disadvantages of Striping

25.1.3 Stripe Size

25.2 Displaying Files and Directories with lfs getstripe

25.3 lfs setstripe - Setting File Layouts

25.3.1 Changing Striping for a Subdirectory

25.3.2 Using a Specific Striping Pattern/File Layout for a Single File

25.3.3 Creating a File on a Specific OST

25.4 Free Space Management

25.4.1 Round-Robin Allocator

25.4.2 Weighted Allocator

25.4.3 Adjusting the Weighting Between Free Space and Location

25.5 Performing Direct I/O

25.5.1 Making Filesystem Objects Immutable

25.6 Other I/O Options

25.6.1 End-to-End Client Checksums

25.7 Striping Using llapi

26. Lustre Security

26.1 Using ACLs

26.1.1 How ACLs Work

26.1.2 Using ACLs with Lustre

26.1.3 Examples

26.2 Using Root Squash

26.2.1 Configuring Root Squash

26.2.2 Enabling and Tuning Root Squash

26.2.3 Tips on Using Root Squash

27. Lustre Operating Tips

27.1 Adding an OST to a Lustre Filesystem

27.2 A Simple Data Migration Script

27.3 Adding Multiple SCSI LUNs on Single HBA

27.4 Failures Running a Client and OST on the Same Machine

27.5 Improving Lustre Metadata Performance While Using Large Directories

Part V Reference

28. User Utilities (man1)

28.1 lfs

28.2 lfsck

28.3 Filefrag

28.4 Mount

28.5 Handling Timeouts

29. Lustre Programming Interfaces (man2)

29.1 User/Group Cache Upcall

29.1.1 Name

29.1.2 Description

29.1.3 Parameters

29.1.4 Data Structures

30. Setting Lustre Properties (man3)

30.1 Using llapi

30.1.1 llapi_file_create

30.1.2 llapi_file_get_stripe

30.1.3 llapi_file_open

30.1.4 llapi_quotactl

30.1.5 llapi_path2fid

31. Configuration Files and Module Parameters (man5)

31.1 Introduction

31.2 Module Options

31.2.1 LNET Options

31.2.2 SOCKLND Kernel TCP/IP LND

31.2.3 QSW LND

31.2.4 RapidArray LND

31.2.5 VIB LND

31.2.6 OpenIB LND

31.2.7 Portals LND (Linux)

31.2.8 Portals LND (Catamount)

31.2.9 MX LND

32. System Configuration Utilities (man8)

32.1 mkfs.lustre

32.2 tunefs.lustre

32.3 lctl

32.4 mount.lustre

32.5 New Utilities in Lustre 1.6

32.5.1 lustre_rmmod.sh

32.5.2 e2scan

32.5.3 Utilities to Manage Large Clusters

32.5.4 Application Profiling Utilities

32.5.5 More /proc Statistics for Application Profiling

32.5.6 Testing / Debugging Utilities

32.5.7 Flock Feature

32.5.8 l_getgroups

32.5.9 llobdstat

32.5.10 llstat

32.5.11 lst

32.5.12 plot-llstat

32.5.13 routerstat

32.5.14 ll_recover_lost_found_objs

33. System Limits

33.1 Maximum Stripe Count

33.2 Maximum Stripe Size

33.3 Minimum Stripe Size

33.4 Maximum Number of OSTs and MDTs

33.5 Maximum Number of Clients

33.6 Maximum Size of a Filesystem

33.7 Maximum File Size

33.8 Maximum Number of Files or Subdirectories in a Single Directory

33.9 MDS Space Consumption

33.10 Maximum Length of a Filename and Pathname

33.11 Maximum Number of Open Files for Lustre Filesystems

33.12 OSS RAM Size for a Single OST

A. Version Log

B. Lustre Knowledge Base

Glossary

Index