Lustre 1.8 Operations Manual
821-0035-11
Contents
1.1 Introducing the Lustre File System
1.2.1 Lustre Networking (LNET)
1.4 Files in the Lustre File System
1.4.1 Lustre File System and Striping
1.7 Lustre Failover and Rolling Upgrades
2. Understanding Lustre Networking
2.3 Designing Your Lustre Network
2.3.1 Identify All Lustre Networks
2.3.2 Identify Nodes to Route Between Networks
2.3.3 Identify Network Interfaces to Include/Exclude from LNET
2.3.4 Determine Cluster-wide Module Configuration
2.3.5 Determine Appropriate Mount Parameters for Clients
2.4.1.2 OFED InfiniBand Options
2.4.2 Module Parameters - Routing
2.5 Starting and Stopping LNET
3.1 Preparing to Install Lustre
3.1.1 Supported Operating System, Platform and Interconnect
3.1.2 Required Lustre Software
3.1.3 Required Tools and Utilities
3.1.4 (Optional) High-Availability Software
3.1.6 Environmental Requirements
3.1.7.1 MDS Memory Requirements
3.1.7.2 OSS Memory Requirements
3.2 Installing Lustre from RPMs
3.3 Installing Lustre from Source Code
3.3.1.1 Introducing the Quilt Utility
3.3.1.2 Get the Lustre Source and Unpatched Kernel
3.3.2 Create and Install the Lustre Packages
3.3.3 Installing Lustre with a Third-Party Network Stack
4.1 Configuring the Lustre File System
4.1.0.1 Simple Lustre Configuration Example
4.1.1 Scaling the Lustre File System
4.2 Additional Lustre Configuration
4.3 Basic Lustre Administration
4.3.1 Specifying the File System Name
4.3.5 Working with Inactive OSTs
4.3.6 Finding Nodes in the Lustre File System
4.3.7 Mounting a Server Without Lustre Service
4.3.8 Specifying Failout/Failover Mode for OSTs
4.3.9 Running Multiple Lustre File Systems
4.3.10 Setting and Retrieving Lustre Parameters
4.3.10.1 Setting Parameters with mkfs.lustre
4.3.10.2 Setting Parameters with tunefs.lustre
4.3.10.3 Setting Parameters with lctl
4.3.10.4 Reporting Current Parameter Values
4.3.11 Regenerating the Lustre Configuration Logs
4.3.13 Removing and Restoring OSTs
4.3.13.1 Removing an OST from the File System
4.3.13.2 Restoring an OST in the File System
4.3.15 Determining Which Machine is Serving an OST
4.4 More Complex Configurations
4.5.1 Unmounting a Server (without Failover)
4.5.2 Unmounting a Server (with Failover)
4.5.3 Changing the Address of a Failover Node
5.1 Introduction to Service Tags
5.2.2 Discovering and Registering Lustre Components
5.2.3 Information Registered with Sun
6. Configuring Lustre - Examples
6.1.1 Lustre with Combined MGS/MDT
6.1.1.2 Configuration Generation and Application
6.1.2 Lustre with Separate MGS and MDT
6.1.2.2 Configuration Generation and Application
6.1.2.3 Configuring Lustre with a CSV File
7. More Complicated Configurations
7.3 Load Balancing with InfiniBand
7.3.1 Setting Up modprobe.conf for Load Balancing
7.4 Multi-Rail Configurations with LNET
8.1.2 Types of Failover Configurations
8.2 Failover Functionality in Lustre
8.2.1 MDT Failover Configuration (Active/Passive)
8.2.2 OST Failover Configuration (Active/Active)
8.3 Configuring and Using Heartbeat with Lustre Failover
8.3.1 Creating a Failover Environment
8.3.1.1 Power Management Software
8.3.2 Setting up the Heartbeat Software
8.3.2.3 (Optional) Migrating a Heartbeat Configuration (v1 to v2)
8.3.3.2 Switching Resources Between Nodes
9.1.1.1 Administrative and Operational Quotas
9.1.2 Creating Quota Files and Quota Administration
9.1.4 Known Issues with Quotas
9.1.4.1 Granted Cache and Quota Limits
9.1.5.1 Interpreting Quota Statistics
10.1 Considerations for Backend Storage
10.1.1 Selecting Storage for the MDS or OSTs
10.1.2 Reliability Best Practices
10.1.3 Understanding Double Failures with Hardware and Software RAID5
10.1.5 Formatting Options for RAID Devices
10.1.5.1 Creating an External Journal
10.1.6 Handling Degraded RAID Arrays
10.2 Insights into Disk Performance Measurement
10.3 Lustre Software RAID Support
10.3.0.1 Enabling Software RAID on Lustre
11.2 Lustre Setup with Kerberos
11.2.1 Configuring Kerberos for Lustre
11.2.1.1 Kerberos Distributions Supported on Lustre
11.2.1.2 Preparing to Set Up Lustre with Kerberos
11.2.1.3 Configuring Lustre for Kerberos
11.2.1.5 Setting the Environment
11.2.2 Types of Lustre-Kerberos Flavors
11.2.2.4 Specifying Security Flavors
11.2.2.6 Rules, Syntax and Examples
11.2.2.7 Authenticating Normal Users
12.3 Using Lustre with Multiple NICs versus Bonding NICs
12.4 Bonding Module Parameters
12.6 Configuring Lustre with Bonding
13. Upgrading and Downgrading Lustre
13.3 Upgrading Lustre 1.6.x to 1.8.x
13.3.1 Performing a Complete File System Upgrade
13.3.2 Performing a Rolling Upgrade
13.4 Upgrading Lustre 1.8.x to the Next Minor Version
13.5 Downgrading from Lustre 1.8.x to 1.6.x
13.5.1 Performing a Complete File System Downgrade
13.5.2 Performing a Rolling Downgrade
14.1 Installing the Lustre SNMP Module
14.2 Building the Lustre SNMP Module
14.3 Using the Lustre SNMP Module
15.2 Backing up a Device (MDS or OST)
15.3.1 Backing up Extended Attributes
15.4 Restoring from a File-level Backup
15.5 Using LVM Snapshots with Lustre
15.5.1 Creating an LVM-based Backup File System
15.5.2 Backing up New/Changed Files to the Backup File System
15.5.3 Creating Snapshot Volumes
15.5.4 Restoring the File System From a Snapshot
15.5.6 Changing Snapshot Volume Size
16.2.1 POSIX Installation Using a Quick Start Version
16.3 Building and Running a POSIX Compliance Test Suite on Lustre
16.3.1 Building the Test Suite from Scratch
16.3.2 Running the Test Suite Against Lustre
16.4 Isolating and Debugging Failures
18.1 Lustre I/O Kit Description and Prerequisites
18.1.2 Prerequisites to Using an I/O Kit
18.2.2.1 Running obdfilter_survey Against a Local Disk
18.2.2.2 Running obdfilter_survey Against a Network
18.2.2.3 Running obdfilter_survey Against a Network Disk
18.4.1 Basic Concepts of LNET Self-Test
18.4.2 LNET Self-Test Commands
19.2.7 Gaps in the Replay Sequence
19.3.2 Reconstruction of Open Replies
19.5 Recovering from Errors or Corruption on a Backing File System
19.6 Recovering from Corruption in the Lustre File System
19.6.1 Working with Orphaned Objects
Part III Lustre Tuning, Monitoring and Troubleshooting
20.1.1 OSS Service Thread Count
20.1.1.1 Optimizing the Number of Service Threads
20.1.2 MDS Service Thread Count
20.2.0.1 Transmit and Receive Buffer Size
20.3 Options for Formatting the MDT and OSTs
20.4 Overriding Default Formatting Options
20.4.1 Number of Inodes for the MDT
20.4.3 Number of Inodes for an OST
20.5 Large-Scale Tuning for Cray XT and Equivalents
21.1.1 Locating Lustre File Systems and Servers
21.1.3.1 Configuring Adaptive Timeouts
21.1.3.2 Interpreting Adaptive Timeouts Information
21.1.5 Free Space Distribution
21.1.5.1 Managing Stripe Allocation
21.2.1 Client I/O RPC Stream Tunables
21.2.2 Watching the Client RPC Stream
21.2.3 Client Read-Write Offset Survey
21.2.4 Client Read-Write Extents Survey
21.2.5 Watching the OST Block I/O Stream
21.2.6 Using File Readahead and Directory Statahead
21.2.6.1 Tuning File Readahead
21.2.6.2 Tuning Directory Statahead
21.2.11 Setting MDS and OSS Thread Counts
21.3.1 RPC Information for Other OBD Devices
21.3.1.1 Interpreting OST Statistics
21.3.1.3 Interpreting MDT Statistics
22. Lustre Monitoring and Troubleshooting
22.4 Common Lustre Problems and Performance Tips
22.4.1 Recovering from an Unavailable OST
22.4.2 Write Performance Better Than Read Performance
22.4.3 OST Object is Missing or Damaged
22.4.5 Identifying a Missing OST
22.4.6 Improving Lustre Performance When Working with Small Files
22.4.9 Reclaiming Reserved Disk Space
22.4.10 Considerations in Connecting a SAN with Lustre
22.4.11 Handling/Debugging "Bind: Address already in use" Error
22.4.12 Replacing An Existing OST or MDS
22.4.13 Handling/Debugging Error "-28"
22.4.14 Triggering Watchdog for PID NNN
22.4.15 Handling Timeouts on Initial Lustre Setup
22.4.16 Handling/Debugging "LustreError: xxx went back in time"
22.4.17 Lustre Error: "Slow Start_Page_Write"
22.4.18 Drawbacks in Doing Multi-client O_APPEND Writes
22.4.19 Slowdown Occurs During Lustre Startup
22.4.20 Log Message ‘Out of Memory’ on OST
22.4.21 Number of OSTs Needed for Sustained Throughput
22.4.22 Setting SCSI I/O Sizes
22.4.23 Identifying Which Lustre File an OST Object Belongs To
23.1.1 Format of Lustre Debug Messages
23.2 Tools for Lustre Debugging
23.2.1 Debug Daemon Option to lctl
23.2.1.1 lctl Debug Daemon Commands
23.2.2 Controlling the Kernel Debug Log
23.2.5 Printing to /var/log/messages
23.2.8 Adding Debugging to the Lustre Source Code
23.3 Troubleshooting with strace
23.4.1 Determine the Lustre UUID of an OST
24.1.2 Disadvantages of Striping
24.2 Displaying Files and Directories with lfs getstripe
24.3 lfs setstripe - Setting File Layouts
24.3.1 Changing Striping for a Subdirectory
24.3.2 Using a Specific Striping Pattern/File Layout for a Single File
24.3.3 Creating a File on a Specific OST
24.4.1 Checking File System Free Space
24.4.2 Using Stripe Allocations
24.4.5 Adjusting the Weighting Between Free Space and Location
24.5.1 Checking File System Usage
24.5.2 Taking a Full OST Offline
24.5.3 Migrating Data within a File System
24.6 Creating and Managing OST Pools
24.6.1.1 Using the lfs Command with OST Pools
24.6.2 Tips for Using OST Pools
24.7.1 Making File System Objects Immutable
24.8.1.1 Changing Checksum Algorithms
25.2.1 Configuring Root Squash
25.2.2 Enabling and Tuning Root Squash
26.1 Adding an OST to a Lustre File System
26.2 A Simple Data Migration Script
26.3 Adding Multiple SCSI LUNs on Single HBA
26.4 Failures Running a Client and OST on the Same Machine
26.5 Improving Lustre Metadata Performance While Using Large Directories
28. Lustre Programming Interfaces (man2)
28.1.2.1 Primary and Secondary Groups
29. Setting Lustre Properties (man3)
30. Configuration Files and Module Parameters (man5)
30.2.2 SOCKLND Kernel TCP/IP LND
30.2.8 Portals LND (Catamount)
31. System Configuration Utilities (man8)
31.5 Additional System Configuration Utilities
31.5.3 Utilities to Manage Large Clusters
31.5.4 Application Profiling Utilities
31.5.5 More /proc Statistics for Application Profiling
31.5.6 Testing / Debugging Utilities
31.5.14 ll_recover_lost_found_objs
32.4 Maximum Number of OSTs and MDTs
32.5 Maximum Number of Clients
32.6 Maximum Size of a File System
32.8 Maximum Number of Files or Subdirectories in a Single Directory
32.10 Maximum Length of a Filename and Pathname
32.11 Maximum Number of Open Files for Lustre File Systems
Copyright © 2010, Oracle and/or its affiliates. All rights reserved.