Saturday, June 8, 2013

Configuring Hugepages For Oracle on Linux

Huge Pages (also known as Large Pages) is something we at House of Brick are often asked about. There is a significant amount of information on the subject; however, it is scattered throughout MetaLink and various other resources. I have attempted to consolidate some of this information, and have added some of our experiences and best practices for implementing Huge Pages.

Linux Memory Kernel Parameters

Before implementing Huge Pages, check the kernel memory settings. Two key settings for a successful implementation are shmall and shmmax.
Oracle makes use of one of the three memory management models to create the SGA during database startup, and it does this in the following sequence: First, Oracle attempts to use the one-segment model (most optimal). If this fails, it proceeds with the next one, which is the contiguous multi-segment model (contiguous smaller chunks). If that fails as well, it goes with the last option, which is the non-contiguous multi-segment model. This model can potentially cause memory fragmentation.
Reference: http://docs.oracle.com/cd/B12037_01/server.101/b10755/initparams166.htm
Oracle memory kernel parameters adjusted:
kernel.shmall = 1073741824
kernel.shmmax = 4398046511104
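As a quick sanity check on the two values above: on Linux, shmall is counted in pages while shmmax is counted in bytes, so with the standard 4KB page size both settings describe the same 4TB ceiling. The sketch below is only arithmetic. The function name is made up for illustration, and the 4096-byte page size is an assumption (check yours with getconf PAGE_SIZE).

```shell
# Hypothetical helper: convert a shmall value (in pages) to bytes.
# Assumes a 4096-byte base page unless a different size is passed in.
shmall_to_bytes() {
    local shmall_pages=$1
    local page_size=${2:-4096}
    echo $(( shmall_pages * page_size ))
}

shmall_to_bytes 1073741824    # prints 4398046511104
```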

Operating System Huge Pages

Using Huge Pages is a technique that allows the operating system to allocate and manage memory using a larger-than-standard page size. This term is synonymous with “Large Memory Pages.” The default page size for most systems is 4KB (4096 bytes). Huge Pages are typically 2MB or 4MB. Managing memory using a larger page size reduces the number of pages, and with it the required number of system calls, by a factor of roughly 500 or 1,000 (512 for 2MB pages, 1,024 for 4MB pages).
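To make the factor concrete, the following sketch counts how many pages are needed to cover the same region at each page size. The 44000MB region is borrowed from the implementation example later in this post; the function name is illustrative only.

```shell
# Hypothetical helper: number of pages needed to cover a region.
# Arguments: region size in MB, page size in KB. Rounds up.
pages_for() {
    local region_kb=$(( $1 * 1024 ))
    local page_kb=$2
    echo $(( (region_kb + page_kb - 1) / page_kb ))
}

pages_for 44000 4       # 4KB pages: 11264000
pages_for 44000 2048    # 2MB pages: 22000
pages_for 44000 4096    # 4MB pages: 11000
```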
Due to dramatic improvements in hardware and memory performance, the smaller page size has not traditionally been a significant problem. However, as memory capacity has the potential to exceed 1TB, and databases can effectively manage hundreds of GBs of memory, it becomes significant. Huge Pages were introduced to address this issue. The performance impact associated with the system calls relating to memory management is particularly noticeable when looking at very large memory allocations under a hypervisor, because memory operations must be protected. Protected system calls require a degree of serialization and can experience a measurable performance penalty under a hypervisor. Therefore reducing the number of calls has a high return.
A 14% performance improvement has been shown under load for an Oracle database with a 24GB SGA (shared memory pool) when using huge pages as compared with standard pages. A 40% improvement was measured for a system with a 96GB SGA with no other changes.

Under The Hood of Huge Pages

The CPU's Memory Management Unit (MMU) keeps a cache of recently used address translations. This is called the Translation Lookaside Buffer (TLB). The TLB is a table that maps virtual addresses to physical addresses and is stored close to the CPU. Without the TLB, every memory access would require a "Page Walk" through the page tables to find the physical memory address, and this process causes higher latency. With the TLB, the MMU can first check the cache for the required translation; if the lookup succeeds, this is called a "TLB hit". A TLB miss results in a Page Walk. (A "page fault" is a different event: it occurs when the Page Walk finds no valid mapping.) When your application uses Huge Pages, each TLB entry covers a larger memory range, allowing for more hits. This reduces the time it takes to find data in memory and lowers the CPU cost.
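The effect of page size on the TLB can be sketched as simple arithmetic: the amount of memory the TLB can cover at once is the number of entries multiplied by the page size. The 1024-entry figure below is an assumption chosen for illustration, not the size of any particular CPU's TLB, and the function name is made up.

```shell
# Hypothetical helper: memory covered by a TLB, in KB.
# Arguments: number of TLB entries, page size in KB.
tlb_reach_kb() {
    echo $(( $1 * $2 ))
}

tlb_reach_kb 1024 4       # 4KB pages: 4096 KB (4MB)
tlb_reach_kb 1024 2048    # 2MB pages: 2097152 KB (2GB)
```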
Linux handles huge pages gracefully by reserving a pool of memory at system boot time. This amount of memory can be increased or decreased on the fly using sysctl, but an increase requires that the memory be available to be successful without a reboot. Using Huge Pages in Linux implies memory locking. User limits must accommodate locking of the entire memory segment for Huge Pages to be successful.

Database 11g Huge Pages

Enabling expanded Automatic Memory Management (AMM) in Database 11g prohibits use of Linux huge pages. This is true of all current Database 11g R1 and R2 releases and patches, including the latest (Database 11.2.0.2). Database 11g R1 expanded AMM to include the Program Global Area (PGA). AMM is not active by default in Database 11g; it is enabled via the parameters MEMORY_TARGET and MEMORY_MAX_TARGET. However, the Database 10g SGA_TARGET and SGA_MAX_SIZE parameters continue to work as they did previously.
AMM does not use standard System V-style IPC shared memory. Instead, AMM uses memory mapped files in /dev/shm. The drawback is that memory mapped files do not support Huge Pages. This is a fundamental problem with AMM if Huge Pages are desired.
Of the two features, it is strongly recommended that Huge Pages take priority, regardless of the workload characteristics and database instance memory size. In general, Huge Pages is a performance feature, whereas AMM is a configuration feature.
Database 11.2.0.2 introduced the initialization parameter USE_LARGE_PAGES to provide more Huge Pages-related instance control. Its values are TRUE, FALSE, ONLY, and AUTO (new in 11.2.0.3). The default is TRUE, which causes Oracle to behave as it always has. Set the parameter to ONLY to enforce the use of Huge Pages and prevent instance startup if sufficient huge pages are not available. Set the parameter to AUTO to allow Oracle to use both huge pages and standard kernel pages simultaneously.
Finally, although all versions of Oracle since version 7 have supported Huge Pages, Database 11.2.0.2 is the first release that provides any meaningful output in the alert log relating to Oracle’s use of them.
Metalink References:
HugePages on Oracle Linux 64-bit [ID 361468.1]
USE_LARGE_PAGES To Enable HugePages In 11.2 [ID 1392497.1]
HugePages on Linux: What It Is... and What It Is Not... [ID 361323.1]
Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration [ID 401749.1]
HugePages and Oracle Database 11g Automatic Memory Management (AMM) on Linux [ID 749851.1]

Huge Pages Implementation Example

Note: As stated previously, Huge Pages are not compatible with Oracle AMM.
alter system set sga_max_size = 44000M scope=spfile;
alter system set sga_target = 44000M scope=spfile;
alter system set pga_aggregate_target = 4000M scope=spfile;
alter system set pre_page_sga = TRUE scope=spfile;
Check to see if hugepages are currently configured:
grep Huge /proc/meminfo
If HugePages_Total is greater than zero, that value is the number of Huge Pages currently configured. Huge Pages are allocated in 2MB chunks (see the Hugepagesize line). Next, calculate the number of Huge Pages needed for a 44GB SGA (44000MB), and add ten pages for overhead.
(44000/2) + 10 = 22010
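The same calculation can be scripted so that the page size is read from the system rather than assumed. This is a minimal sketch, assuming the Hugepagesize line in /proc/meminfo has the format shown in this post; the function name and the ten-page default are illustrative. For production sizing, the Oracle script in note 401749.1 listed above is the better reference.

```shell
# Hypothetical helper: huge pages needed for an SGA.
# Arguments: SGA size in MB, huge page size in KB, extra pages (default 10).
hugepages_needed() {
    local sga_kb=$(( $1 * 1024 ))
    local hp_kb=$2
    local extra=${3:-10}
    echo $(( (sga_kb + hp_kb - 1) / hp_kb + extra ))
}

# Read the huge page size (in kB) from /proc/meminfo; fall back to 2048.
hp_kb=$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo 2>/dev/null)
hugepages_needed 44000 "${hp_kb:-2048}"    # 22010 when Hugepagesize is 2048 kB
```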
The setting for hugepages is configured in /etc/sysctl.conf. To check the current setting, run the following command:
grep nr_hugepages /etc/sysctl.conf
Check /etc/security/limits.conf to make sure that Oracle can lock shared memory. The memlock value is expressed in KB; set it to a number higher than the SGA size in KB (44000MB is 45056000KB).
grep oracle /etc/security/limits.conf
oracle    soft    memlock    50000000
oracle    hard    memlock    50000000
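A quick way to confirm that the limit is large enough is to compare it with the SGA size in KB. The sketch below assumes it is run in a fresh login session of the oracle user, since limits.conf changes only apply to new sessions. The function name is illustrative.

```shell
# Hypothetical helper: succeed if a memlock limit covers the SGA.
# Arguments: limit as printed by `ulimit -l` (KB or "unlimited"), SGA size in MB.
memlock_covers_sga() {
    local limit=$1
    local sga_kb=$(( $2 * 1024 ))
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$sga_kb" ]
}

if memlock_covers_sga "$(ulimit -l)" 44000; then
    echo "memlock limit is sufficient"
else
    echo "memlock limit is too low for a 44000MB SGA"
fi
```

Note that a limit of exactly 44000000 would fail this check, because 44000MB is 45056000KB.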

Note: This section must be done as root.

Add or modify the following line in /etc/sysctl.conf:
vm.nr_hugepages=22010
Add or modify the following two lines to /etc/security/limits.conf:
oracle    soft    memlock    50000000
oracle    hard    memlock    50000000
Add or modify the following three lines in /etc/sysctl.conf:
kernel.shmmax = 4398046511104
kernel.shmall = 1073741824
kernel.shmmni = 4096
Reboot the system so that the hugepages setting takes effect. After the reboot, check the Huge Pages setting.
# grep Huge /proc/meminfo
HugePages_Total:  22010
HugePages_Free:   22010
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
If the settings are correct, start the database. Then confirm that Huge Pages are allocated with the command below.
grep Huge /proc/meminfo
HugePages_Total:  22010
HugePages_Free:     496
HugePages_Rsvd:     486
Hugepagesize:     2048 kB
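Reading these counters: pages that are no longer free have been touched by the instance, and reserved pages are promised to it but not yet touched. The sketch below adds the two together. It is an illustration that parses text in the /proc/meminfo format shown above, and the function name is made up.

```shell
# Hypothetical helper: summarize huge page usage from
# /proc/meminfo-style text supplied on standard input.
hugepage_summary() {
    awk '
        /^HugePages_Total:/ {total = $2}
        /^HugePages_Free:/  {free  = $2}
        /^HugePages_Rsvd:/  {rsvd  = $2}
        END {
            printf "in_use=%d reserved=%d committed=%d uncommitted=%d\n",
                   total - free, rsvd, total - free + rsvd, free - rsvd
        }'
}

# Only run against the live system when /proc/meminfo is readable.
[ -r /proc/meminfo ] && hugepage_summary < /proc/meminfo
```

For the output above this gives committed=22000 and uncommitted=10, which matches the 22000 pages calculated for the SGA plus the ten extra pages.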
