【365比分网】SUSE Memory Overflow Troubleshooting



Problem Description


A customer's Linux system was rebooting frequently. After receiving the support call, an engineer logged in and found a large number of OOM (out-of-memory) messages and memory overflow alarms in the logs. After each reboot, host memory filled up quickly even before any application was started.

At the same time, other clients transferring data over SFTP frequently saw their transfers fail because of memory exhaustion, which affected the business.



Troubleshooting


2.1 Messages file contents

Dec 24 15:49:55 xxxxxxxxxxxx kernel: [   52.671197] BIOS EDD facility v0.16 2004-Jun-25, 2 devices found

Dec 24 15:49:56 xxxxxxxxxxxx kernel: [   53.763632] oom_kill_process: 21 callbacks suppressed
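To gauge how often the OOM killer fired, the log can be searched for OOM messages. The function below is a minimal sketch: the name count_oom_lines is hypothetical, and the two patterns ("oom-kill"/"oom_kill" and "Out of memory") are an assumption, not an exhaustive list of kernel OOM messages.

```shell
#!/bin/bash
# Hypothetical helper: count OOM-related kernel messages in a syslog-style file.
# Assumption: matching "oom-kill"/"oom_kill" or "Out of memory" (case-insensitive)
# is enough to catch the OOM killer lines on this system.
count_oom_lines() {
    local log_file=${1:-/var/log/messages}
    # grep -c prints 0 but exits with status 1 when nothing matches
    grep -ciE 'oom[-_]kill|out of memory' "$log_file" || true
}
```

For example, run count_oom_lines /var/log/messages. Lines such as "oom_kill_process: 21 callbacks suppressed" mean the kernel rate-limited further messages, so the count is a lower bound.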


2.2 System resource utilization

[Figure: system resource utilization on the affected host]


After checking with the customer, I learned that PotralAgent is a normal process that also runs on their other machines, where its resource usage is not abnormal.

The host has 8 GB of RAM, of which very little is free, and some swap is also in use, which indicates that memory is tight.


2.3 Memory Usage Analysis

Comparison of /proc/meminfo before and after the reboot:

    Field              Before reboot    After reboot
    MemTotal           8062712 kB       8062712 kB
    MemFree            590024 kB        566876 kB
    Buffers            2584 kB          968 kB
    Cached             30332 kB         16080 kB
    SwapCached         832 kB           552 kB
    AnonHugePages      0 kB             0 kB
    Slab               35932 kB         35308 kB
    SReclaimable       4900 kB          3736 kB
    SUnreclaim         31032 kB         31572 kB
    KernelStack        1704 kB          1304 kB
    PageTables         3916 kB          2432 kB
    HugePages_Total    3559             3580
    HugePages_Free     3559             3580
    HugePages_Rsvd     0                0
    HugePages_Surp     0                0
    Hugepagesize       2048 kB          2048 kB
    CommitLimit        10872696 kB      10851192 kB
    Committed_AS       197048 kB        542772 kB
    DirectMap4k        57344 kB         57344 kB
    DirectMap2M        8331264 kB       8331264 kB

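The figures above show that memory overcommit is not the problem (Committed_AS is far below CommitLimit), and that THP is not in use (AnonHugePages is 0 kB).

The cause of the failure is now clear: HugePages_Total and HugePages_Free are both 3559, so 3559 × 2048 kB = 7,288,832 kB (about 7 GB) of the 8 GB of RAM is reserved as static huge pages that nothing is using. Static huge pages cannot serve ordinary allocations and cannot be swapped out, so only about 8,062,712 − 7,288,832 = 773,880 kB (roughly 756 MB) remains for the kernel, the page cache, and all processes, and it is quickly exhausted. Note that DirectMap2M (8331264 kB, about 7.9 GB) is not memory in use; it only shows how much physical memory the kernel maps with 2 MB page-table entries.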
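The reservation arithmetic can be checked directly from /proc/meminfo. This is a minimal sketch (the function name hugepage_report is hypothetical) that multiplies HugePages_Total by Hugepagesize and compares the result with MemTotal; it assumes a single huge page size is in use.

```shell
#!/bin/bash
# Hypothetical helper: report how much of MemTotal the static huge page pool holds.
# All sizes are in kB, as in /proc/meminfo. Assumes one huge page size (Hugepagesize).
hugepage_report() {
    local meminfo=${1:-/proc/meminfo}
    awk '
        /^MemTotal:/        { total = $2 }
        /^HugePages_Total:/ { hp_total = $2 }
        /^HugePages_Free:/  { hp_free = $2 }
        /^Hugepagesize:/    { hp_size = $2 }
        END {
            reserved = hp_total * hp_size    # kB held by the huge page pool
            printf "hugepages_reserved_kB=%d\n", reserved
            printf "hugepages_unused_kB=%d\n", hp_free * hp_size
            printf "remaining_for_normal_kB=%d\n", total - reserved
            printf "reserved_percent=%.1f\n", 100 * reserved / total
        }
    ' "$meminfo"
}
```

With the "before the reboot" values above, it reports 7288832 kB reserved (90.4% of MemTotal), all of it unused, and only 773880 kB left for everything else.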


2.4 Check the system configuration

cat /proc/sys/vm/nr_hugepages

3559

cat /etc/sysctl.conf|grep nr_hugepages

vm.nr_hugepages = 5120

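The system is configured for 5,120 huge pages, but only 3,559 were actually allocated. Requesting 5,120 × 2 MB = 10 GB exceeds the 8 GB of physical RAM, so the kernel allocated as many pages as it could: 3,559 × 2048 kB = 7,288,832 kB, about 7 GB.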
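The manual comparison above can also be scripted. In this sketch the function name check_hugepage_config is hypothetical; it assumes sysctl.conf uses the "vm.nr_hugepages = N" form, and when the setting appears more than once it takes the last occurrence.

```shell
#!/bin/bash
# Hypothetical helper: compare the requested vm.nr_hugepages (from a sysctl.conf
# file) with HugePages_Total, and warn if the requested pool would not fit in RAM.
check_hugepage_config() {
    local sysctl_conf=${1:-/etc/sysctl.conf}
    local meminfo=${2:-/proc/meminfo}
    local requested allocated hp_size mem_total
    # Last uncommented "vm.nr_hugepages = N" line wins; 0 if absent
    requested=$(awk -F= '/^[[:space:]]*vm\.nr_hugepages[[:space:]]*=/ { gsub(/[[:space:]]/, "", $2); v = $2 } END { print v + 0 }' "$sysctl_conf")
    allocated=$(awk '/^HugePages_Total:/ { print $2 }' "$meminfo")
    hp_size=$(awk '/^Hugepagesize:/ { print $2 }' "$meminfo")
    mem_total=$(awk '/^MemTotal:/ { print $2 }' "$meminfo")
    echo "requested=$requested allocated=$allocated"
    echo "requested_kB=$(( requested * hp_size )) MemTotal_kB=$mem_total"
    if (( requested * hp_size >= mem_total )); then
        echo "WARNING: requested huge page pool is not smaller than physical memory"
    fi
}
```

Against this host's files it would print requested=5120 allocated=3559 and warn that the requested 10485760 kB exceeds the 8062712 kB of physical memory.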


Solution


Reducing the nr_hugepages value, or removing the setting altogether, stops huge pages from occupying almost all system memory, and this resolved the problem.
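One way to lower the value persistently is to edit the vm.nr_hugepages line in /etc/sysctl.conf. The helper below is a hypothetical sketch that rewrites that line (or appends it if it is missing) in a given file; it relies on GNU sed's -i option.

```shell
#!/bin/bash
# Hypothetical helper: set vm.nr_hugepages in a sysctl.conf-style file.
# Back up the file first; apply the change afterwards with "sysctl -p" as root.
set_nr_hugepages() {
    local conf_file=$1 value=$2
    if grep -q '^[[:space:]]*vm\.nr_hugepages[[:space:]]*=' "$conf_file"; then
        # Replace the existing setting in place (GNU sed)
        sed -i "s/^[[:space:]]*vm\.nr_hugepages[[:space:]]*=.*/vm.nr_hugepages = $value/" "$conf_file"
    else
        # No setting yet: append one
        echo "vm.nr_hugepages = $value" >> "$conf_file"
    fi
}
```

For example, set_nr_hugepages /etc/sysctl.conf 0 followed by sysctl -p as root. Running sysctl -w vm.nr_hugepages=0 also shrinks the pool immediately, returning free huge pages to normal memory without a reboot.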



Lessons Learned


4.1 Introduction to HugePages


As computing workloads grow, so does applications' demand for memory. To implement virtual memory, the operating system manages memory in pages. For a long time the default page size has been 4096 bytes (4 KB), and although the page size is configurable in principle, most operating systems still use 4 KB pages by default.

A 4 KB page was reasonable when paging was introduced, because machines then had only a few tens of megabytes of memory. Now that physical memory has grown to several gigabytes or even tens of gigabytes, is it still reasonable for the operating system to use 4 KB as the basic unit?


When memory-hungry applications run on Linux, the default 4 KB page size causes many TLB misses and page faults, which significantly hurts performance. When the operating system uses 2 MB or larger pages, TLB misses and page faults drop sharply, which can noticeably improve performance. This is the direct reason the Linux kernel introduced huge page support.

The benefit is easy to see. Suppose an application needs 2 MB of memory. With 4 KB pages, that is 512 pages, which requires 512 TLB entries and 512 page table entries, and the OS must go through at least 512 TLB misses and 512 page faults to map the whole 2 MB of application address space to physical memory. With 2 MB pages, only 1 TLB miss and 1 page fault are needed to map the same 2 MB, and no further TLB misses or page faults occur while the application runs (assuming no TLB entry replacement and no swapping).
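The page counts above are a simple ceiling division. This small sketch (pages_needed is a hypothetical helper) computes how many pages, and therefore page table or TLB entries, are needed to map a region of a given size.

```shell
#!/bin/bash
# Hypothetical helper: pages needed to map region_bytes with page_bytes pages,
# rounded up to whole pages.
pages_needed() {
    local region_bytes=$1 page_bytes=$2
    echo $(( (region_bytes + page_bytes - 1) / page_bytes ))
}
```

pages_needed $((2*1024*1024)) 4096 prints 512, and pages_needed $((2*1024*1024)) $((2*1024*1024)) prints 1. For 8 GB of memory the gap widens to 2,097,152 pages of 4 KB versus 4,096 pages of 2 MB.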


To provide huge page support at minimal cost, Linux implements 2 MB huge pages through hugetlbfs, a special file system. This approach lets applications choose the page size they need instead of forcing every allocation to use 2 MB pages.
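Applications typically reach static huge pages through a mounted hugetlbfs file system. The sketch below lists hugetlbfs mount points from /proc/mounts, where the second field is the mount point and the third the file system type; list_hugetlbfs_mounts is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print the mount point of every hugetlbfs mount.
list_hugetlbfs_mounts() {
    local mounts=${1:-/proc/mounts}
    awk '$3 == "hugetlbfs" { print $2 }' "$mounts"
}
```

If none is mounted, root can create one with, for example, mount -t hugetlbfs none /mnt/huge (the mount point /mnt/huge is only an example).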


4.2 THP


Transparent Huge Pages (THP) are enabled by default for all applications in RHEL 6. The kernel tries to allocate huge pages whenever possible, and the main kernel address space itself is mapped with huge pages, reducing TLB pressure for kernel code.

The kernel always tries to satisfy memory allocations with huge pages. If no huge pages are available (for example, because physically contiguous memory is not available), it falls back to normal 4 KB pages.

Unlike hugetlbfs pages, THP pages can be swapped. This is done by splitting a huge page into 4 KB pages, which are then swapped out normally.
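Because THP works transparently, its current use shows up in the AnonHugePages field of /proc/meminfo, the same field checked in 2.3. This sketch (thp_anon_usage is a hypothetical name) prints that value and the number of huge pages it corresponds to, assuming the THP size equals Hugepagesize (2 MB on x86_64).

```shell
#!/bin/bash
# Hypothetical helper: show how much anonymous memory is currently backed by THP.
# Assumes the THP size equals Hugepagesize from the same file.
thp_anon_usage() {
    local meminfo=${1:-/proc/meminfo}
    awk '
        /^AnonHugePages:/ { anon = $2 }
        /^Hugepagesize:/  { size = $2 }
        END { printf "AnonHugePages_kB=%d thp_pages=%d\n", anon, (size ? anon / size : 0) }
    ' "$meminfo"
}
```

On the affected host AnonHugePages was 0 kB both before and after the reboot, so this would print AnonHugePages_kB=0 thp_pages=0.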


4.3 THP Considerations


Static huge pages form a pool separate from normal memory, which consists of 4 KB pages, and they do not support swapping: they cannot be swapped out to external storage.

THP and static huge pages look similar, but their properties and behavior in Linux are completely different. One way to picture it: a static huge page is cast in one piece, while a THP is more like smaller pieces welded together. In nature, THP remains closer to normal pages.

When swapping is needed, a THP is split into 4 KB pages on swap-out and written to swap space; on swap-in it may need to be reassembled, which has a significant performance impact.

Oracle does not recommend enabling Transparent Huge Pages (THP) with RedHat 6/OEL 6/SLES 11/UEK2 kernels, because of known issues with THP:

In a RAC environment, THP can cause unexpected node reboots and performance problems;

In a single-instance environment, THP can also cause unexpected performance problems or delays.



Knowledge Expansion

5.1 Enabling HugePages

Edit sysctl.conf

      vi /etc/sysctl.conf

      vm.nr_hugepages = xxxx 

Edit limits.conf

      vi /etc/security/limits.conf

      * soft memlock -1

       * hard memlock -1

Apply the settings

     sysctl -p

Verify

     grep -i hugepages /proc/meminfo

5.2 Oracle's Use of HugePages


Before 11.2.0.2, a database SGA could only use hugepages for all of its memory or not at all.

From 11.2.0.2 onward, Oracle provides the parameter USE_LARGE_PAGES to control how the database uses hugepages.

USE_LARGE_PAGES accepts the values "true" (the default), "only", "false", and "auto" (available since the 11.2.0.3 patchset):

The default value is "true": if hugepages are configured on the system, the SGA uses them first and uses as many as possible.

In 11.2.0.2, if there are not enough hugepages for the whole SGA, the SGA does not use them at all. This can lead to ORA-4030 errors: the hugepages have already been reserved from physical memory, but the SGA allocates its memory elsewhere instead, leaving too little memory for everything else.

In 11.2.0.3 this policy changed: the SGA uses as many hugepages as are available and then falls back to regular-sized pages once the hugepages are used up.

If set to "false", the SGA does not use hugepages.

If set to "only", the database instance refuses to start when there are not enough hugepages (to prevent memory overflow).

From 11.2.0.3 onward, the value can be "auto", which lets the oradism process reconfigure the Linux kernel to increase the number of hugepages. oradism needs the appropriate permissions, as follows:

-rwsr-x--- 1 root

oradism does not change the hugepages value in /etc/sysctl.conf, so when the OS reboots, the system reverts to the hugepages value configured in /etc/sysctl.conf.

For servers that run only Oracle, setting HugePages to the total SGA size (the sum of all instances' SGAs) is sufficient.

If you change HugePages, add physical memory, or add new instances to the server so that the total SGA changes, recalculate the required HugePages.
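The sizing rule above amounts to dividing the total SGA by the huge page size and rounding up. This is a minimal sketch, assuming SGA sizes are given in MB and the huge page size in kB; hugepages_for_sgas is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: huge pages needed to hold the sum of all instance SGAs.
# Usage: hugepages_for_sgas <hugepage_size_kB> <sga1_MB> [<sga2_MB> ...]
hugepages_for_sgas() {
    local hpg_kb=$1; shift
    local total_kb=0 sga_mb
    for sga_mb in "$@"; do
        total_kb=$(( total_kb + sga_mb * 1024 ))
    done
    # Round up so the whole SGA fits in huge pages
    echo $(( (total_kb + hpg_kb - 1) / hpg_kb ))
}
```

For two instances with 4096 MB and 1536 MB SGAs on 2 MB pages, hugepages_for_sgas 2048 4096 1536 prints 2816. The script in 5.4 works from the live shared memory segments and adds one extra page per segment, so its recommendation can be slightly higher.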


5.3 Turning off THP

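To see whether THP is turned on (the value in brackets is the active setting; [always] means THP is enabled):

  • Check the current THP settings

    [root@coding ~]# cat /sys/kernel/mm/transparent_hugepage/enabled

    [always] madvise never

    [root@coding ~]# cat /sys/kernel/mm/transparent_hugepage/defrag

    [always] madvise never

  • Turn off THP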


Edit /etc/rc.local and add the following:

echo "never" >/sys/kernel/mm/transparent_hugepage/enabled

echo "never" >/sys/kernel/mm/transparent_hugepage/defrag
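After a reboot, both settings can be verified by reading the bracketed (active) value from each file. The helpers below are a hypothetical sketch; thp_mode and thp_disabled are illustrative names, not system tools.

```shell
#!/bin/bash
# Hypothetical helper: print the active (bracketed) mode from a THP sysfs file,
# e.g. "[always] madvise never" -> always.
thp_mode() {
    local setting_file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$setting_file"
}

# Hypothetical helper: succeed only if both "enabled" and "defrag" report "never".
thp_disabled() {
    local dir=${1:-/sys/kernel/mm/transparent_hugepage}
    [ "$(thp_mode "$dir/enabled")" = "never" ] && [ "$(thp_mode "$dir/defrag")" = "never" ]
}
```

For example, thp_disabled && echo "THP is off". As an alternative to rc.local, the kernel command-line parameter transparent_hugepage=never sets the enabled mode at boot.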

5.4 HugePages Setup Script

Running this script prints a recommended setting.

The script requires the bc rpm package.

Run it as the oracle user, after making sure the database instance(s) have started properly.

#! /bin/bash

#

# hugepages_settings.sh

#

# Linux bash script to compute values for the

# recommended HugePages/HugeTLB configuration

# on Oracle Linux

#

# Note: This script does calculation for all shared memory

# segments available when the script is run, no matter it

# is an Oracle RDBMS shared memory segment or not.

#

# This script is provided by Doc ID 401749.1 from My Oracle Support

# http://support.oracle.com

 

if rpm -q bc >/dev/null 2>&1

then

echo "The bc rpm package is installed, continuing."

else

echo "Please install the bc rpm package from the OS installation media."

exit 1

fi

 

# Welcome text

echo "

This script is provided by Doc ID 401749.1 from My Oracle Support

(http://support.oracle.com) where it is intended to compute values for

the recommended HugePages/HugeTLB configuration for the current shared

memory segments on Oracle Linux. Before proceeding with the execution please note following:

 * For ASM instance, it needs to configure ASMM instead of AMM.

 * The 'pga_aggregate_target' is outside the SGA and

   you should accommodate this while calculating SGA size.

 * In case you changes the DB SGA size,

   as the new SGA will not fit in the previous HugePages configuration,

   it had better disable the whole HugePages,

   start the DB with new SGA size and run the script again.

And make sure that:

 * Oracle Database instance(s) are up and running

 * Oracle Database 11g Automatic Memory Management (AMM) is not setup

   (See Doc ID 749851.1)

 * The shared memory segments can be listed by command:

     # ipcs -m

 

Press Enter to proceed..."

 

read

 

# Check for the kernel version

KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`

 

# Find out the HugePage size

HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'`

if [ -z "$HPG_SZ" ];then

    echo "The hugepages may not be supported in the system where the script is being executed."

    exit 1

fi

 

# Initialize the counter

NUM_PG=0

 

# Cumulative number of pages required to handle the running shared memory segments

# (segment lines in "ipcs -m" output start with a 0x key; the 5th column is the size in bytes)

for SEG_BYTES in `ipcs -m | awk '/^0x/ {print $5}'`

do

    MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`

    if [ $MIN_PG -gt 0 ]; then

        NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`

    fi

done

 

RES_BYTES=`echo "$NUM_PG * $HPG_SZ * 1024" | bc -q`

 

# An SGA less than 100MB does not make sense

# Bail out if that is the case

if [ $RES_BYTES -lt 100000000 ]; then

    echo "***********"

    echo "** ERROR **"

    echo "***********"

    echo "Sorry! There are not enough total of shared memory segments allocated for

HugePages configuration. HugePages can only be used for shared memory segments

that you can list by command:

 

    # ipcs -m

 

of a size that can match an Oracle Database SGA. Please make sure that:

 * Oracle Database instance is up and running

 * Oracle Database 11g Automatic Memory Management (AMM) is not configured"

    exit 1

fi

 

# Finish with results

# 2.6 and all later kernels (3.x, 4.x, 5.x, ...) use vm.nr_hugepages;

# the catch-all branch keeps newer kernels from getting no recommendation at all

case $KERN in

    '2.2') echo "Kernel version $KERN is not supported. Exiting." ;;

    '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;

           echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;

    *) echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;

esac

 

# End

 


For more information, please visit Antute's official website: 3.durayork.com

All rights reserved: 365比分网. Filing No.: 京ICP备17074963号-1
Technical Support:Genesis Network