
RHEL/BUG: soft lockup - CPU#N stuck for 67s!

by 스쳐가는인연 2014. 8. 27.

Request / Error / Symptom :

When the system is rebooted with a command, it prints the message below repeatedly and the boot does not proceed. A cold boot using the power button boots normally.

BUG: soft lockup - CPU#0 stuck for 67s! [migration/0:5]

 

Analysis :

This is a known bug in RHEL. It can be avoided or resolved by upgrading the kernel or by disabling audit.

 

System Information

Product Name: ProLiant BL460c Gen8

 

BIOS Information

Version: I31

Release Date: 09/08/2013

 

OS, Kernel version

Linux 2.6.32-279.el6.x86_64

 

OS Specific Release Information (/etc/redhat-release)

Red Hat Enterprise Linux Server release 6.3 (Santiago)

 

RHEL 6 server with audit enabled hangs on startup/shutdown (BUG: soft lockup - CPU#N stuck for 67s!) in audit code

https://access.redhat.com/solutions/502603

 

Environment

Red Hat Enterprise Linux (RHEL) 6, several minor releases earlier than 6.5

 

Resolution

RHEL6.2.z(EUS): Update the kernel to 2.6.32-220.45.1.el6 (released with RHSA-2013-1519) or later to fix the issue.

RHEL6.3.z(EUS): Update the kernel to 2.6.32-279.39.1.el6 (released with RHSA-2013-1783) or later to fix the issue.

RHEL6.4.z(EUS): Update the kernel to 2.6.32-358.28.1.el6 (released with RHBA-2013-1770) or later to fix the issue.

RHEL6.5: Update the kernel to 2.6.32-431.el6 (released with RHSA-2013-1645) or later to fix the issue. This fix is already included in RHEL6.5.
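
The version thresholds listed above can be checked mechanically. The following is a minimal sketch, assuming the kernel string has the `2.6.32-<release>.el6` shape reported by `uname -r` on RHEL 6. The names `FIXED_BUILDS`, `parse_release` and `has_fix` are hypothetical, and the comparison is a simplified numeric one, not the full RPM version-comparison algorithm.

```python
import re

# Fixed kernel builds per z-stream, copied from the Resolution list above.
FIXED_BUILDS = {
    220: (220, 45, 1),  # RHEL 6.2.z
    279: (279, 39, 1),  # RHEL 6.3.z
    358: (358, 28, 1),  # RHEL 6.4.z
}
FIRST_FIXED_SERIES = 431  # RHEL 6.5 kernel, which already includes the fix


def parse_release(kernel):
    """Return the numeric release fields of a 2.6.32-*.el6 kernel string."""
    m = re.fullmatch(r"2\.6\.32-(\d+(?:\.\d+)*)\.el6.*", kernel)
    if m is None:
        raise ValueError("not a RHEL 6 kernel string: %r" % kernel)
    return tuple(int(part) for part in m.group(1).split("."))


def has_fix(kernel):
    """True/False if the list above decides it, None if the series is not listed."""
    release = parse_release(kernel)
    series = release[0]
    if series >= FIRST_FIXED_SERIES:
        return True
    if series in FIXED_BUILDS:
        # Tuple comparison: (279,) sorts before (279, 39, 1).
        return release >= FIXED_BUILDS[series]
    return None


print(has_fix("2.6.32-279.el6.x86_64"))       # False (the kernel in this case)
print(has_fix("2.6.32-279.39.1.el6.x86_64"))  # True
```

A `None` result means the release series is not one of those named in the list, so the check makes no claim about it.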

 

A workaround was tested by one of our customers and appeared to be working - just start the system with "audit" disabled.

Add "audit=0" to the kernel command line (in /boot/grub/grub.conf) and try to boot. This workaround disables auditd daemon.
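
As an illustration of the edit described above, here is a minimal sketch that appends `audit=0` to every `kernel` line of a GRUB Legacy `grub.conf`. It works on the file contents as a string and does not touch the disk. The function name `add_audit_off` and the sample entry are hypothetical, and a line that already carries a different `audit=` value is only appended to, not rewritten.

```python
def add_audit_off(grub_conf):
    """Return grub.conf text with audit=0 appended to each kernel line."""
    out = []
    for line in grub_conf.splitlines():
        fields = line.split()
        # GRUB Legacy boot entries put the kernel command line on a
        # line whose first word is "kernel".
        if fields and fields[0] == "kernel" and "audit=0" not in fields:
            line = line.rstrip() + " audit=0"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical boot entry for demonstration only.
sample = (
    "default=0\n"
    "title Red Hat Enterprise Linux (2.6.32-279.el6.x86_64)\n"
    "\troot (hd0,0)\n"
    "\tkernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=/dev/sda1 rhgb quiet\n"
    "\tinitrd /initramfs-2.6.32-279.el6.x86_64.img\n"
)
print(add_audit_off(sample))
```

Running the function a second time on its own output changes nothing, so it is safe to reapply. Back up the real `/boot/grub/grub.conf` before writing any modified text to it.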

 

The audit subsystem can become heavily loaded for brief instances even on a fairly idle system.

We recommend that any auditd users move to a patched kernel, since we cannot quantify the amount of load that may trigger the hang.

 

 

Recommendation :

Action Plan 1.

What: Upgrade Kernel or Disable Audit

Why : To prevent reboot failures caused by the OS bug

1. Update the kernel to 2.6.32-279.39.1.el6 (released with RHSA-2013-1783) or later

or

2. Add "audit=0" to the kernel command line (in /boot/grub/grub.conf) and try to boot

 
