Request / Error / Symptom :
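Abnormal values reported when querying CPU information
- When querying process information with the ps command, utime and stime appear abnormally large (the same after rebooting)
- Although the CPU is not actually doing any work, the top command reports high CPU utilization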
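For reference, the per-process utime and stime values that tools such as ps and top display are read from fields 14 and 15 of /proc/<pid>/stat, counted in clock ticks (see proc(5)). The sketch below, a minimal illustration rather than how procps itself is written, parses one stat line and converts the two fields to seconds; the function name parse_cpu_times is hypothetical.

```python
import os
from typing import Tuple


def parse_cpu_times(stat_line: str, clk_tck: int) -> Tuple[float, float]:
    """Return (utime, stime) in seconds from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so split only the text after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field n is rest[n - 3]:
    # utime (field 14) -> rest[11], stime (field 15) -> rest[12].
    utime_ticks = int(rest[11])
    stime_ticks = int(rest[12])
    return utime_ticks / clk_tck, stime_ticks / clk_tck


if __name__ == "__main__":
    stat_path = f"/proc/{os.getpid()}/stat"
    if os.path.exists(stat_path):  # /proc is Linux-specific
        with open(stat_path) as f:
            utime, stime = parse_cpu_times(f.read(), os.sysconf("SC_CLK_TCK"))
        print(f"utime={utime:.2f}s stime={stime:.2f}s")
```

Dividing by SC_CLK_TCK (commonly 100 on Linux) turns tick counts into seconds, which makes implausibly large values easy to compare against the system uptime.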
Analysis :
This symptom occurs because the Time Stamp Counter (TSC) is not properly reset.
It is a bug in the RHEL kernel; to resolve it, the kernel must be upgraded to one of the versions listed in the Red Hat KB.
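The effect of a TSC that is not reset can be illustrated with simple arithmetic: on CPUs whose TSC ticks at a constant rate, TSC value / TSC frequency gives the seconds counted since the TSC was last cleared. If that is far larger than the OS uptime, the counter carried over from before the last (warm) reboot. The sketch below is a hypothetical heuristic only; it does not read the TSC (that needs the rdtsc instruction), and the input values, including the assumption that the TSC runs at the 3.5 GHz nominal clock of this E5-2643 v2, are illustrative.

```python
def seconds_since_tsc_reset(tsc_value: int, tsc_hz: float) -> float:
    """Seconds the TSC has been counting, assuming a constant tick rate."""
    return tsc_value / tsc_hz


def tsc_survived_reboot(tsc_value: int, tsc_hz: float, uptime_s: float,
                        slack_s: float = 60.0) -> bool:
    """Hypothetical heuristic: True if the TSC has counted noticeably longer
    than the OS has been up, i.e. it was probably not cleared at the last reset.
    slack_s allows for time spent in firmware before the kernel started."""
    return seconds_since_tsc_reset(tsc_value, tsc_hz) > uptime_s + slack_s


if __name__ == "__main__":
    tsc_hz = 3.5e9                            # assumed TSC rate (nominal clock)
    carried_over = 3_500_000_000 * 86_400 * 30  # 30 days' worth of ticks
    print(tsc_survived_reboot(carried_over, tsc_hz, uptime_s=3600.0))
```

With a freshly cleared TSC, the two figures differ only by the firmware boot time, which is why a small slack value is allowed.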
System Information
Product Name: ProLiant DL380p Gen8
BIOS Information
Version: P70
Release Date: 09/18/2013
Processor Information
Version: Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz
Description: Red Hat Enterprise Linux Server release 6.4 (Santiago)
Linux TEST 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Systems with Intel® Xeon® Processor E5, Intel® Xeon® Processor E5 v2, or Intel® Xeon® Processor E7 v2 and certain versions of Red Hat Enterprise Linux 6 kernels become unresponsive/hung or incur a kernel panic
https://access.redhat.com/solutions/433883
Root Cause
On Intel® Xeon® Processor E5 Family 6 Model 45 (also known as SandyBridge), the Time Stamp Counter (TSC) is not cleared by a warm reset. This is documented in the Intel® Xeon® Processor E5 Family Specification Update as erratum BT81.
On Intel® Xeon® Processor E5 v2 Family 6 Model 62 (also known as IvyBridge), the Time Stamp Counter (TSC) is not cleared by a warm reset. This is documented in the Intel® Xeon® Processor E5 v2 Family Specification Update as erratum CA105.
On Intel® Xeon® Processor E7 v2 Family 6 Model 62 (also known as IvyBridge-EX), the Time Stamp Counter (TSC) is not cleared by a warm reset. This is documented in the Intel® Xeon® E7-2800/4800/8800 v2 Product Family Specification Update as erratum CF101.
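Whether a host has one of the processor models named above can be read from the "vendor_id", "cpu family" and "model" fields of /proc/cpuinfo. The sketch below is a minimal illustration that checks only the first processor entry; the function name affected_cpu and the AFFECTED_MODELS table are hypothetical, populated from the Root Cause text.

```python
import os
from typing import Optional

# Intel family 6 models named in the Root Cause section.
AFFECTED_MODELS = {
    45: "Xeon E5 (SandyBridge), erratum BT81",
    62: "Xeon E5 v2 / E7 v2 (IvyBridge / IvyBridge-EX), erratum CA105 / CF101",
}


def affected_cpu(cpuinfo_text: str) -> Optional[str]:
    """Return a description if the first CPU listed matches an affected model."""
    vendor, family, model = None, None, None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "vendor_id":
            vendor = value
        elif key == "cpu family":
            family = int(value)
        elif key == "model":  # "model name" is a different key
            model = int(value)
        if vendor is not None and family is not None and model is not None:
            break  # the first processor entry is enough
    if vendor == "GenuineIntel" and family == 6 and model in AFFECTED_MODELS:
        return AFFECTED_MODELS[model]
    return None


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print(affected_cpu(f.read()))
```

On the system in this case (Xeon E5-2643 v2, family 6 model 62) the function would report the IvyBridge entry.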
Resolution
This issue is addressed in the following kernel updates:
• RHEL 6.5 - kernel-2.6.32-431.el6.
This package is available via Errata RHSA-2013:1645. The related Red Hat Private Bug is 975507.
• RHEL 6.4.z EUS - kernel-2.6.32-358.23.2.el6.
This package is available via Errata RHSA-2013:1436. The related Red Hat Private Bug is 1001954.
• RHEL 6.3.z EUS - kernel-2.6.32-279.37.2.el6.
This package is available via Errata RHSA-2013:1450. The related Red Hat Private Bug is 1004185.
• RHEL 6.2.z EUS - kernel-2.6.32-220.45.1.el6.
This package is available via Errata RHSA-2013:1519. The related Red Hat Private Bug is 1024453.
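The list above can be turned into a quick check of a `uname -r` string. The sketch below is a simplified comparison of the numeric kernel release fields, not RPM's own version-comparison logic, so treat its result as a hint and confirm against the installed packages; the names has_tsc_fix, FIXED_BUILDS and FIRST_FIXED_GA are hypothetical. It returns None for streams the list does not cover.

```python
import platform
import re
from typing import Optional, Tuple

# Fixed builds from the Resolution list, keyed by the minor-release stream
# (first number of the kernel release field).
FIXED_BUILDS = {
    220: (220, 45, 1),  # RHEL 6.2.z EUS
    279: (279, 37, 2),  # RHEL 6.3.z EUS
    358: (358, 23, 2),  # RHEL 6.4.z EUS
}
FIRST_FIXED_GA = (431,)  # RHEL 6.5 (kernel-2.6.32-431.el6) and later


def release_numbers(uname_r: str) -> Tuple[int, ...]:
    """'2.6.32-358.23.2.el6.x86_64' -> (358, 23, 2)."""
    m = re.match(r"^2\.6\.32-([\d.]+?)\.el6", uname_r)
    if not m:
        raise ValueError(f"not a RHEL 6 kernel release: {uname_r!r}")
    return tuple(int(part) for part in m.group(1).split("."))


def has_tsc_fix(uname_r: str) -> Optional[bool]:
    """True/False if the stream is in the list above, None if it is not."""
    rel = release_numbers(uname_r)
    if rel >= FIRST_FIXED_GA:
        return True
    fixed = FIXED_BUILDS.get(rel[0])
    if fixed is None:
        return None  # stream not covered by the Resolution list
    return rel >= fixed  # tuple comparison: (358,) < (358, 23, 2)


if __name__ == "__main__":
    try:
        print(has_tsc_fix(platform.release()))
    except ValueError as exc:
        print(exc)
```

For the kernel reported in this case, has_tsc_fix("2.6.32-358.el6.x86_64") returns False, which matches the need for Action Plan 1.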
Workaround: perform a COLD reboot, not a warm reboot, so that the TSC is cleared.
Action Plan 1.
What: Upgrade the kernel
Why: To resolve the abnormal CPU usage values that persist even after rebooting
cf.) Workaround Action – If a kernel upgrade is not possible, perform a cold boot whenever a reboot is required.