
RHEL/Short Durations of Throttling (TCC Activation)

by 스쳐가는인연 2018. 9. 21.

While running Linux, events similar to the following may appear in the system log.

 

--------------------------------------------------------------

Jun 20 10:43:56 localhost kernel: CPU57: Package temperature above threshold, cpu clock throttled (total events = 1)
Jun 20 10:43:56 localhost kernel: CPU57: Core temperature/speed normal
Jun 20 10:43:56 localhost kernel: CPU57: Package temperature/speed normal

--------------------------------------------------------------
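To gauge how often these events occur, the log can be scanned for the throttle message. The following is a minimal sketch, assuming the message wording matches the sample lines above; the function name `count_throttle_events` and the regular expression are illustrative, and other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the sample log lines above (assumption: other kernel
# versions may use different wording for this message).
THROTTLE_RE = re.compile(
    r"kernel: CPU(?P<cpu>\d+): (?P<scope>Package|Core) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)

# The three sample lines quoted in this post.
SAMPLE_LINES = [
    "Jun 20 10:43:56 localhost kernel: CPU57: Package temperature above threshold, cpu clock throttled (total events = 1)",
    "Jun 20 10:43:56 localhost kernel: CPU57: Core temperature/speed normal",
    "Jun 20 10:43:56 localhost kernel: CPU57: Package temperature/speed normal",
]


def count_throttle_events(lines):
    """Return a Counter mapping (cpu, scope) to the number of throttle messages seen."""
    counts = Counter()
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match:
            counts[(int(match.group("cpu")), match.group("scope"))] += 1
    return counts


if __name__ == "__main__":
    for (cpu, scope), n in sorted(count_throttle_events(SAMPLE_LINES).items()):
        print(f"CPU{cpu} {scope}: {n} throttle message(s)")
```

In practice the lines would come from a log file such as /var/log/messages rather than from the hard-coded sample.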

 

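- On the hardware side, check the power configuration.

  (Max Performance is recommended. The event can be triggered when the CPU moves between idle and power-up states.)
- On the software side, check whether a kernel patch is needed (known bug).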

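- According to Intel, with Intel Turbo Boost Technology a short burst of load (a dynamic workload or micro-burst) can activate the Thermal Control Circuit (TCC) and produce this event. Depending on the operating conditions, it can be ignored.

  (When CPU utilization rises sharply and generates an unusual amount of heat, reducing the clock to bring the temperature down is normal behavior.)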

> Note: improving cooling so that a larger thermal margin is maintained can reduce how often throttling occurs.
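One way to track how often throttling occurs is to read the cumulative counters that x86 Linux kernels expose under `/sys/devices/system/cpu/cpu*/thermal_throttle/`. The sketch below is a hedged example, not a monitoring tool: the function name `read_throttle_counts` and the `sysfs_root` parameter are hypothetical, and the directory is simply absent on systems that do not provide these counters.

```python
from pathlib import Path


def read_throttle_counts(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {counter_name: value}} for CPUs that expose throttle counters."""
    results = {}
    pattern = "cpu[0-9]*/thermal_throttle/*_throttle_count"
    for counter_file in sorted(Path(sysfs_root).glob(pattern)):
        cpu_name = counter_file.parent.parent.name  # e.g. "cpu57"
        try:
            value = int(counter_file.read_text().strip())
        except (OSError, ValueError):
            continue  # skip counters that are unreadable or not numeric
        results.setdefault(cpu_name, {})[counter_file.name] = value
    return results


if __name__ == "__main__":
    for cpu_name, counters in read_throttle_counts().items():
        print(cpu_name, counters)
```

Sampling these counters before and after a cooling change gives a rough comparison of throttling frequency.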

 

References.

HPE ProLiant Gen8, HPE ProLiant Gen9, and HPE ProLiant Gen10 Servers - Short Durations of Throttling (TCC Activation) May Cause Operating Systems to Issue Machine Check Alerts, Which Is Expected Behavior
https://access.redhat.com/solutions/3401881

 

Notice: HPE ProLiant Gen8, HPE ProLiant Gen9, and HPE ProLiant Gen10 Servers - Short Durations of Throttling (TCC Activation) May Cause Operating Systems to Issue Machine Check Alerts, Which Is Expected Behavior
https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00020196en_us

 

This is not HPE-specific.
This behavior is expected and no action needs to be taken. Functionality of the system is not impacted by these alerts.


Seeing "Temperature above threshold" or "Core power limit notification" in /var/log/messages
https://access.redhat.com/solutions/134973

 

Resolution
•There are two different underlying issues that can trigger these messages.
◦One issue was a bug in the kernel.

 The issue was fixed in the following kernel package versions (tracked via private RHBZ#908990)
◾RHEL6: kernel-2.6.32-407.el6 (RHBZ#908990) or later
◾RHEL6.4.z: kernel-2.6.32-358.20.1.el6 (RHBZ#999328) or later
◾RHEL6.3.z: kernel-2.6.32-279.39.1.el6 (RHBZ#1020527) or later
◾RHEL6.2.z: kernel-2.6.32-220.44.1.el6 (RHBZ#1020519) or later

◦The other is a hardware-side issue. These messages indicate that the system hardware is reporting temperatures above acceptable thresholds. These errors indicate a potential failure of the cooling solution on this system, and the CPUs are being throttled down to reduce the heat they generate. The system should be investigated for failing cooling and, should all be operational, hardware diagnostics should be run to ensure that the CPUs and system board are not faulty.
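The fixed kernel versions listed above can be compared against the running kernel. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and selecting the z-stream by the first release number (358, 279, 220) is my own reading of the list, not something the Red Hat article spells out.

```python
import re

# Fixed builds quoted in the list above, keyed by the first release number
# of each z-stream (assumption: that number identifies the stream).
FIXED_BUILDS = {
    358: (358, 20, 1),  # RHEL 6.4.z
    279: (279, 39, 1),  # RHEL 6.3.z
    220: (220, 44, 1),  # RHEL 6.2.z
}
DEFAULT_FIX = (407,)    # RHEL 6: kernel-2.6.32-407.el6 or later


def parse_release(uname_r):
    """Extract the release tuple from e.g. '2.6.32-358.20.1.el6.x86_64'."""
    match = re.match(r"2\.6\.32-([\d.]+)\.el6", uname_r)
    if not match:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_throttle_fix(uname_r):
    """Return True/False for RHEL 6 kernels, None when the list does not apply."""
    release = parse_release(uname_r)
    if release is None:
        return None
    required = FIXED_BUILDS.get(release[0], DEFAULT_FIX)
    return release >= required  # tuple comparison, element by element


if __name__ == "__main__":
    import platform
    print(has_throttle_fix(platform.release()))
```

A result of `None` means the kernel is not a RHEL 6 2.6.32 build, so the version list gives no answer either way.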


syslogd reporting: Temperature above threshold, cpu clock throttled.
https://access.redhat.com/solutions/35494

 

Resolution
Disable the "C States" in the BIOS so that the CPU always runs at full power.
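To see which idle states the running kernel currently exposes, the cpuidle entries in sysfs can be listed. This is only a sketch: `list_cstates` and its parameters are hypothetical names, it reads `/sys/devices/system/cpu/cpuN/cpuidle/state*/` without changing anything, and what appears there depends on the idle driver in use, so it may not mirror the BIOS setting exactly.

```python
from pathlib import Path


def list_cstates(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return [(state_dir, name, disabled)] for one CPU's cpuidle states."""
    cpuidle_dir = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    if not cpuidle_dir.is_dir():
        return []  # no cpuidle information exposed for this CPU
    states = []
    # Sort numerically so that state10 comes after state2.
    state_dirs = sorted(cpuidle_dir.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    for state_dir in state_dirs:
        try:
            name = (state_dir / "name").read_text().strip()
            disabled = (state_dir / "disable").read_text().strip() == "1"
        except OSError:
            continue  # skip states with missing or unreadable attributes
        states.append((state_dir.name, name, disabled))
    return states


if __name__ == "__main__":
    for state, name, disabled in list_cstates():
        print(f"{state}: {name} ({'disabled' if disabled else 'enabled'})")
```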
