
Cray XD670 - Monitoring Temperature

by 스쳐가는인연 2025. 7. 23.

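Unlike HPE ProLiant iLO, the BMC on Cray XD systems does not continuously track or store the temperatures of system components.

As a result, monitoring the system's temperature over time requires a separate, external setup.

 

Note. This setup is expected to work either on a separate monitoring host or on the monitored system itself.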

 

1) Create a script that collects temperature data through ipmitool (edit the BMC address and account details to match your environment)

# vim /var/tmp/collect_temperature.sh

-----

#!/bin/bash

# Continuously polls IPMI sensor data from the BMC and appends it to a single log file.
# Log rotation, compression, and deletion are left to the logrotate utility.

# --- Configuration ---
# IMPORTANT: Replace these placeholders with your actual BMC address and credentials.
IPMI_USERNAME='BMC_Administrative_Username'
IPMI_PASSWORD='BMC_Password'
BMC_IP='BMC_IP_Address'

# Log and cache file locations. logrotate handles rotation of the log file.
LOG_DIR="/var/tmp/temperature"
LOG_FILE="${LOG_DIR}/${BMC_IP}-sensor_ipmi_data.log"
CACHE_FILE="${LOG_DIR}/${BMC_IP}-temperature_cache.tmp"

# Create the log directory if it does not exist yet.
mkdir -p "$LOG_DIR"

# Poll the sensors in an infinite loop.
while true
do
    # Query the sensor list from the BMC.
    # -I lanplus: use the IPMI v2.0 RMCP+ LAN interface.
    # -U / -P:    BMC username and password (an account with Administrator privilege).
    # -H:         IP address of the BMC.
    # The pipeline turns '|' separators into commas, prefixes each row with a
    # timestamp, and keeps only "date time,sensor name,reading".
    # Note: the awk step splits on spaces, so it assumes sensor names contain no spaces.
    ipmitool -I lanplus -U "$IPMI_USERNAME" -P "$IPMI_PASSWORD" -H "$BMC_IP" sensor list \
    | tr '|' , \
    | xargs -I{} echo "$(date '+%F %T'),{}" \
    | awk -F' *,? *' '{print $1" "$2","$3","$4}' > "$CACHE_FILE"

    # Drop rows whose reading is a discrete state ('0x...') or unavailable ('na'),
    # then append the remaining rows to the log file.
    # -v: invert the match, printing only lines that do NOT match.
    grep -v -e ',0x' -e ',na' "$CACHE_FILE" >> "$LOG_FILE"

    # Wait 30 minutes before the next poll.
    sleep 30m
done

-----
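
The parsing step can also be done in a single awk pass that splits on '|' directly, which keeps sensor names containing spaces intact. The following is a minimal sketch, not the script above: `to_csv` is a hypothetical helper name, and the sample rows (sensor names and readings) are made up for illustration, assuming the usual `name | value | unit | status | ...` column order of `ipmitool sensor list`.

```shell
#!/bin/bash
# Sketch: turn `ipmitool sensor list` rows into "timestamp,name,value" lines.
# to_csv is a hypothetical helper; the sample rows are made-up illustrations.
to_csv() {
    local ts="$1"
    awk -F'|' -v ts="$ts" '
        {
            name = $1; value = $2
            # Trim the padding ipmitool adds around each column.
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            # Skip discrete sensors (0x...) and sensors without a reading (na).
            if (name == "" || value == "na" || value ~ /^0x/) next
            print ts "," name "," value
        }'
}

sample_rows='CPU0_Temp        | 45.000     | degrees C  | ok    | na | na | na | 95.000 | 100.000 | na
Inlet Temp       | 24.000     | degrees C  | ok    | na | na | na | 40.000 | 45.000 | na
PSU0_Status      | 0x0        | discrete   | 0x0180| na | na | na | na | na | na
GPU7_Temp        | na         | degrees C  | na    | na | na | na | na | na | na'

printf '%s\n' "$sample_rows" | to_csv "2025-07-03 01:55:00"
```

In a polling loop, the ipmitool output would be piped into `to_csv "$(date '+%F %T')"` and appended to the log file. If only temperature readings are wanted, a further check on the third column (the unit, `degrees C`) could be added to the awk condition.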

 

2) Create a rotation rule to manage the logs (adjust as needed)

# vim /etc/logrotate.d/sensor-data

-----

/var/tmp/temperature/*.log {
    daily
    rotate 5
    compress
    missingok
    notifempty
    create 0640 root root
    dateext
    dateyesterday
    postrotate
    endscript
    olddir /var/tmp/temperature
}

-----

 

3) Verify logrotate behavior

a. Dry-run the rule in debug mode (no files are changed):

# /usr/sbin/logrotate -d /etc/logrotate.d/sensor-data

 

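b. Force a rotation of the sensor logs only (pointing -f at /etc/logrotate.conf instead would force-rotate every log configured on the system):

# /usr/sbin/logrotate -f /etc/logrotate.d/sensor-data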

 

c. Add cron jobs that start the collector at boot and rotate the logs daily:

# crontab -e

@reboot /var/tmp/collect_temperature.sh

0 0 * * * /usr/sbin/logrotate -f /etc/logrotate.d/sensor-data

 

Confirm the cron entries:

# crontab -l
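
`crontab -l` only confirms that the entries exist, not that the collector is still running. One way to check is to compare the timestamp of the newest log row against the polling interval. The sketch below assumes GNU `date` (for `-d`) and the `timestamp,name,value` row format written by the script; `is_stale`, the sample row, and the 45-minute allowance are hypothetical choices for illustration.

```shell
#!/bin/bash
# Sketch: report whether the newest log row is older than expected.
# is_stale is a hypothetical helper; date -d requires GNU date.
is_stale() {
    local last_ts="$1" now_ts="$2" max_age_s="$3"
    local last_epoch now_epoch
    last_epoch=$(date -d "$last_ts" +%s) || return 2
    now_epoch=$(date -d "$now_ts" +%s) || return 2
    # Return 0 (stale) when the gap exceeds the allowed age.
    [ $(( now_epoch - last_epoch )) -gt "$max_age_s" ]
}

# The collector sleeps 30 minutes, so allow 45 minutes (2700 s) of slack.
# In practice last_row would come from: tail -n 1 "$LOG_FILE"
last_row='2025-07-03 01:55:00,CPU0_Temp,45.000'
if is_stale "${last_row%%,*}" "2025-07-03 03:00:00" 2700; then
    echo "collector looks stalled"
else
    echo "collector is logging"
fi
```

For a live check, the second argument would be the current time, for example `"$(date '+%F %T')"`.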

 

Example of the resulting files:

# ll
total 224
-rw-r----- 1 root root  24076 Jul  3 01:55 BMC_IPAddress-sensor_ipmi_data.log
-rw-r--r-- 1 root root 164391 Jul  2 03:23 BMC_IPAddress-sensor_ipmi_data.log-20250701.gz
-rw-r----- 1 root root  23829 Jul  2 23:55 BMC_IPAddress-sensor_ipmi_data.log-20250702.gz
-rw-r--r-- 1 root root   6853 Jul  3 01:55 BMC_IPAddress-temperature_cache.tmp
-rwxr-xr-x 1 root root   1413 Jun 26 04:35 collect_temperature.sh
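
Once logs accumulate, a quick summary helps spot hot components. The following is a minimal sketch that prints the highest logged reading per sensor and when it occurred, assuming the `timestamp,name,value` row format; `max_per_sensor` is a hypothetical helper and the sample rows are made up.

```shell
#!/bin/bash
# Sketch: highest logged reading per sensor from "timestamp,name,value" rows.
# max_per_sensor is a hypothetical helper; the sample rows are made up.
max_per_sensor() {
    awk -F',' '
        NF == 3 {
            v = $3 + 0
            # Remember the largest value seen for each sensor and its timestamp.
            if (!($2 in max) || v > max[$2]) { max[$2] = v; when[$2] = $1 }
        }
        END {
            for (s in max) printf "%s,%.3f,%s\n", s, max[s], when[s]
        }' | sort
}

sample_log='2025-07-03 01:25:00,CPU0_Temp,45.000
2025-07-03 01:25:00,Inlet Temp,24.000
2025-07-03 01:55:00,CPU0_Temp,52.000
2025-07-03 01:55:00,Inlet Temp,23.000'

printf '%s\n' "$sample_log" | max_per_sensor
```

To include the rotated archives as well, something like `zcat -f /var/tmp/temperature/*-sensor_ipmi_data.log* | max_per_sensor` should work, since `zcat -f` passes uncompressed files through unchanged. Note that the log holds every sensor with a numeric reading, so fan and voltage rows would appear in the summary too.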

 

 

 

 

 

 

 

 

 

 
