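On Cray XD systems, the BMC, unlike iLO on HPE ProLiant, does not continuously track or store the temperatures of system components.
As a result, monitoring system temperature information requires separate external tooling or configuration.
Note. This is expected to be implementable on either the monitoring system or the monitored system itself.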
1) Create a script that collects temperature information through ipmitool (update the BMC address and account details before use)
# vim /var/tmp/collect_temperature.sh
-----
#!/bin/bash
# This script continuously polls IPMI sensor data and appends it to a single log file.
# Log rotation, compression, and deletion are best handled by the logrotate utility.
# --- Configuration ---
# IMPORTANT: Replace these placeholders with your actual BMC/Credential details.
IPMI_USERNAME='BMC_Administrative_Username'
IPMI_PASSWORD='BMC_Password'
BMC_IP=BMC_IP_Address
# Define the log file name
# logrotate will handle the rotation of this file.
LOG_FILE="/var/tmp/temperature/${BMC_IP}-sensor_ipmi_data.log"
CACHE_FILE="/var/tmp/temperature/${BMC_IP}-temperature_cache.tmp"
# Create the log directory if it does not exist yet; the redirections below fail otherwise.
mkdir -p "$(dirname "$LOG_FILE")"
# Start an infinite loop to continuously monitor sensor data.
while true
do
    # Use ipmitool to get the sensor list.
    # -I lanplus: Uses the IPMI v2.0 (RMCP+) LAN interface.
    # -U / -P: BMC username and password (Administrator privilege), from IPMI_USERNAME / IPMI_PASSWORD.
    # -H: IP address of the BMC, from BMC_IP.
    # Note: a password passed with -P is visible in the process list.
    # awk prefixes one timestamp per poll and keeps the sensor name and reading,
    # trimming the padding around each pipe-separated field.
    ipmitool -I lanplus -U "$IPMI_USERNAME" -P "$IPMI_PASSWORD" -H "$BMC_IP" sensor list \
    | awk -F'|' -v ts="$(date '+%F %T')" '{
        name = $1; value = $2
        gsub(/^ +| +$/, "", name); gsub(/^ +| +$/, "", value)
        print ts "," name "," value
    }' > "$CACHE_FILE"
    # Filter out lines in the cache file whose reading is a raw hex value or 'na'.
    # -v: Inverts the match, printing lines that DO NOT match.
    # -e ',0x[^,]*$' -e ',na$': Patterns to exclude (last field starts with '0x' or is 'na').
    # >>: Appends the filtered data to the defined log file.
    grep -v -e ',0x[^,]*$' -e ',na$' "$CACHE_FILE" >> "$LOG_FILE"
    # Pause script execution for 30 minutes.
    sleep 30m
done
-----
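The conversion done inside the loop can be tried without a BMC. The following is a minimal sketch, assuming the usual pipe-separated layout of `ipmitool sensor list` output; the function name `sensor_rows_to_csv` and the sample rows are hypothetical and serve only as illustration.

```shell
# Hypothetical helper: convert pipe-separated sensor rows read from stdin
# into "timestamp,name,reading" lines. Rows whose reading is 'na' or a raw
# hex value (0x...) are skipped, mirroring the filter in the collection script.
sensor_rows_to_csv() {
    awk -F'|' -v ts="$1" '
        NF < 2 { next }
        {
            name = $1; value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (value == "na" || value ~ /^0x/) next
            print ts "," name "," value
        }'
}

# Illustrative input only; real sensor names and values differ per system.
sample='CPU0_Temp        | 45.000     | degrees C  | ok
Inlet Temp       | na         | degrees C  | na
PSU0_Status      | 0x1        | discrete   | 0x0100'

printf '%s\n' "$sample" | sensor_rows_to_csv "2025-07-03 01:55:00"
```

Splitting on the pipe character rather than on whitespace keeps sensor names that contain spaces in a single field.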
2) Create a rotation rule to manage the logs (adjust as needed)
# vim /etc/logrotate.d/sensor-data
-----
/var/tmp/temperature/*.log {
    daily
    rotate 5
    compress
    missingok
    notifempty
    create 0640 root root
    dateext
    dateyesterday
    postrotate
    endscript
    olddir /var/tmp/temperature
}
-----
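With `dateext` and `dateyesterday`, logrotate appends the previous day's date to the rotated file using its default `-%Y%m%d` suffix, and `compress` adds `.gz`. A small sketch of the resulting name, assuming GNU `date` (for `-d yesterday`); the function name `expected_archive_name` is hypothetical.

```shell
# Hypothetical helper: print the archive name logrotate is expected to produce
# for a given log file when dateext, dateyesterday, and compress are set.
# Assumes GNU date, which accepts "-d yesterday".
expected_archive_name() {
    printf '%s-%s.gz\n' "$1" "$(date -d yesterday '+%Y%m%d')"
}

expected_archive_name "/var/tmp/temperature/BMC_IP_Address-sensor_ipmi_data.log"
```

The printed name can be compared with the files that appear in the log directory after the first rotation.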
3) Verify logrotate operation
a. Dry-run the rule in debug mode (nothing is rotated):
# /usr/sbin/logrotate -d /etc/logrotate.d/sensor-data
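b. Force a rotation of the sensor logs only (pointing -f at /etc/logrotate.conf would force-rotate every log on the system):
# /usr/sbin/logrotate -f /etc/logrotate.d/sensor-data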
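c. Add cron jobs that start the collection script at boot and rotate the logs daily (the script must be executable, e.g. chmod +x /var/tmp/collect_temperature.sh):
# crontab -e
@reboot /var/tmp/collect_temperature.sh
0 0 * * * /usr/sbin/logrotate -f /etc/logrotate.d/sensor-data
d. Check the registered cron jobs:
# crontab -l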
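Whether both entries are registered can also be checked in a script. A minimal sketch, assuming the crontab text is supplied on standard input (for example from `crontab -l`); the function name `cron_has_entry` and the sample text are hypothetical.

```shell
# Hypothetical helper: succeed if the crontab text on stdin contains a
# non-comment line that includes the given fixed string.
cron_has_entry() {
    awk -v want="$1" '
        !/^[ \t]*#/ && index($0, want) { found = 1 }
        END { exit !found }'
}

# Illustrative crontab content matching the two entries registered in step c.
sample_crontab='@reboot /var/tmp/collect_temperature.sh
0 0 * * * /usr/sbin/logrotate -f /etc/logrotate.d/sensor-data'

for entry in /var/tmp/collect_temperature.sh /etc/logrotate.d/sensor-data; do
    if printf '%s\n' "$sample_crontab" | cron_has_entry "$entry"; then
        echo "found: $entry"
    else
        echo "missing: $entry"
    fi
done
```

On a live system, replace the `printf` of the sample text with `crontab -l`.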
e.g.) Files present after a few days of operation:
# ls -l
total 224
-rw-r----- 1 root root 24076 Jul 3 01:55 BMC_IPAddress-sensor_ipmi_data.log
-rw-r--r-- 1 root root 164391 Jul 2 03:23 BMC_IPAddress-sensor_ipmi_data.log-20250701.gz
-rw-r----- 1 root root 23829 Jul 2 23:55 BMC_IPAddress-sensor_ipmi_data.log-20250702.gz
-rw-r--r-- 1 root root 6853 Jul 3 01:55 BMC_IPAddress-temperature_cache.tmp
-rwxr-xr-x 1 root root 1413 Jun 26 04:35 collect_temperature.sh
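Once data has accumulated, the log can be summarized with standard tools. The sketch below assumes each log line has the form `timestamp,sensor,reading`, as written by the script in step 1; the function name `max_per_sensor` and the sample lines are hypothetical.

```shell
# Hypothetical helper: print the highest reading seen for each sensor,
# reading "timestamp,sensor,reading" lines from stdin.
max_per_sensor() {
    awk -F',' '
        NF >= 3 {
            if (!($2 in max) || $3 + 0 > max[$2]) max[$2] = $3 + 0
        }
        END { for (s in max) printf "%s,%.3f\n", s, max[s] }' | sort
}

# Illustrative log lines; real sensor names and values differ per system.
sample_log='2025-07-03 01:25:00,CPU0_Temp,45.000
2025-07-03 01:55:00,CPU0_Temp,47.000
2025-07-03 01:55:00,Inlet_Temp,24.000'

printf '%s\n' "$sample_log" | max_per_sensor
```

To run it on real data, feed the current log file from step 1 on standard input instead of the sample lines.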