RESTful Interface Tool - Linux
2.2(5 Feb 2018)
https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX_57783eaa1a7847f3a165a0e5a6


------------------------------------------------------------
> /usr/sbin/ilorest login iLO_IP_Address -u Admin_Account -p Password --selector=Chassis.
> /usr/sbin/ilorest list Status |grep Health

------------------------------------------------------------------
iLO WebUI - OK
LED - Green
        Health=OK
------------------------------------------------------------------
iLO WebUI - Degraded
LED - Flashing Amber (e.g. lost power redundancy)
        Health=Warning
------------------------------------------------------------------
iLO WebUI - Critical
LED - RED
        Health=Critical
------------------------------------------------------------------
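The mapping above can be applied mechanically to the output of "ilorest list Status". A minimal Python sketch, assuming the output contains a line of the form "Health=<value>" as shown in this post; the function name led_from_status_output and the dictionary are illustrative and not part of ilorest.

```python
import re

# Health value reported by ilorest -> (iLO WebUI state, Health LED),
# taken from the table in this post.
HEALTH_TO_LED = {
    "OK": ("OK", "Green"),
    "Warning": ("Degraded", "Flashing Amber"),
    "Critical": ("Critical", "Red"),
}


def led_from_status_output(text):
    """Return (health, webui_state, led) from `ilorest list Status` output.

    Assumes a line such as `        Health=OK`. Returns None when no
    such line is present.
    """
    match = re.search(r"^\s*Health=(\S+)\s*$", text, flags=re.MULTILINE)
    if match is None:
        return None
    health = match.group(1)
    # Values not listed in the table are reported as Unknown.
    webui, led = HEALTH_TO_LED.get(health, ("Unknown", "Unknown"))
    return health, webui, led


if __name__ == "__main__":
    print(led_from_status_output("        Health=Warning\n"))
```

The text passed in would be whatever "ilorest list Status" printed; capturing that output (for example via a pipe) is left to the caller.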

> /usr/sbin/ilorest --nocache login <% iLO_IP_Addresss %> -u Administrator -p Password --selector=Chassis.
-> Delete Cache

cf.) default cache location:
Win - C:\Users\USERNAME\AppData\Roaming\.ilorest\cache\*
LX   - $HOME/.iLOrest/cache/*

RESTful Interface Tool - Windows
2.2.0.0(5 Feb 2018)
https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX_4a357d45353244d8b8d56fdae2

 

Save the following as gethealth.bat
----------------------------------------------------------------

:: gethealth.bat [URI] [USERNAME] [PASSWORD]
@echo off

set argC=0
for %%x in (%*) do Set /A argC+=1
if %argC% LSS 3 goto :failCondition
goto :main

:failCondition
@echo Usage:
@echo gethealth.bat [URI] [USERNAME] [PASSWORD]
goto :EOF

:main
@echo *****************************************
@echo ************* Logging in... *************
@echo *****************************************
ilorest.exe login %1 -u %2 -p %3
@echo *****************************************
@echo ******* selecting Chassis type.. ********
@echo *****************************************
ilorest.exe select Chassis.
@echo *****************************************
@echo ********* getting Health State **********
@echo *****************************************
ilorest.exe list Status
ilorest.exe logout

----------------------------------------------------------------

Posted by 스쳐가는인연

< Table: IPMI 2.0 Generic Event/Reading Type Code 07 >

 

e.g.) It is a bit of a pity that not every situation could be checked ...

Green   0x01, bit 00

Amber   0x02, bit 01

RED      0x08, bit 03

 

Green (No Issue)

# ipmitool -I lanplus -H <% iLO_IP_Addresss %> -U Administrator -P Password raw 4 0x2d 0xac

00 c0 01 80

# ipmitool -I lanplus -H <% iLO_IP_Addresss %> -U Administrator -P Password sdr elist | head -n 2

UID              | AEh | ok  | 23.1 |

SysHealth_Stat   | ACh | ok  | 23.1 | Transition to OK

 

Degraded (one power source removed with redundant PSUs configured)

# ipmitool -I lanplus -H <% iLO_IP_Addresss %> -U Administrator -P Password sdr elist | head -n 2

UID              | AEh | ok  | 23.1 |

SysHealth_Stat   | ACh | ok  | 23.1 | Transition to Non-critical from OK

# ipmitool -I lanplus -H <% iLO_IP_Addresss %> -U Administrator -P Password raw 4 0x2d 0xac

00 c0 02 80

 

Degraded (SSB failure)

# ipmitool -I lanplus -H <% iLO_IP_Addresss %> -U Administrator -P Password raw 4 0x2d 0xac

00 c0 02 80

# ipmitool -I lanplus -H <% iLO_IP_Addresss %> -U Administrator -P Password sdr elist | head -n 2

UID              | AEh | ok  | 23.1 |

SysHealth_Stat   | ACh | ok  | 23.1 | Transition to Non-critical from OK

 

Critical (lost Logical Drive)

# ipmitool -I lanplus -H <% iLO_IP_Addresss %> -U Administrator -P Password raw 4 0x2d 0xac

00 c0 08 80

# ipmitool -I lanplus -H <% iLO_IP_Addresss %> -U Administrator -P Password sdr elist | head -n 2

UID              | AEh | ok  | 23.1 |

SysHealth_Stat   | ACh | ok  | 23.1 | Transition to Non-recoverable from less severe

 

Critical (System Power fault)

# ipmitool -I lanplus -H 10.254.53.157 -U Administrator -P Password raw 4 0x2d 0xac

00 e0 01 80

# ipmitool -I lanplus -H 10.254.53.157 -U Administrator -P Password sdr elist | head -n 2

UID              | AEh | ns  | 23.1 | No Reading

SysHealth_Stat   | ACh | ns  | 23.1 | No Reading
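The raw replies above can be decoded with a few bit tests. A minimal Python sketch: the LED bits (0x01, 0x02, 0x08 in the third byte) come from this post, while treating bit 5 (0x20) of the second byte as the "reading/state unavailable" flag is an assumption based on the IPMI 2.0 Get Sensor Reading response layout. It matches the "00 e0 01 80" / "No Reading" case shown here, but has not been checked against other failure modes. The function name decode_health_raw is illustrative.

```python
# LED state bits in the third response byte, as listed in this post.
LED_BITS = {0x01: "Green", 0x02: "Amber", 0x08: "Red"}

# Assumption: bit 5 of the second byte means "reading/state unavailable"
# (IPMI 2.0 Get Sensor Reading response).
READING_UNAVAILABLE = 0x20


def decode_health_raw(response):
    """Decode a reply such as '00 c0 02 80' into an LED colour string."""
    data = [int(token, 16) for token in response.split()]
    if len(data) < 3:
        raise ValueError("expected at least 3 bytes: %r" % response)
    if data[1] & READING_UNAVAILABLE:
        # The state bits are not meaningful when the reading is unavailable.
        return "No Reading"
    # Report the most severe bit that is set.
    for mask in (0x08, 0x02, 0x01):
        if data[2] & mask:
            return LED_BITS[mask]
    return "Unknown"


if __name__ == "__main__":
    for reply in ("00 c0 01 80", "00 c0 02 80", "00 c0 08 80", "00 e0 01 80"):
        print(reply, "->", decode_health_raw(reply))
```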

 

Checking the same state via the RESTful API

# /usr/sbin/ilorest login <% iLO_IP_Addresss %> -u Administrator -p Password --selector=Chassis.

iLOrest : RESTful Interface Tool version 2.2

Copyright (c) 2014, 2017 Hewlett Packard Enterprise Development LP

----------------------------------------------------------------------------------------------------------------------------------

Discovering data...Done

WARNING: Cache is activated. Session keys are stored in plaintext.

# /usr/sbin/ilorest list Status |grep Health

        Health=Critical

 

# /usr/sbin/ilorest --nocache login <% iLO_IP_Addresss %> -u Administrator -p Password --selector=Chassis.

-> Delete Cache

 

cf.) default cache location:
Win - C:\Users\USERNAME\AppData\Roaming\.ilorest\cache\*
LX   - $HOME/.iLOrest/cache/*

 

On Gen10, ipmitool sdr elist returns a valid result (no real need to use raw).

However, in a severe failure, IPMI cannot report the system information properly.
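The "sdr elist" lines shown in this post can be reduced to a health state as follows. A minimal Python sketch that only knows the event strings captured above; the sensor name SysHealth_Stat and the "ns" status come from those captures, and the function name sys_health_from_sdr is illustrative.

```python
# Event text -> health state, limited to the cases observed in this post.
EVENT_TO_HEALTH = {
    "Transition to OK": "OK",
    "Transition to Non-critical from OK": "Degraded",
    "Transition to Non-recoverable from less severe": "Critical",
}


def sys_health_from_sdr(output, sensor="SysHealth_Stat"):
    """Return the health state for `sensor` from `ipmitool sdr elist` output.

    Returns None if the sensor line is missing, "No Reading" when the
    status column is `ns`, and "Unknown" for event text not seen here.
    """
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 5 or fields[0] != sensor:
            continue
        status, event = fields[2], fields[4]
        if status == "ns":
            return "No Reading"
        return EVENT_TO_HEALTH.get(event, "Unknown")
    return None


if __name__ == "__main__":
    sample = (
        "UID              | AEh | ok  | 23.1 |\n"
        "SysHealth_Stat   | ACh | ok  | 23.1 | Transition to Non-critical from OK\n"
    )
    print(sys_health_from_sdr(sample))
```

The "No Reading" branch corresponds to the System Power fault case, where IPMI does not return a usable state.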

 

Conclusion: for the Health LED state, it is better to rely on the RESTful result (IPMI also has security issues ...)

 

Reference: http://infoages.com/1869?category=319414


Unrecoverable Error when using a Broadcom (formerly Emulex) CNA (NIC) with an HPE ProLiant system

 

Cause and symptoms

In environments using a Broadcom (formerly Emulex) CNA, one or more of the following errors may occur.

 

On Linux, entries matching a pattern like the following are recorded in the OS event log "/var/log/messages" ---------------

be2net 0000:06:00.4: Unrecoverable Error detected in the adapter

be2net 0000:06:00.4: Please reboot server to recover

After that, one of the following footprints is logged along with it.

 

be2net 0000:06:00.4: UE LOW: TPOST bit set

or

 

be2net 0000:06:00.4: UE LOW: MPU bit set

be2net 0000:06:00.4: UE LOW: TPOST bit set

or

 

be2net 0000:06:00.4: eth4: Link down

be2net 0000:06:00.4: UE HIGH: TXPB bit set

-----------------------------------------------------

 

On VMware, it is logged in the following form ----------

2017-02-24T20:45:34.067Z cpuX:33579)WARNING: elxnet: elxnet_detectDumpUe:344: [vmnicX] UE Detected!!

2017-02-24T20:45:34.067Z cpuX:33579)WARNING: elxnet: elxnet_detectDumpUe:352: [vmnicX] UE lo: MPU bit set

2017-02-24T20:45:34.067Z cpuX:33579)WARNING: elxnet: elxnet_detectDumpUe:361: [vmnicX] UE hi: TXP bit set

 

2017-02-24T20:45:34.747Z cpuX:33584)WARNING: lpfc: lpfc_sli4_eratt_read:11015: 0:1423 HBA Unrecoverable error: uerr_lo_reg=0x4000020, uerr_hi_reg=0x1000, ue_mask_lo_reg=0x4000000, ue_mask_hi_reg=0x80000000

2017-02-24T20:45:34.747Z cpuX:33584)WARNING: lpfc: lpfc_sli4_eratt_read:11015: 1:1423 HBA Unrecoverable error: uerr_lo_reg=0x4000020, uerr_hi_reg=0x1000, ue_mask_lo_reg=0x4000000, ue_mask_hi_reg=0x80000000

 

2017-02-24T20:45:34.747Z cpuX:33585)WARNING: lpfc: lpfc_handle_eratt_s4:1993: 0:7623 Checking UE recoverable

2017-02-24T20:45:34.747Z cpuX:33588)WARNING: lpfc: lpfc_handle_eratt_s4:1993: 1:7623 Checking UE recoverable

-----------------------------------------------------
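To scan a log for these footprints, something like the following could be used. A minimal Python sketch whose regular expressions are derived only from the sample lines in this post (be2net on Linux, elxnet on VMware); other driver versions may word the messages differently, and the function name find_ue_bits is illustrative.

```python
import re

# Patterns modelled on the sample log lines in this post.
LINUX_UE = re.compile(r"be2net (\S+): UE (LOW|HIGH): (\w+) bit set")
VMWARE_UE = re.compile(r"elxnet: .*\[(\w+)\] UE (lo|hi): (\w+) bit set")


def find_ue_bits(lines):
    """Return a list of (device, LOW/HI/HIGH, bit_name) for UE bit lines."""
    hits = []
    for line in lines:
        match = LINUX_UE.search(line) or VMWARE_UE.search(line)
        if match:
            device, half, bit = match.groups()
            # Normalise case so Linux and VMware results look alike.
            hits.append((device, half.upper(), bit))
    return hits


if __name__ == "__main__":
    sample = [
        "be2net 0000:06:00.4: Unrecoverable Error detected in the adapter",
        "be2net 0000:06:00.4: UE LOW: MPU bit set",
        "be2net 0000:06:00.4: UE HIGH: TXPB bit set",
    ]
    for hit in find_ue_bits(sample):
        print(hit)
```

On Linux the lines would typically come from /var/log/messages, for example by iterating over the open file.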

 

On Windows, the port becomes disabled:

 

 

Unrecoverable Error

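- A hardware error event in which the hardware has entered a state that it cannot recover from without a hard reset (e.g. reboot).

- A surface-level error event (a result rather than a root cause) that can be triggered by various software (fw/driver) or hardware factors; the triggering factor can differ from case to case.

(Because of this, applying one specific remedy across the board can produce different results.)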

 

Environment

- HPE ProLiant G7 – Gen9

- When one or more of the following CNA parts is in use:

HPE Ethernet 10Gb 2-port 557SFP+ Adapter

HPE FlexFabric 10Gb 2-port 556FLR-SFP+ Adapter

HPE FlexFabric 10Gb 2-port 556FLR-T Adapter

HPE FlexFabric 20Gb 2-port 650FLB Adapter

HPE FlexFabric 20Gb 2-port 650M Adapter

HPE StoreFabric CN1200E 10Gb Converged Network Adapter

HPE StoreFabric CN1200E 10GBASE-T Dual Port Converged Network Adapter

HPE NC553m 10Gb 2-port FlexFabric Adapter

HPE FlexFabric 10Gb 2-port 554M Adapter

HPE FlexFabric 10Gb 2-port 554FLB Adapter

HPE FlexFabric 10Gb 2-port 554FLR-SFP+ Adapter

HPE NC552m 10Gb 2-port FlexFabric Converged Network Adapter

HPE NC552SFP 10Gb 2-port Ethernet Server Adapter

HPE NC553i 10Gb 2-port FlexFabric Converged Network Adapter

HPE CN1100E Dual Port Converged Network Adapter

 

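Solution

Some known symptoms that lead to an Unrecoverable Error can be resolved by upgrading to a newer fw.

An exception-handling algorithm (which attempts recovery where possible) has been added to the driver.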

 

Action Item 1.

What: Upgrade fw and driver to the latest (check the OS version)

When: When possible,

- Apply fw Package 11.1.183.48 (or later) after confirming a compatible driver

- Applying the latest SPP supported for the system in use is recommended

 

Action Item 2.

What: System Reboot

When: The applied device fw is "11.1.183.48" or later and a UE event has been experienced
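Choosing between Action Item 1 and Action Item 2 comes down to a numeric comparison of the dotted fw version against 11.1.183.48. A minimal Python sketch, assuming the installed version is a purely numeric dotted string; the function names and the returned wording are illustrative, and how the installed version is obtained is out of scope here.

```python
MIN_FW = "11.1.183.48"  # threshold quoted in this post


def parse_version(version):
    """Turn '11.1.183.48' into (11, 1, 183, 48) for numeric comparison."""
    return tuple(int(part) for part in version.strip().split("."))


def fw_at_least(installed, minimum=MIN_FW):
    # Tuples compare field by field, so 183.5 < 183.48 is handled
    # correctly (a plain string comparison would get this wrong).
    return parse_version(installed) >= parse_version(minimum)


def action_for(installed, ue_seen):
    """Pick the action item described in this post."""
    if not fw_at_least(installed):
        return "Action Item 1: upgrade fw and driver"
    if ue_seen:
        return "Action Item 2: reboot the system"
    return "No action needed"


if __name__ == "__main__":
    print(action_for("11.1.183.23", ue_seen=True))
    print(action_for("11.1.183.48", ue_seen=True))
```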

 

Reference advisories ------------------------------------------

Advisory: HP Emulex Adapters - Network Adapters May Become Unrecoverable or Disabled Due to an Unexpected Error Caused by a FAT File Mismatch

https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04943543

 

be2net Unrecoverable Error detected in the adapter

https://access.redhat.com/solutions/1229853

 

Why did the server hang with messages like "kernel: be2net 0000:04:00.0: UE: MPU bit set" on the console?

https://access.redhat.com/solutions/401023

-----------------------------------------------------------

 


BL c7000 OA Configuration - Backup
upload config ftp://<% ftpServerIPAddress %>/<% ftp_root_subpath %>/MyOAConfig.txt

upload config ftp://Administrator:password@192.168.200.1/oa/MyOAConfig.txt

 

BL c7000 OA Configuration - Restore
download config ftp://192.168.200.1/oa/MyOAConfig.txt
download config ftp://<FTP_username>:<FTP_password>@<FTP server IP address>/<Configuration_file.txt>

 


After installing RHEL 6.7 with the Basic Server package set, messages similar or identical to the following are logged repeatedly on the console.

----------------------------------------------------
dmar: DRHD: handling fault status reg 2
dmar: INTR-REMAP: Request device [[00:00.0] fault index 14
INTR-REMAP:[fault reason 38] Blocked an interrupt request due to source-id verification failure
----------------------------------------------------

 

The [00:00.0] device is Intel's Skylake host bus controller,
and the message originates from the interrupt redirector of the IOMMU (input-output memory management unit).

 

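To let virtual machines use I/O devices more smoothly, I/O virtualization maps resources of the physical device to the virtualized device.
Interrupt remapping (IR) can be understood as the part of this that maps (and validates the source of) the device's interrupt requests, alongside the mapping of the device's memory regions.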

 

The message reports a problem with this remapping, but it can also appear because interrupt remapping (IR) does not interoperate properly between some OS and HW combinations.

 

It can be resolved or worked around by disabling the IOMMU interrupt remapping feature at the H/W level in the BIOS, or by disabling intremap (the interrupt remapping feature) in the OS.

 

 

Test Environment
DL20 Gen9 + RHEL 6.7

 

-------------------------------------------------------------------
- OS install with Basic Server packages
- Default

# dmesg
dmar: DRHD: handling fault status reg 2
dmar: INTR-REMAP: Request device [[00:00.0] fault index 14
INTR-REMAP:[fault reason 38] Blocked an interrupt request due to source-id verification failure
-------------------------------------------------------------------
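To confirm whether a host shows this symptom, the dmesg text can be scanned for the message pair above. A minimal Python sketch whose patterns are built only from the three lines quoted in this post (including the doubled "[[" as it appears in the log); the function name parse_intr_remap_faults and the dictionary keys are illustrative.

```python
import re

# Patterns modelled on the dmesg lines quoted in this post.
DEVICE_RE = re.compile(
    r"INTR-REMAP: Request device \[+([0-9a-fA-F:.]+)\] fault index (\w+)"
)
REASON_RE = re.compile(r"INTR-REMAP:\[fault reason (\d+)\] (.+)")


def parse_intr_remap_faults(dmesg_text):
    """Return one dict per fault-reason line, paired with the preceding
    'Request device' line when there is one."""
    device = fault_index = None
    faults = []
    for line in dmesg_text.splitlines():
        match = DEVICE_RE.search(line)
        if match:
            device, fault_index = match.groups()
            continue
        match = REASON_RE.search(line)
        if match:
            faults.append({
                "device": device,
                "fault_index": fault_index,
                "reason_code": int(match.group(1)),
                "reason": match.group(2).strip(),
            })
    return faults


if __name__ == "__main__":
    sample = (
        "dmar: DRHD: handling fault status reg 2\n"
        "dmar: INTR-REMAP: Request device [[00:00.0] fault index 14\n"
        "INTR-REMAP:[fault reason 38] Blocked an interrupt request due to "
        "source-id verification failure\n"
    )
    print(parse_intr_remap_faults(sample))
```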

 

 

-------------------------------------------------------------------
- Disable Virtualization Technology from BIOS
>> Symptom persists
-------------------------------------------------------------------
- Disable Intel VT-d from BIOS
>> Symptom disappeared
-------------------------------------------------------------------
- Disable Shared Memory from each NIC on BIOS
>> Symptom persists
-------------------------------------------------------------------
- Append a kernel parameter to the end of the GRUB kernel line (1)
-------------------------------------------
# vim /boot/grub/grub.conf  (Legacy)
# vim /boot/efi/EFI/redhat/grub.conf (UEFI)
...
        kernel /vmlinuz-2.6.32-573.el6.x86_64 ... rhgb quiet
-------------------------------------------
- Append a kernel parameter to the end of the GRUB kernel line (2)
-------------------------------------------
# vim /etc/grub.conf
...
        kernel /vmlinuz-2.6.32-573.el6.x86_64 ... rhgb quiet
-------------------------------------------
a) intel_iommu=off
b) intremap=no_x2apic_optout
>> Symptom persists

c) intremap=off
>> Symptom disappeared (with either method 1 or 2)
-------------------------------------------------------------------
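Appending the parameter to every kernel line by hand is error-prone, so the edit can be scripted. A minimal Python sketch that works on the text of a legacy GRUB grub.conf only; reading and writing the real file (and backing it up first) is left to the reader. The function name add_kernel_param is illustrative, and the sample kernel line in the demo is shortened and hypothetical.

```python
def add_kernel_param(grub_conf, param):
    """Append `param` to every `kernel` line that does not already have it.

    `grub_conf` is the full text of a legacy GRUB grub.conf. Lines other
    than kernel lines are returned unchanged.
    """
    out = []
    for line in grub_conf.splitlines():
        stripped = line.strip()
        if stripped.startswith("kernel ") and param not in stripped.split():
            # Keep the original indentation and add the parameter at the end.
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical, shortened kernel line for illustration.
    sample = (
        "title Red Hat Enterprise Linux 6\n"
        "\tkernel /vmlinuz-2.6.32-573.el6.x86_64 ro rhgb quiet\n"
    )
    print(add_kernel_param(sample, "intremap=off"), end="")
```

Running the function twice leaves the text unchanged, so it is safe to reapply. The new parameter only takes effect after a reboot.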

 

 

References.

Installed CentOS6.7 to DL20Gen9. However, strange log is output.
https://community.hpe.com/t5/ProLiant-Servers-ML-DL-SL/Installed-CentOS6-7-to-DL20Gen9-However-strange-log-is-output/td-p/6860807

 

Advisory: (Revision) Linux - DMAR Fault on Network Adapter With HP NC-Series Broadcom 1GbE Multifunction Driver for Linux Driver (bnx2x) When Linux "intel_iommu=on" Kernel Boot Parameter Is Used on HPE Servers
https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04565693

 

Advisory: HPE Moonshot - Error Messages Indicating Interrupt Failures in dmesg May Be Displayed After Successful Deployment of Red Hat Enterprise Linux 6 on a ProLiant m710x Server Cartridge
https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00019143en_us

 

RHEL6: "dmar: DRHD: handling fault status reg 2" messages are printed by the kdump kernel
https://access.redhat.com/solutions/1480103
