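How to approach BSOD 0x101 on Windows Server 2008
Searching the web, I found that people approach the analysis of BSOD 0x101 with WinDbg in several different ways, but the method below looked like the quickest and easiest one to me. (Just my opinion ~_~;;)
That said, it cannot give a definitive conclusion.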
Bug Check 0x101: CLOCK_WATCHDOG_TIMEOUT
http://msdn.microsoft.com/en-us/library/ff557211(v=vs.85).aspx
Cause
The specified processor is not processing interrupts. Typically, this occurs when the processor is nonresponsive or is deadlocked.
These actions might prevent an error like this from happening again:
1. Download and install updates and device drivers for your computer from Windows Update.
2. Scan your computer for computer viruses.
3. Check your hard disk for errors.
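Check that the hardware firmware and drivers, as well as the OS patch level, are up to date.
Analyzing with the WinDbg tool
Unless the BSOD is a simple one, a small dump (minidump) often does not contain information you can rely on.
For a proper analysis, a kernel dump or larger seems to be the recommendation.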
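The dump type is controlled by the `CrashDumpEnabled` value under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl`. The sketch below only interprets that value once you have read it (for example with `reg query`); it does not touch the registry. The helper name `is_enough_for_analysis` is made up for this post, and the value-to-type mapping reflects my understanding of the documented values (7, automatic memory dump, only exists on newer Windows versions).

```python
# Hypothetical helper: interpret the CrashDumpEnabled registry value.
# The mapping is my understanding of the documented values, not something
# taken from this dump.
CRASH_DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}

def describe_dump_type(crash_dump_enabled):
    """Return a readable name for a CrashDumpEnabled value."""
    return CRASH_DUMP_TYPES.get(crash_dump_enabled, "unknown")

def is_enough_for_analysis(crash_dump_enabled):
    """True if the setting yields a kernel dump or larger."""
    return crash_dump_enabled in (1, 2, 7)

for value in (3, 2):
    print(value, describe_dump_type(value), is_enough_for_analysis(value))
```

If the server is set to a small dump, switching it to a kernel dump before the next occurrence is probably worth it.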
0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
CLOCK_WATCHDOG_TIMEOUT (101)
An expected clock interrupt was not received on a secondary processor in an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
Arguments:
Arg1: 0000000000000004, Clock interrupt time out interval in nominal clock ticks.
Arg2: 0000000000000000, 0.
Arg3: fffff880030d6180, The PRCB address of the hung processor.
Arg4: 000000000000002b, 0.
In my experience, Arg4 is usually the index of the logical processor that hung, and Arg3 is the PRCB address of that processor.
Debugging Details:
------------------
BUGCHECK_STR: CLOCK_WATCHDOG_TIMEOUT_40_PROC
DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT
PROCESS_NAME: System
CURRENT_IRQL: d
STACK_TEXT:
fffff800`02be5a28 fffff800`016dfa89 : 00000000`00000101 00000000`00000004 00000000`00000000 fffff880`030d6180 : nt!KeBugCheckEx
fffff800`02be5a30 fffff800`01692eb7 : 00000000`00000000 fffff800`0000002b 00000000`00026161 00000000`00000000 : nt! ?? ::FNODOBFM::`string'+0x4e2e
fffff800`02be5ac0 fffff800`01bfc895 : fffff800`01c22460 fffff800`02be5c70 fffff800`01c22460 fffff880`00000000 : nt!KeUpdateSystemTime+0x377
fffff800`02be5bc0 fffff800`01684b73 : fffff800`00000000 00000000`ffffffff 00000000`00000000 00000000`00000000 : hal!HalpHpetClockInterrupt+0x8d
fffff800`02be5bf0 fffff800`01680342 : fffff800`017fae80 fffff800`00000001 00000000`00000001 fffff880`00000000 : nt!KiInterruptDispatchNoLock+0x163
fffff800`02be5d80 00000000`00000000 : fffff800`02be6000 fffff800`02be0000 fffff800`02be5d40 00000000`00000000 : nt!KiIdleLoop+0x32
STACK_COMMAND: kb
SYMBOL_NAME: ANALYSIS_INCONCLUSIVE
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: Unknown_Module
IMAGE_NAME: Unknown_Image
DEBUG_FLR_IMAGE_TIMESTAMP: 0
FAILURE_BUCKET_ID: X64_CLOCK_WATCHDOG_TIMEOUT_40_PROC_ANALYSIS_INCONCLUSIVE
BUCKET_ID: X64_CLOCK_WATCHDOG_TIMEOUT_40_PROC_ANALYSIS_INCONCLUSIVE
Followup: MachineOwner
---------
0: kd> .bugcheck
Bugcheck code 00000101
Arguments 00000000`00000004 00000000`00000000 fffff880`030d6180 00000000`0000002b
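Reading the hex arguments by eye is error-prone, so here is a small sketch that parses the `.bugcheck` output above and prints the hung processor index in decimal. The function name `parse_bugcheck` is my own, and treating Arg4 as the processor index follows the interpretation used in this post.

```python
import re

def parse_bugcheck(text):
    """Parse WinDbg `.bugcheck` output into (code, [arg1, arg2, arg3, arg4])."""
    code_match = re.search(r"Bugcheck code\s+([0-9A-Fa-f]+)", text)
    args_match = re.search(r"Arguments\s+(.+)", text)
    if not code_match or not args_match:
        raise ValueError("this does not look like .bugcheck output")
    code = int(code_match.group(1), 16)
    # WinDbg prints 64-bit values as high`low; drop the backtick before parsing.
    args = [int(tok.replace("`", ""), 16) for tok in args_match.group(1).split()]
    return code, args

sample = """Bugcheck code 00000101
Arguments 00000000`00000004 00000000`00000000 fffff880`030d6180 00000000`0000002b"""

code, args = parse_bugcheck(sample)
prcb_address = args[2]   # Arg3: PRCB address of the hung processor
hung_index = args[3]     # Arg4: assumed to be the hung processor index
print(f"bugcheck 0x{code:X}, hung processor {hung_index} (0x{hung_index:x}), "
      f"PRCB 0x{prcb_address:016x}")
```

The decimal index is what you pass to `!prcb` (in hex, `2b`) in the next step.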
0: kd> vertarget
Windows 7 Kernel Version 7601 (Service Pack 1) MP (64 procs) Free x64
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Built by: 7601.17514.amd64fre.win7sp1_rtm.101119-1850
Machine Name:
Kernel base = 0xfffff800`01608000 PsLoadedModuleList = 0xfffff800`0184de90
Debug session time: Sun Mar 10 20:06:52.797 2013 (UTC + 9:00)
System Uptime: 3 days 7:38:30.811
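Why does a Server OS show up as a client (Windows 7) here? _ _;; (Windows Server 2008 R2 shares its kernel with Windows 7, so the banner reads "Windows 7 Kernel Version"; the "Product: Server" line is what identifies it as a server.)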
0: kd> !running
System Processors: (ffffffffffffffff)
Idle Processors: (ffffffffffffffff) (0000000000000000) (0000000000000000) (0000000000000000)
All processors idle.
0: kd> !prcb 2b
PRCB for Processor 43 at fffff880030d6180:
Current IRQL -- 0
Threads-- Current fffff880030e1ec0 Next 0000000000000000 Idle fffff880030e1ec0
Processor Index 43 Number (0, 43) GroupSetMember 80000000000
Interrupt Count -- 01398caf
Times -- Dpc 00000001 Interrupt 00000001
Kernel 01186c39 User 00000000
As mentioned above, the PRCB (processor control block) at the Arg3 address is indeed found on logical processor 0x2B (43).
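The same cross-check can be written down as a quick sketch: the PRCB address printed by `!prcb 2b` should equal Arg3, and the single bit set in `GroupSetMember` should sit at the position given by Arg4. The helper name `member_bit_index` is hypothetical, and reading `GroupSetMember` as a one-bit affinity mask is my assumption.

```python
def member_bit_index(group_set_member_hex):
    """Return the position of the single bit set in a GroupSetMember mask."""
    mask = int(group_set_member_hex, 16)
    if mask == 0 or mask & (mask - 1):
        raise ValueError("expected exactly one bit set")
    return mask.bit_length() - 1

# Values copied from the outputs above.
arg3 = 0xfffff880030d6180                 # PRCB address from the bugcheck
arg4 = 0x2b                               # processor index from the bugcheck
prcb_addr = int("fffff880030d6180", 16)   # address from "!prcb 2b"
index = member_bit_index("80000000000")   # GroupSetMember from "!prcb 2b"

consistent = (prcb_addr == arg3) and (index == arg4)
print(f"GroupSetMember bit {index}, consistent with bugcheck args: {consistent}")
```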
0: kd> !numa
NUMA Summary:
------------
Number of NUMA nodes : 4
Number of Processors : 64
MmAvailablePages : 0x00F19C26
KeActiveProcessors :
**************************************************************** (ffffffffffffffff)
NODE 0 (FFFFF80001808C00):
Group : 65535 (Assigned, Committed, Assignment Adjustable)
ProcessorMask : (ffff)
ProximityId : 0
Capacity : 16
Seed : 0x00000004
Color : 0x00000000
MmShiftedColor : 0x00000000
Right : 0x00000000
Left : 0x0000000F
Zeroed Page Count: 0x000000000039CA06
Free Page Count : 0x0000000000000000
NODE 1 (FFFFF880024BE380):
Group : 0 (Assigned, Committed, Assignment Adjustable)
ProcessorMask : (ffff0000)
ProximityId : 1
Capacity : 16
Seed : 0x00000010
Color : 0x00000001
MmShiftedColor : 0x00000100
Right : 0x00000010
Left : 0x0000001F
Zeroed Page Count: 0x00000000003CA5D3
Free Page Count : 0x0000000000000000
NODE 2 (FFFFF88002BEA380):
Group : 0 (Assigned, Committed, Assignment Adjustable)
ProcessorMask : (ffff00000000)
ProximityId : 2
Capacity : 16
Seed : 0x00000020
Color : 0x00000002
MmShiftedColor : 0x00000200
Right : 0x00000020
Left : 0x0000002F
Zeroed Page Count: 0x00000000003D2B16
Free Page Count : 0x0000000000000005
NODE 3 (FFFFF88003322380):
Group : 0 (Assigned, Committed, Assignment Adjustable)
ProcessorMask : (ffff000000000000)
ProximityId : 3
Capacity : 16
Seed : 0x00000032
Color : 0x00000003
MmShiftedColor : 0x00000300
Right : 0x00000030
Left : 0x0000003F
Zeroed Page Count: 0x00000000003CE082
Free Page Count : 0x0000000000000000
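Servers often run a very large number of processors,
so check the NUMA information to see how the processors are currently grouped.
Logical processor 43 belongs to physical processor 3 (NUMA node 2, whose ProcessorMask covers logical processors 32 to 47),
which lets us confirm the specific core and thread where the problem appeared.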
Logical | Physical | Core | Thread
0 | 1 | 0 | 0
1 | 1 | 0 | 1
2 | 1 | 1 | 0
3 | 1 | 1 | 1
4 | 1 | 2 | 0
5 | 1 | 2 | 1
6 | 1 | 3 | 0
7 | 1 | 3 | 1
8 | 1 | 4 | 0
9 | 1 | 4 | 1
10 | 1 | 5 | 0
11 | 1 | 5 | 1
12 | 1 | 6 | 0
13 | 1 | 6 | 1
14 | 1 | 7 | 0
15 | 1 | 7 | 1
16 | 2 | 0 | 0
17 | 2 | 0 | 1
18 | 2 | 1 | 0
19 | 2 | 1 | 1
20 | 2 | 2 | 0
21 | 2 | 2 | 1
22 | 2 | 3 | 0
23 | 2 | 3 | 1
24 | 2 | 4 | 0
25 | 2 | 4 | 1
26 | 2 | 5 | 0
27 | 2 | 5 | 1
28 | 2 | 6 | 0
29 | 2 | 6 | 1
30 | 2 | 7 | 0
31 | 2 | 7 | 1
32 | 3 | 0 | 0
33 | 3 | 0 | 1
34 | 3 | 1 | 0
35 | 3 | 1 | 1
36 | 3 | 2 | 0
37 | 3 | 2 | 1
38 | 3 | 3 | 0
39 | 3 | 3 | 1
40 | 3 | 4 | 0
41 | 3 | 4 | 1
42 | 3 | 5 | 0
43 | 3 | 5 | 1
44 | 3 | 6 | 0
45 | 3 | 6 | 1
46 | 3 | 7 | 0
47 | 3 | 7 | 1
48 | 4 | 0 | 0
49 | 4 | 0 | 1
50 | 4 | 1 | 0
51 | 4 | 1 | 1
52 | 4 | 2 | 0
53 | 4 | 2 | 1
54 | 4 | 3 | 0
55 | 4 | 3 | 1
56 | 4 | 4 | 0
57 | 4 | 4 | 1
58 | 4 | 5 | 0
59 | 4 | 5 | 1
60 | 4 | 6 | 0
61 | 4 | 6 | 1
62 | 4 | 7 | 0
63 | 4 | 7 | 1
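The table follows a regular pattern, so the lookup can be sketched in a few lines. This assumes what the table itself shows for this machine: 16 logical processors per physical processor, 2 threads per core, logical processors numbered contiguously, and physical processors numbered from 1. The function names are my own, and real hardware does not have to enumerate processors this way, so treat the result as a hint rather than proof.

```python
# ProcessorMask values copied from the !numa output (node -> mask).
NODE_MASKS = {
    0: 0xffff,
    1: 0xffff0000,
    2: 0xffff00000000,
    3: 0xffff000000000000,
}

def numa_node_of(logical, masks=NODE_MASKS):
    """Return the NUMA node whose ProcessorMask has the bit for `logical` set."""
    for node, mask in masks.items():
        if (mask >> logical) & 1:
            return node
    raise ValueError(f"logical processor {logical} is not in any node mask")

def topology_of(logical, per_socket=16, threads_per_core=2):
    """Return (physical, core, thread) under the contiguous-numbering assumption."""
    physical = logical // per_socket + 1            # physical processors are 1-based
    core = (logical % per_socket) // threads_per_core
    thread = logical % threads_per_core
    return physical, core, thread

hung = 0x2b  # Arg4 of the bugcheck
print("NUMA node:", numa_node_of(hung))
print("physical, core, thread:", topology_of(hung))
```

For logical processor 43 this gives NUMA node 2 and physical processor 3, core 5, thread 1, which matches the row in the table.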