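I used to do crash analysis on the R platform. It basically meant capturing the CPU registers and related state at the time of the crash, disassembling with objdump, and locating the fault by reading the disassembly against the source. Now that I am on a phone platform there is one more tool, Trace32, which Qualcomm uses for all of its crash analysis. Crash issues all land on me these days, and asking Qualcomm every time is not sustainable, so I am picking the skill back up. I think Trace32 is not actually needed; the same old routine mostly still works.
First, the crash scene: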
| [ 1256.852648] Unable to handle kernel NULL pointer dereference at virtual address 00000004
 [ 1256.931061] pgd = e2110000
 [ 1256.933725] [00000004] *pgd=00000000
 [ 1256.937725] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
 [ 1256.942470] Modules linked in: wlan(O) [last unloaded: wlan]
 [ 1256.948098] CPU: 1 PID: 585 Comm: qti Tainted: G W O 3.18.71-perf-g24d2c84 #1
 [ 1256.956092] task: e41161c0 ti: e2148000 task.ti: e2148000
 [ 1256.961479] PC is at diagchar_read+0x610/0x11fc
 [ 1256.965978] LR is at 0x0
 [ 1256.968488] pc : [<c04e0b24>] lr : [<00000000>] psr: 60010013
 [ 1256.968488] sp : e2149ef0 ip : 00000051 fp : b1bb5b7c
 [ 1256.979944] r10: c6651000 r9 : 00000201 r8 : c5227bc0
 [ 1256.985152] r7 : c13dbc38 r6 : 00000014 r5 : 000186a0 r4 : b1bb5b78
 [ 1256.991676] r3 : 00000000 r2 : 80000000 r1 : 00000000 r0 : 00000000
 
 | 
This is a kernel NULL pointer dereference. The key piece of information is the pc: c04e0b24.
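The oops prints the pc twice, once as a raw address (c04e0b24) and once as symbol+offset/size (diagchar_read+0x610/0x11fc). Subtracting the offset from the raw address gives the start of the function, which helps when jumping around in the objdump output. A minimal sketch of that arithmetic in C follows; the names sym_range, parse_pc_line and locate_function are made up for illustration and are not part of any kernel tool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Address range of the function that contains the faulting pc. */
struct sym_range {
    uint32_t start;  /* address of the function's first instruction */
    uint32_t end;    /* one past the function's last byte */
};

/*
 * Parse a line of the form "PC is at name+0xOFF/0xSIZE".
 * name must have room for 64 bytes. Returns 1 on success, 0 otherwise.
 */
static int parse_pc_line(const char *line, char name[64],
                         unsigned int *offset, unsigned int *size)
{
    /* %x accepts an optional 0x prefix, so "+%x/%x" matches "+0x610/0x11fc" */
    return sscanf(line, "PC is at %63[^+]+%x/%x", name, offset, size) == 3;
}

/* pc comes from the "pc : [<...>]" line, offset and size from parse_pc_line. */
static struct sym_range locate_function(uint32_t pc, uint32_t offset,
                                        uint32_t size)
{
    struct sym_range r;

    assert(offset < size);   /* the pc must lie inside the function */
    r.start = pc - offset;
    r.end = r.start + size;
    return r;
}
```

With the values from this oops, locate_function(0xc04e0b24, 0x610, 0x11fc) puts the start of diagchar_read at c04e0514, which agrees with the branch targets in the disassembly (c04e0bd0 is labelled diagchar_read+0x6bc, and c04e0514 + 0x6bc = c04e0bd0).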
Disassemble vmlinux with objdump (the -lD options are generally enough) and search for the pc:
| /code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2933
 c04e0b1c:	e3510000 	cmp	 r1, #0
 c04e0b20:	05983030 	ldreq 	 r3, [r8, #48]	; 0x30
 c04e0b24:	05933004 	ldreq	 r3, [r3, #4] ===============> crash here
 c04e0b28:	0a000028 	beq	 c04e0bd0 <diagchar_read+0x6bc>
 c04e0b2c:	ea0001b2 	b	 c04e11fc <diagchar_read+0xce8>
 /code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2937
 
 | 
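Both loads carry the eq suffix, so they only execute when the preceding cmp r1, #0 set the Z flag. The raw instruction words confirm what objdump printed. Below is a small, hedged decoder sketch for the immediate form of the A32 LDR/STR encoding; the struct and function names are my own, and it deliberately ignores the P/W/B bits and the cond = 0xF space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Selected fields of an A32 LDR/STR (immediate) instruction word. */
struct ldr_imm {
    unsigned cond;   /* bits 31..28: condition code, 0x0 = EQ, 0xE = AL */
    unsigned add;    /* bit 23 (U): 1 = add the offset, 0 = subtract it */
    unsigned load;   /* bit 20 (L): 1 = LDR, 0 = STR */
    unsigned rn;     /* bits 19..16: base register number */
    unsigned rt;     /* bits 15..12: transferred register number */
    unsigned imm12;  /* bits 11..0: unsigned byte offset */
};

/* Returns 1 and fills *out if insn is an immediate LDR/STR, else 0. */
static int decode_ldr_imm(uint32_t insn, struct ldr_imm *out)
{
    assert(out != NULL);

    /* bits 27..25 are 0b010 for the immediate form */
    if (((insn >> 25) & 0x7u) != 0x2u)
        return 0;

    out->cond  = (insn >> 28) & 0xfu;
    out->add   = (insn >> 23) & 0x1u;
    out->load  = (insn >> 20) & 0x1u;
    out->rn    = (insn >> 16) & 0xfu;
    out->rt    = (insn >> 12) & 0xfu;
    out->imm12 = insn & 0xfffu;
    return 1;
}
```

Feeding it 0x05933004 yields cond = EQ, a load, rn = 3, rt = 3, imm12 = 4, i.e. ldreq r3, [r3, #4]; 0x05983030 yields rn = 8, rt = 3, imm12 = 0x30.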
Find line 2933 in the source:
| 	if (driver->data_ready[index] & EVENT_MASKS_TYPE) {
 data_type = driver->data_ready[index] & EVENT_MASKS_TYPE;
 session_info = diag_md_session_get_peripheral(APPS_DATA);
 COPY_USER_SPACE_OR_EXIT(buf, data_type, 4);
 2931		if (session_info && session_info->event_mask &&
 2932		    session_info->event_mask->ptr) {
 2933			COPY_USER_SPACE_OR_EXIT(buf + sizeof(int),
 *(session_info->event_mask->ptr),
 session_info->event_mask->mask_len);
 } else {
 COPY_USER_SPACE_OR_EXIT(buf + sizeof(int),
 *(event_mask.ptr),
 event_mask.mask_len);
 }
 driver->data_ready[index] ^= EVENT_MASKS_TYPE;
 goto exit;
 }
 
 | 
Line 2933 is: COPY_USER_SPACE_OR_EXIT(buf + sizeof(int),. Now look again at the registers at the crash site:
| r8 : c5227bc0
 r3 : 00000000
 
 | 
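In other words, r3 = 0 is what triggered the crash: the faulting instruction reads [r3, #4], and 0 + 4 is exactly the virtual address 00000004 reported by the oops. r3 and r8 look like they belong to lines 2934 and 2935, the two remaining arguments of the macro. To check whether that is really the case, start with the field offsets. The structs are full of macros and nested structs, so let gdb do the work: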
| $ arm-eabi-gdb vmlinux
 GNU gdb (GDB) 7.6
 Copyright (C) 2013 Free Software Foundation, Inc.
 
 (gdb) p &((struct diag_md_session_t*)0)->event_mask
 $1 = (struct diag_mask_info **) 0x30
 
 (gdb) p &((struct diag_mask_info*)0)->mask_len
 $2 = (int *) 0x4 <__vectors_start+4>
 
 | 
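The gdb expression &((struct T *)0)->member is the classic way to read a member's byte offset; in C the same number comes from offsetof. Here is a sketch with stand-in structs. The layouts below are hypothetical models that only reproduce what this analysis shows (event_mask at 0x30, mask_len at 0x4, and ptr at 0x0 as implied by the ldr sl, [r2] for line 2932); they are not the real diag driver definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of struct diag_mask_info on 32-bit ARM.
 * Pointers are modelled as uint32_t so the layout is identical on any host.
 */
struct mask_info_model {
    uint32_t ptr;       /* offset 0x0 */
    uint32_t mask_len;  /* offset 0x4, matches gdb's $2 */
};

/* Hypothetical model of struct diag_md_session_t. */
struct md_session_model {
    uint8_t  not_shown[0x30];  /* members before event_mask, unknown here */
    uint32_t event_mask;       /* offset 0x30, matches gdb's $1 */
};

/* Same result as gdb's &((type *)0)->member, without dereferencing NULL. */
#define FIELD_OFFSET(type, member) ((uint32_t)offsetof(type, member))

/* Compile-time checks against the values gdb printed. */
static_assert(offsetof(struct md_session_model, event_mask) == 0x30,
              "event_mask should sit at 0x30");
static_assert(offsetof(struct mask_info_model, mask_len) == 0x4,
              "mask_len should sit at 0x4");
```

In a real debugging session the offsets should always come from the vmlinux that crashed, as done with gdb here, because the kernel config changes struct layouts.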
Reading a little further up in the disassembly gives:
| r8=session_info
 r3=[r8+48]=session_info->event_mask
 r3=[r3+4]=session_info->event_mask->mask_len
 
 | 
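To tie the register dump to the fault address, the two loads can be replayed by hand. This is only a worked-arithmetic sketch; replay_loads and the two offset macros are names I made up, and the offsets are the ones gdb reported.

```c
#include <assert.h>
#include <stdint.h>

#define EVENT_MASK_OFFSET 0x30u  /* offset of event_mask in diag_md_session_t */
#define MASK_LEN_OFFSET   0x04u  /* offset of mask_len in diag_mask_info */

/* Addresses touched by the two ldreq instructions. */
struct replay {
    uint32_t first_addr;   /* read by: ldreq r3, [r8, #48] */
    uint32_t second_addr;  /* read by: ldreq r3, [r3, #4]  */
};

/*
 * r8               : session_info, taken from the register dump
 * event_mask_value : what the first load returned, i.e. r3 before the fault
 */
static struct replay replay_loads(uint32_t r8, uint32_t event_mask_value)
{
    struct replay r;

    assert(r8 != 0);  /* session_info itself was valid in this dump */
    r.first_addr = r8 + EVENT_MASK_OFFSET;
    r.second_addr = event_mask_value + MASK_LEN_OFFSET;
    return r;
}
```

With r8 = c5227bc0 and r3 = 0, the first load reads c5227bf0 and the second one targets 00000004, the address in the oops message.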
The crash is triggered by r3 = 0, so session_info->event_mask is 0? But line 2931 has already checked for that:
| /code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2931 (discriminator 1)
 c04e0a68:	e3580000 	cmp	r8, #0 ==================> r8 = session_info
 /work/buildfarm/jenkins/workspace/buildfarml_rmnj_10/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2930 (discriminator 1)
 c04e0a6c:	e2833004 	add	r3, r3, #4
 c04e0a70:	e58d3024 	str	r3, [sp, #36]	; 0x24
 /code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2931 (discriminator 1)
 c04e0a74:	0a00002d 	beq	c04e0b30 <diagchar_read+0x61c>
 c04e0a78:	e5982030 	ldr	r2, [r8, #48]	; 0x30 =====> r2 = session_info->event_mask
 c04e0a7c:	e3520000 	cmp	r2, #0 ==============> session_info->event_mask == 0?
 c04e0a80:	0a00002a 	beq	c04e0b30 <diagchar_read+0x61c>
 /code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2932 (discriminator 1)
 c04e0a84:	e592a000 	ldr	sl, [r2]
 /code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2931 (discriminator 1)
 c04e0a88:	e35a0000 	cmp	sl, #0
 c04e0a8c:	0a000027 	beq	c04e0b30 <diagchar_read+0x61c>
 
 | 
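So, could this be a bit flip in DDR? Assuming execution reached c04e0b20 through this check, the load at c04e0a78 saw a non-zero session_info->event_mask, yet the reload from the same address at c04e0b20 returned 0, so the value appears to have changed between the two loads. Most likely it is a hardware problem.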