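I have done crash analysis before, on the R platform. The routine was basically to capture the CPU registers and other state at the moment of the crash, disassemble with objdump, and locate the problem against the source code. Now that I work on a phone platform there is one more tool, Trace32, which Qualcomm uses for all of its crash analysis. All crash issues now land on my desk, and asking Qualcomm every time is not a solution, so I am picking the skill back up. I think this can be done without Trace32; it is basically the same old routine.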
First, the crash scene:
[ 1256.852648] Unable to handle kernel NULL pointer dereference at virtual address 00000004
[ 1256.931061] pgd = e2110000
[ 1256.933725] [00000004] *pgd=00000000
[ 1256.937725] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 1256.942470] Modules linked in: wlan(O) [last unloaded: wlan]
[ 1256.948098] CPU: 1 PID: 585 Comm: qti Tainted: G W O 3.18.71-perf-g24d2c84 #1
[ 1256.956092] task: e41161c0 ti: e2148000 task.ti: e2148000
[ 1256.961479] PC is at diagchar_read+0x610/0x11fc
[ 1256.965978] LR is at 0x0
[ 1256.968488] pc : [<c04e0b24>] lr : [<00000000>] psr: 60010013
[ 1256.968488] sp : e2149ef0 ip : 00000051 fp : b1bb5b7c
[ 1256.979944] r10: c6651000 r9 : 00000201 r8 : c5227bc0
[ 1256.985152] r7 : c13dbc38 r6 : 00000014 r5 : 000186a0 r4 : b1bb5b78
[ 1256.991676] r3 : 00000000 r2 : 80000000 r1 : 00000000 r0 : 00000000
It is a kernel NULL pointer dereference. One key piece of information is the PC: c04e0b24.
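As a small aid for this step, the sketch below pulls the PC out of the Oops text and derives where the faulting function starts and ends, using the symbol+offset/size form that the "PC is at" line prints. The function name locate_pc and the two-line OOPS sample are illustrative choices, not part of any existing tool.

```python
import re

# Two lines copied from the Oops above.
OOPS = """\
[ 1256.961479] PC is at diagchar_read+0x610/0x11fc
[ 1256.968488] pc : [<c04e0b24>] lr : [<00000000>] psr: 60010013
"""

def locate_pc(oops_text):
    """Return (symbol, pc, function start, function end) from an ARM Oops."""
    sym = re.search(r"PC is at (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", oops_text)
    pc = re.search(r"pc : \[<([0-9a-f]{8})>\]", oops_text)
    if sym is None or pc is None:
        raise ValueError("no PC information found in the Oops text")
    offset = int(sym.group(2), 16)   # distance of the PC from the function start
    size = int(sym.group(3), 16)     # total size of the function
    pc_value = int(pc.group(1), 16)
    start = pc_value - offset
    return sym.group(1), pc_value, start, start + size

name, pc, start, end = locate_pc(OOPS)
print(f"{name}: pc={pc:08x} start={start:08x} end={end:08x}")
```

The start and end addresses can then be used to narrow the disassembly, for example with objdump's --start-address and --stop-address options, instead of searching the whole vmlinux listing.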
Disassemble vmlinux with objdump (the -lD options are basically all you need), then search the output for the PC:
/code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2933
c04e0b1c: e3510000 cmp r1, #0
c04e0b20: 05983030 ldreq r3, [r8, #48] ; 0x30
c04e0b24: 05933004 ldreq r3, [r3, #4] ===============> crash here
c04e0b28: 0a000028 beq c04e0bd0 <diagchar_read+0x6bc>
c04e0b2c: ea0001b2 b c04e11fc <diagchar_read+0xce8>
/code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2937
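To cross-check the listing against the register dump, the sketch below decodes the faulting instruction word by hand and recomputes the address it tried to read. It is a minimal decoder, assuming the classic ARM (A32) load/store encoding with an immediate offset, and it handles only the plain offset form seen here; decode_ldr_imm and REGS are illustrative names.

```python
# Condition-code names indexed by bits 31..28 of an A32 instruction.
COND = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
        "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"]

def decode_ldr_imm(word):
    """Decode an A32 LDR/STR word that uses plain immediate-offset addressing."""
    if ((word >> 25) & 0b111) != 0b010:
        raise ValueError("not a load/store with immediate offset")
    p = (word >> 24) & 1   # 1 = offset applied before the access
    u = (word >> 23) & 1   # 1 = add the offset, 0 = subtract it
    w = (word >> 21) & 1   # 1 = write the address back to the base register
    if p != 1 or w != 0:
        raise ValueError("pre/post-indexed forms are not handled in this sketch")
    imm = word & 0xFFF
    return {
        "cond": COND[(word >> 28) & 0xF],
        "load": bool((word >> 20) & 1),
        "byte": bool((word >> 22) & 1),
        "rn": (word >> 16) & 0xF,    # base register
        "rd": (word >> 12) & 0xF,    # destination register
        "offset": imm if u else -imm,
    }

# Register values and psr copied from the Oops. The faulting load never
# completed, so r3 still holds the base value it had before the access.
REGS = {3: 0x00000000, 8: 0xC5227BC0}
PSR = 0x60010013

crash = decode_ldr_imm(0x05933004)
fault_addr = (REGS[crash["rn"]] + crash["offset"]) & 0xFFFFFFFF
z_flag = (PSR >> 30) & 1   # "eq" instructions execute only when Z is set

print(crash)
print(f"fault address = {fault_addr:08x}, Z flag = {z_flag}")
```

The decoded fields agree with the disassembler (a conditional load of r3 from r3 + 4), the computed address equals the 00000004 reported in the first line of the Oops, and the Z flag in psr is set, so the eq-conditional instruction really was executed.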
Find line 2933 in the source:
      if (driver->data_ready[index] & EVENT_MASKS_TYPE) {
          data_type = driver->data_ready[index] & EVENT_MASKS_TYPE;
          session_info = diag_md_session_get_peripheral(APPS_DATA);
          COPY_USER_SPACE_OR_EXIT(buf, data_type, 4);
2931      if (session_info && session_info->event_mask &&
2932          session_info->event_mask->ptr) {
2933          COPY_USER_SPACE_OR_EXIT(buf + sizeof(int),
                  *(session_info->event_mask->ptr),
                  session_info->event_mask->mask_len);
          } else {
              COPY_USER_SPACE_OR_EXIT(buf + sizeof(int),
                  *(event_mask.ptr),
                  event_mask.mask_len);
          }
          driver->data_ready[index] ^= EVENT_MASKS_TYPE;
          goto exit;
      }
Line 2933 is COPY_USER_SPACE_OR_EXIT(buf + sizeof(int), and that is where the crash happens. Look at the crash scene again:
r8 : c5227bc0 r3 : 00000000
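In other words, r3 = 0 is what triggered this crash. r3 and r8 look like they relate to lines 2934 and 2935, but do they really? Check the offsets first. The struct is full of macros and nested structs, so let gdb do the work: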
$ arm-eabi-gdb vmlinux
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
(gdb) p &((struct diag_md_session_t*)0)->event_mask
$1 = (struct diag_mask_info **) 0x30
(gdb) p &((struct diag_mask_info*)0)->mask_len
$2 = (int *) 0x4 <__vectors_start+4>
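The gdb expressions above take the address of a member through a NULL base pointer, which yields the member's byte offset. The sketch below shows the same idea with ctypes. Both layouts are hypothetical stand-ins, not the real kernel definitions: the 0x30 bytes of padding simply take the place of whatever fields precede event_mask, and pointers are modelled as 32-bit integers to match the ARM target.

```python
import ctypes

class DiagMaskInfo(ctypes.Structure):
    # Hypothetical stand-in for struct diag_mask_info: only the first two
    # fields are modelled. ptr at offset 0 matches the "ldr sl, [r2]" load.
    _fields_ = [
        ("ptr", ctypes.c_uint32),       # 32-bit pointer on this target
        ("mask_len", ctypes.c_int32),
    ]

class DiagMdSession(ctypes.Structure):
    # Hypothetical stand-in for struct diag_md_session_t.
    _fields_ = [
        ("unknown_head", ctypes.c_uint8 * 0x30),  # fields not shown in the post
        ("event_mask", ctypes.c_uint32),          # pointer to a DiagMaskInfo
    ]

# A field's offset is what &((T *)0)->field evaluates to in gdb.
print(hex(DiagMdSession.event_mask.offset))
print(hex(DiagMaskInfo.mask_len.offset))
```

With these offsets, [r8, #48] reads session_info->event_mask and [r3, #4] reads event_mask->mask_len, which is how the instruction operands map back to the C expressions.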
Reading a little further up in the disassembly, we can conclude:
r8 = session_info
r3 = [r8 + 48] = session_info->event_mask
r3 = [r3 + 4] = session_info->event_mask->mask_len
The crash is triggered by r3 = 0, which would mean session_info->event_mask is 0? But line 2931 has already checked for that:
/code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2931 (discriminator 1)
c04e0a68: e3580000 cmp r8, #0 ==================> r8 = session_info
/work/buildfarm/jenkins/workspace/buildfarml_rmnj_10/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2930 (discriminator 1)
c04e0a6c: e2833004 add r3, r3, #4
c04e0a70: e58d3024 str r3, [sp, #36] ; 0x24
/code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2931 (discriminator 1)
c04e0a74: 0a00002d beq c04e0b30 <diagchar_read+0x61c>
c04e0a78: e5982030 ldr r2, [r8, #48] ; 0x30 =====> r2 = session_info->event_mask
c04e0a7c: e3520000 cmp r2, #0 ==============> session_info->event_mask == 0?
c04e0a80: 0a00002a beq c04e0b30 <diagchar_read+0x61c>
/code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2932 (discriminator 1)
c04e0a84: e592a000 ldr sl, [r2]
/code/kernel/msm-3.18/drivers/char/diag/diagchar_core.c:2931 (discriminator 1)
c04e0a88: e35a0000 cmp sl, #0
c04e0a8c: 0a000027 beq c04e0b30 <diagchar_read+0x61c>
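One detail visible in the two listings is that session_info->event_mask is read from memory twice: once for the NULL check and once more just before the faulting instruction. The sketch below scans a few lines copied from the listings (annotations removed) for loads from [r8, #48]; loads_from is an illustrative helper, not an existing tool.

```python
import re

# Instruction lines copied from the two objdump excerpts in this post.
LISTING = """\
c04e0a78: e5982030 ldr r2, [r8, #48]
c04e0a7c: e3520000 cmp r2, #0
c04e0a80: 0a00002a beq c04e0b30 <diagchar_read+0x61c>
c04e0b20: 05983030 ldreq r3, [r8, #48]
c04e0b24: 05933004 ldreq r3, [r3, #4]
"""

def loads_from(listing, base, offset):
    """Return (address, destination register) for each load from [base, #offset]."""
    pattern = re.compile(
        r"^([0-9a-f]{8}):\s+[0-9a-f]{8}\s+ldr\w*\s+(\w+), \[%s, #%d\]"
        % (re.escape(base), offset),
        re.MULTILINE,
    )
    return [(int(m.group(1), 16), m.group(2)) for m in pattern.finditer(listing)]

hits = loads_from(LISTING, "r8", 48)
for addr, dest in hits:
    print(f"{addr:08x}: {dest} = [r8 + 48]")
```

The check at c04e0a78 uses r2, while the crash uses the value reloaded into r3 at c04e0b20. So the NULL check and the dereference do not operate on the same copy of the pointer, and the crash is possible whenever the memory at r8 + 48 holds a different value at the second read, whatever the cause.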
So, could this be a bit flip in DDR? Most likely it is a hardware problem.
Copyright notice: all articles on this site are licensed under CC BY-NC-SA 4.0 CN. Please cite the original link when reposting!