During boot, an Android 9.0 device unexpectedly entered the recovery screen with the message "Can't load Android system" and only two menu options: "Try again" and "Factory data reset".

The framework colleagues had not looked into it, so I will. As usual, start by tracing the code.

bootable/recovery:

static bool prompt_and_wipe_data(Device* device) {
    // Use a single string and let ScreenRecoveryUI handles the wrapping.
    const char* const headers[] = {
        "Can't load Android system. Your data may be corrupt. "
        "If you continue to get this message, you may need to "
        "perform a factory data reset and erase all user data "
        "stored on this device.",
        nullptr
    };
    const char* const items[] = {
        "Try again",
        "Factory data reset",
        NULL
    };
    for (;;) {
        int chosen_item = get_menu_selection(headers, items, true, 0, device);
        if (chosen_item != 1) {
            return true; // Just reboot, no wipe; not a failure, user asked for it
        }
        if (ask_to_wipe_data(device)) {
            return wipe_data(device);
        }
    }
}

frameworks/base/core/java/android/os/RecoverySystem.java:

/** {@hide} */
public static void rebootPromptAndWipeUserData(Context context, String reason)
        throws IOException {
    String reasonArg = null;
    if (!TextUtils.isEmpty(reason)) {
        reasonArg = "--reason=" + sanitizeArg(reason);
    }

    final String localeArg = "--locale=" + Locale.getDefault().toString();
    bootCommand(context, null, "--prompt_and_wipe_data", reasonArg, localeArg);
}
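
To make the argument handling above concrete, here is a minimal sketch of how such a command could be assembled. bootCommand itself is not shown in this post, so the behavior here (skip null or empty arguments, one option per line) is an assumption, and RecoveryCommandSketch and buildCommand are hypothetical names, not AOSP APIs. The reason string "RescueParty" is also assumed, based on the TAG passed by the caller further down.

```java
import java.util.Locale;

// Hypothetical stand-alone sketch; not the AOSP implementation of bootCommand.
final class RecoveryCommandSketch {

    /** Joins the non-empty arguments, one recovery option per line (assumed format). */
    static String buildCommand(String... args) {
        StringBuilder command = new StringBuilder();
        for (String arg : args) {
            // A null reasonArg (empty reason) simply contributes nothing.
            if (arg != null && !arg.isEmpty()) {
                command.append(arg).append('\n');
            }
        }
        return command.toString();
    }

    public static void main(String[] argv) {
        String reasonArg = "--reason=" + "RescueParty"; // assumed value of TAG
        String localeArg = "--locale=" + Locale.US.toString();
        System.out.print(buildCommand(null, "--prompt_and_wipe_data", reasonArg, localeArg));
    }
}
```

The point of the sketch is only that --prompt_and_wipe_data is the option that leads recovery to the prompt_and_wipe_data() menu shown earlier, with the reason and locale passed alongside it.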

Who calls rebootPromptAndWipeUserData?

frameworks/base/services/core/java/com/android/server/RescueParty.java:

private static void executeRescueLevelInternal(Context context, int level) throws Exception {
    switch (level) {
        case LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS:
            resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_DEFAULTS);
            break;
        case LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES:
            resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_CHANGES);
            break;
        case LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS:
            resetAllSettings(context, Settings.RESET_MODE_TRUSTED_DEFAULTS);
            break;
        case LEVEL_FACTORY_RESET:
            RecoverySystem.rebootPromptAndWipeUserData(context, TAG); //tj: here
            break;
    }
}

private static void executeRescueLevel(Context context) {
    final int level = SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE);
    if (level == LEVEL_NONE) return;

    Slog.w(TAG, "Attempting rescue level " + levelToString(level));
    try {
        executeRescueLevelInternal(context, level);
        EventLogTags.writeRescueSuccess(level);
        logCriticalInfo(Log.DEBUG,
                "Finished rescue level " + levelToString(level));
    } catch (Throwable t) {
        final String msg = ExceptionUtils.getCompleteMessage(t);
        EventLogTags.writeRescueFailure(level, msg);
        logCriticalInfo(Log.ERROR,
                "Failed rescue level " + levelToString(level) + ": " + msg);
    }
}

Let's check the log:

10-02 15:52:52.477  2468  3711 W RescueParty: Noticed 5 events for UID 1001 in last 4 sec
10-02 15:52:52.480 2468 3711 W PackageManager: Incremented rescue level to FACTORY_RESET triggered by UID 1001
10-02 15:52:52.481 2468 3711 W RescueParty: Attempting rescue level FACTORY_RESET
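
The "Noticed 5 events for UID 1001 in last 4 sec" line suggests a counter that trips when enough events land inside a time window. The following is a rough sketch of that idea only. The class EventThresholdSketch, its fields, and the limit used in the example (5 events within 30 seconds) are assumptions for illustration, not values taken from the RescueParty source quoted in this post.

```java
// Hypothetical windowed event counter; names and limits are assumptions.
final class EventThresholdSketch {
    private final int triggerCount;
    private final long triggerWindowMs;
    private int count;
    private long windowStartMs;
    private boolean started;

    EventThresholdSketch(int triggerCount, long triggerWindowMs) {
        this.triggerCount = triggerCount;
        this.triggerWindowMs = triggerWindowMs;
    }

    /** Records one event at nowMs; true once triggerCount events fall inside one window. */
    boolean incrementAndTest(long nowMs) {
        if (!started || nowMs - windowStartMs > triggerWindowMs) {
            // First event, or the previous window expired: start a new window.
            started = true;
            count = 1;
            windowStartMs = nowMs;
            return false;
        }
        count++;
        return count >= triggerCount;
    }

    /** Clears the counter after the threshold has been acted on. */
    void reset() {
        count = 0;
        started = false;
    }

    public static void main(String[] args) {
        EventThresholdSketch t = new EventThresholdSketch(5, 30_000L);
        for (long now = 0; now <= 4_000L; now += 1_000L) {
            System.out.println("event at " + now + " ms, tripped=" + t.incrementAndTest(now));
        }
    }
}
```

With one crash per second, the fifth event arrives four seconds after the first, which is the kind of situation the log line describes.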

OK, at this point we know that UID 1001 triggered it. UID 1001 is defined as:

#define AID_RADIO 1001           /* telephony subsystem, RIL */
xxx:/ # ps -A | grep ril
radio 2313 1 124168 22896 binder_thread_read 0 S qcrild
radio 3534 2012 3699156 79284 SyS_epoll_wait 0 S com.qualcomm.qcrilmsgtunnel
xxx:/ # id radio
uid=1001(radio) gid=1001(radio) groups=1001(radio), context=u:r:su:s0
xxx:/ #

So why did UID 1001 trigger it? Keep following the log.

private static void incrementRescueLevel(int triggerUid) {
    final int level = MathUtils.constrain(
            SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE) + 1,
            LEVEL_NONE, LEVEL_FACTORY_RESET);
    SystemProperties.set(PROP_RESCUE_LEVEL, Integer.toString(level));

    EventLogTags.writeRescueLevel(level, triggerUid);
    logCriticalInfo(Log.WARN, "Incremented rescue level to "
            + levelToString(level) + " triggered by UID " + triggerUid);
}

/**
 * Take note of a boot event. If we notice too many of these events
 * happening in rapid succession, we'll send out a rescue party.
 */
public static void noteBoot(Context context) {
    if (isDisabled()) return;
    if (sBoot.incrementAndTest()) {
        sBoot.reset();
        incrementRescueLevel(sBoot.uid);
        executeRescueLevel(context);
    }
}

/**
 * Take note of a persistent app crash. If we notice too many of these
 * events happening in rapid succession, we'll send out a rescue party.
 */
public static void notePersistentAppCrash(Context context, int uid) {
    if (isDisabled()) return;
    Threshold t = sApps.get(uid);
    if (t == null) {
        t = new AppThreshold(uid);
        sApps.put(uid, t);
    }
    if (t.incrementAndTest()) {
        t.reset();
        incrementRescueLevel(t.uid);
        executeRescueLevel(context);
    }
}
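
incrementRescueLevel() moves the stored level up by one and clamps it, so FACTORY_RESET is only reached after the threshold has tripped several times. A small sketch of that escalation follows. The numeric values of the LEVEL_* constants are assumptions (the excerpt only shows their names and order), and RescueLevelSketch, constrain and nextLevel are illustrative stand-ins, with constrain playing the role of MathUtils.constrain.

```java
// Hypothetical sketch of the level escalation; numeric values are assumed.
final class RescueLevelSketch {
    static final int LEVEL_NONE = 0;
    static final int LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS = 1;
    static final int LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES = 2;
    static final int LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS = 3;
    static final int LEVEL_FACTORY_RESET = 4;

    /** Clamps amount into [low, high]. */
    static int constrain(int amount, int low, int high) {
        return amount < low ? low : (amount > high ? high : amount);
    }

    /** Next rescue level after one more threshold trip; never goes past FACTORY_RESET. */
    static int nextLevel(int current) {
        return constrain(current + 1, LEVEL_NONE, LEVEL_FACTORY_RESET);
    }

    public static void main(String[] args) {
        int level = LEVEL_NONE;
        for (int trip = 1; trip <= 5; trip++) {
            level = nextLevel(level);
            System.out.println("trip " + trip + " -> level " + level);
        }
    }
}
```

Under these assumptions, the earlier trips would only have reset settings, and a log that already shows "Incremented rescue level to FACTORY_RESET" is the last step of that ladder.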

Was it noteBoot or notePersistentAppCrash here? Check the log:

10-02 15:52:51.653  2468  3661 W ActivityManager: Process com.gsma.rcs has crashed too many times: killing!
10-02 15:52:51.776 2468 2478 I ActivityManager: Process com.gsma.rcs (pid 8381) has died: pers PER
10-02 15:52:51.777 2468 2478 W ActivityManager: Scheduling restart of crashed service com.gsma.rcs/.service.RcsCoreService in 0ms
10-02 15:52:51.777 2468 2478 W ActivityManager: Scheduling restart of crashed service com.gsma.rcs/.service.StartService in 0ms
10-02 15:52:51.777 2468 2478 W ActivityManager: Re-adding persistent process ProcessRecord{dd29265 8381:com.gsma.rcs/1001}
10-02 15:52:51.823 2468 2484 I ActivityManager: Start proc 8432:com.gsma.rcs/1001 for restart com.gsma.rcs
xxx:/ # ps -A | grep rcs
system 2009 1 18592 5680 binder_thread_read 0 S imsrcsd
radio 3743 2012 4363308 66952 SyS_epoll_wait 0 S com.gsma.rcs
xxx:/ #
xxx:/ # id radio
uid=1001(radio) gid=1001(radio) groups=1001(radio), context=u:r:su:s0
xxx:/ #

So com.gsma.rcs triggered it. The next step is to check the com.gsma.rcs crash stack, so this is forwarded to the app team to investigate…
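
Mapping the UID in the RescueParty line back to a process was done by eye above. As a rough aid, the sketch below scans log lines for "ProcessRecord{hash pid:name/uid}" entries and collects the process names for a given UID. The regular expression is derived only from the single ProcessRecord line shown in this post, so treat its shape as an assumption. CrashLogSketch and processesForUid are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper; the pattern is inferred from one sample log line.
final class CrashLogSketch {
    // Groups: 1 = pid, 2 = process name, 3 = uid.
    private static final Pattern PROCESS_RECORD =
            Pattern.compile("ProcessRecord\\{\\p{XDigit}+ (\\d+):([^/\\s]+)/(\\w+)\\}");

    /** Returns the distinct process names logged with the given uid, in log order. */
    static Set<String> processesForUid(Iterable<String> logLines, String uid) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = PROCESS_RECORD.matcher(line);
            while (m.find()) {
                if (m.group(3).equals(uid)) {
                    names.add(m.group(2));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String line = "10-02 15:52:51.777 2468 2478 W ActivityManager: "
                + "Re-adding persistent process ProcessRecord{dd29265 8381:com.gsma.rcs/1001}";
        System.out.println(processesForUid(Arrays.asList(line), "1001"));
    }
}
```

Several processes can share UID 1001, so the result is a set of candidates rather than a single culprit. The "has crashed too many times" line is still needed to pick the right one.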

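So a factory reset will not necessarily fix this kind of problem. Why not print these notes on the recovery text menu screen? Then the cause would be obvious at a glance.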

By the way, this screen does not appear on eng builds, or on userdebug builds with USB connected. The relevant code:

    void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,
            int callingPid, int callingUid) {
        long timeMillis = System.currentTimeMillis();
        String shortMsg = crashInfo.exceptionClassName;
        String longMsg = crashInfo.exceptionMessage;
        String stackTrace = crashInfo.stackTrace;
        if (shortMsg != null && longMsg != null) {
            longMsg = shortMsg + ": " + longMsg;
        } else if (shortMsg != null) {
            longMsg = shortMsg;
        }    

        // If a persistent app is stuck in a crash loop, the device isn't very
        // usable, so we want to consider sending out a rescue party.
        if (r != null && r.persistent) {
            RescueParty.notePersistentAppCrash(mContext, r.uid);
        } 

    private static boolean isDisabled() {
        // Check if we're explicitly enabled for testing
        if (SystemProperties.getBoolean(PROP_ENABLE_RESCUE, false)) {
            return false;
        }   

        // We're disabled on all engineering devices
        if (Build.IS_ENG) {
            Slog.v(TAG, "Disabled because of eng build");
            return true;
        }   

        // We're disabled on userdebug devices connected over USB, since that's
        // a decent signal that someone is actively trying to debug the device,
        // or that it's in a lab environment.
        if (Build.IS_USERDEBUG && isUsbActive()) {
            Slog.v(TAG, "Disabled because of active USB connection");
            return true;
        }   

        // One last-ditch check
        if (SystemProperties.getBoolean(PROP_DISABLE_RESCUE, false)) {
            Slog.v(TAG, "Disabled because of manual property");
            return true;
        }   

        return false;
    }  
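
The order of the checks in isDisabled() matters, so here is the same decision restated as a pure function over its inputs. This is a sketch for reasoning about the outcomes, not framework code: RescueGateSketch and its parameter names are hypothetical, and the booleans stand in for the two system properties, the build type, and the USB state.

```java
// Hypothetical restatement of the isDisabled() decision order as a pure function.
final class RescueGateSketch {

    static boolean isDisabled(boolean enableProp, boolean isEng, boolean isUserdebug,
            boolean usbActive, boolean disableProp) {
        if (enableProp) return false;                // explicit enable wins over everything
        if (isEng) return true;                      // eng builds never rescue
        if (isUserdebug && usbActive) return true;   // userdebug while USB is active
        return disableProp;                          // last-ditch manual switch
    }

    public static void main(String[] args) {
        System.out.println("user build:        " + isDisabled(false, false, false, false, false));
        System.out.println("eng build:         " + isDisabled(false, true, false, false, false));
        System.out.println("userdebug + USB:   " + isDisabled(false, false, true, true, false));
        System.out.println("userdebug, no USB: " + isDisabled(false, false, true, false, false));
    }
}
```

Read this way, a userdebug device that is unplugged behaves like a user build, and setting the enable property re-arms the rescue logic even on eng builds, which may help when trying to reproduce the recovery screen.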

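Also, on systems with Google A/B partitions enabled, the cache partition is removed, yet recovery stores its logs on the cache partition. Stock recovery currently does not support this case.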