While booting, an Android 9.0 device unexpectedly drops into the recovery screen with the message "Can't load Android system" and only two menu options: "Try again" and "Factory data reset".

The framework colleagues have not looked at it, so I will. As usual, start by tracing the code.

bootable/recovery:

static bool prompt_and_wipe_data(Device* device) {
  // Use a single string and let ScreenRecoveryUI handles the wrapping.
  const char* const headers[] = {
    "Can't load Android system. Your data may be corrupt. "
    "If you continue to get this message, you may need to "
    "perform a factory data reset and erase all user data "
    "stored on this device.",
    nullptr
  };
  const char* const items[] = {
    "Try again",
    "Factory data reset",
    NULL 
  };
  for (;;) {
    int chosen_item = get_menu_selection(headers, items, true, 0, device);
    if (chosen_item != 1) { 
      return true;  // Just reboot, no wipe; not a failure, user asked for it
    }    
    if (ask_to_wipe_data(device)) {
      return wipe_data(device);
    }    
  }
}

frameworks/base/core/java/android/os/RecoverySystem.java:

   /** {@hide} */
    public static void rebootPromptAndWipeUserData(Context context, String reason)
            throws IOException {
        String reasonArg = null;
        if (!TextUtils.isEmpty(reason)) {
            reasonArg = "--reason=" + sanitizeArg(reason);
        }    

        final String localeArg = "--locale=" + Locale.getDefault().toString();
        bootCommand(context, null, "--prompt_and_wipe_data", reasonArg, localeArg);
    }

Who calls rebootPromptAndWipeUserData?

frameworks/base/services/core/java/com/android/server/RescueParty.java:

     private static void executeRescueLevelInternal(Context context, int level) throws Exception {
         switch (level) {
             case LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS:
                 resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_DEFAULTS);
                 break;
             case LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES:
                 resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_CHANGES);
                 break;
             case LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS:
                 resetAllSettings(context, Settings.RESET_MODE_TRUSTED_DEFAULTS);
                 break;
             case LEVEL_FACTORY_RESET:
                 RecoverySystem.rebootPromptAndWipeUserData(context, TAG); //tj: here
                 break;
         }
     }
     private static void executeRescueLevel(Context context) {
         final int level = SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE);
         if (level == LEVEL_NONE) return;
 
         Slog.w(TAG, "Attempting rescue level " + levelToString(level));
         try {
             executeRescueLevelInternal(context, level);
             EventLogTags.writeRescueSuccess(level);
             logCriticalInfo(Log.DEBUG,
                     "Finished rescue level " + levelToString(level));
         } catch (Throwable t) {
             final String msg = ExceptionUtils.getCompleteMessage(t);
             EventLogTags.writeRescueFailure(level, msg);
             logCriticalInfo(Log.ERROR,
                     "Failed rescue level " + levelToString(level) + ": " + msg);
         }
     }
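
The level that executeRescueLevel reads is advanced one step per trigger and clamped at the top, which is what incrementRescueLevel later in this post does with MathUtils.constrain. Here is a minimal sketch of that ladder. The numeric values 0 through 4 and the class name RescueLadder are my assumptions for illustration, not copied from the source above.

```java
// Sketch only: mirrors the "+1, then clamp" step of incrementRescueLevel.
// The numeric level values below are assumed, following the switch order.
class RescueLadder {
    static final int LEVEL_NONE = 0;
    static final int LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS = 1;
    static final int LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES = 2;
    static final int LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS = 3;
    static final int LEVEL_FACTORY_RESET = 4;

    /** Next level after one more trigger; never goes past FACTORY_RESET. */
    static int next(int current) {
        return Math.max(LEVEL_NONE, Math.min(current + 1, LEVEL_FACTORY_RESET));
    }

    public static void main(String[] args) {
        int level = LEVEL_NONE;
        // Five triggers in a row: the level climbs and then stays at the top.
        for (int trigger = 1; trigger <= 5; trigger++) {
            level = next(level);
            System.out.println("trigger " + trigger + " -> level " + level);
        }
    }
}
```

Under these assumptions the factory reset prompt is only the last rung: the three settings resets are tried first, one per trigger.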

Let's check the log:

10-02 15:52:52.477  2468  3711 W RescueParty: Noticed 5 events for UID 1001 in last 4 sec
10-02 15:52:52.480  2468  3711 W PackageManager: Incremented rescue level to FACTORY_RESET triggered by UID 1001
10-02 15:52:52.481  2468  3711 W RescueParty: Attempting rescue level FACTORY_RESET
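
When the log is long, it can help to pull the trigger UID out mechanically. A small sketch, assuming the "Noticed N events for UID X in last T sec" wording seen in the log above; RescueLogScan and triggerUid are hypothetical names of mine, not anything in AOSP.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of AOSP: extracts the trigger UID from a
// RescueParty "Noticed N events for UID X in last T sec" logcat line.
class RescueLogScan {
    private static final Pattern NOTICED = Pattern.compile(
            "RescueParty: Noticed (\\d+) events for UID (\\d+) in last (\\d+) sec");

    /** Returns the UID named in the line, or -1 if the line does not match. */
    static int triggerUid(String logLine) {
        Matcher m = NOTICED.matcher(logLine);
        return m.find() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        String line = "10-02 15:52:52.477  2468  3711 W RescueParty: "
                + "Noticed 5 events for UID 1001 in last 4 sec";
        System.out.println(triggerUid(line)); // prints 1001
    }
}
```

The pattern is tied to this exact message text, so it would need adjusting if another Android release words the line differently.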

OK, at this point we know it was triggered by UID 1001. UID 1001 is defined as:

#define AID_RADIO 1001           /* telephony subsystem, RIL */
xxx:/ # ps -A | grep ril
radio         2313     1  124168  22896 binder_thread_read  0 S qcrild
radio         3534  2012 3699156  79284 SyS_epoll_wait      0 S com.qualcomm.qcrilmsgtunnel
xxx:/ # id radio
uid=1001(radio) gid=1001(radio) groups=1001(radio), context=u:r:su:s0
xxx:/ # 

So why did UID 1001 trigger it? Keep following the log.

    private static void incrementRescueLevel(int triggerUid) {
        final int level = MathUtils.constrain(
                SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE) + 1,
                LEVEL_NONE, LEVEL_FACTORY_RESET);
        SystemProperties.set(PROP_RESCUE_LEVEL, Integer.toString(level));

        EventLogTags.writeRescueLevel(level, triggerUid);
        logCriticalInfo(Log.WARN, "Incremented rescue level to "
                + levelToString(level) + " triggered by UID " + triggerUid);
    }  
    /** 
     * Take note of a boot event. If we notice too many of these events
     * happening in rapid succession, we'll send out a rescue party.
     */
    public static void noteBoot(Context context) {
        if (isDisabled()) return;
        if (sBoot.incrementAndTest()) {
            sBoot.reset();
            incrementRescueLevel(sBoot.uid);
            executeRescueLevel(context);
        }   
    }   

    /** 
     * Take note of a persistent app crash. If we notice too many of these
     * events happening in rapid succession, we'll send out a rescue party.
     */
    public static void notePersistentAppCrash(Context context, int uid) {
        if (isDisabled()) return;
        Threshold t = sApps.get(uid);
        if (t == null) {
            t = new AppThreshold(uid);
            sApps.put(uid, t); 
        }   
        if (t.incrementAndTest()) {
            t.reset();
            incrementRescueLevel(t.uid);
            executeRescueLevel(context);
        }   
    }  
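
Both functions above lean on incrementAndTest(), which is not shown in this post. A minimal sketch of how such a windowed counter could work: count events, start over when the window has expired, and trip once enough events land inside one window. The class name CrashWindow, the explicit clock argument, and the 5 events / 30 seconds figures are my assumptions, not the AOSP implementation.

```java
// Sketch of a windowed event counter in the spirit of incrementAndTest();
// not the AOSP code. The clock is passed in so the logic is testable.
class CrashWindow {
    private final int triggerCount;      // events needed to trip
    private final long triggerWindowMs;  // window the events must fall into
    private int count;
    private long startMs;
    private boolean started;

    CrashWindow(int triggerCount, long triggerWindowMs) {
        this.triggerCount = triggerCount;
        this.triggerWindowMs = triggerWindowMs;
    }

    /** Records one event at nowMs; returns true once the threshold trips. */
    boolean incrementAndTest(long nowMs) {
        if (!started || nowMs - startMs > triggerWindowMs) {
            // First event, or the previous window expired: start counting afresh.
            started = true;
            startMs = nowMs;
            count = 1;
        } else {
            count++;
        }
        return count >= triggerCount;
    }

    /** Forgets all recorded events, as the callers above do after a trip. */
    void reset() {
        started = false;
        count = 0;
    }

    public static void main(String[] args) {
        CrashWindow window = new CrashWindow(5, 30_000); // assumed values
        // Five crashes one second apart, like the com.gsma.rcs crash loop.
        for (long t = 0; t <= 4_000; t += 1_000) {
            System.out.println(t + " ms -> tripped=" + window.incrementAndTest(t));
        }
    }
}
```

With these assumed numbers, five crashes one second apart trip on the fifth event, which lines up with the "Noticed 5 events for UID 1001 in last 4 sec" line in the log.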

Was it noteBoot or notePersistentAppCrash this time? Check the log:

10-02 15:52:51.653  2468  3661 W ActivityManager: Process com.gsma.rcs has crashed too many times: killing!
10-02 15:52:51.776  2468  2478 I ActivityManager: Process com.gsma.rcs (pid 8381) has died: pers PER
10-02 15:52:51.777  2468  2478 W ActivityManager: Scheduling restart of crashed service com.gsma.rcs/.service.RcsCoreService in 0ms
10-02 15:52:51.777  2468  2478 W ActivityManager: Scheduling restart of crashed service com.gsma.rcs/.service.StartService in 0ms
10-02 15:52:51.777  2468  2478 W ActivityManager: Re-adding persistent process ProcessRecord{dd29265 8381:com.gsma.rcs/1001}
10-02 15:52:51.823  2468  2484 I ActivityManager: Start proc 8432:com.gsma.rcs/1001 for restart com.gsma.rcs
xxx:/ # ps -A | grep rcs
system        2009     1   18592   5680 binder_thread_read  0 S imsrcsd
radio         3743  2012 4363308  66952 SyS_epoll_wait      0 S com.gsma.rcs
xxx:/ #
xxx:/ # id radio
uid=1001(radio) gid=1001(radio) groups=1001(radio), context=u:r:su:s0
xxx:/ #

So it was triggered by com.gsma.rcs, a persistent app running as UID 1001. The next step is to check the com.gsma.rcs crash stack, so this goes to the app team.

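So a factory reset will not necessarily fix this kind of problem. Why not write these notes onto the recovery text menu screen? Then the cause would be obvious at a glance.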

By the way, this screen does not appear on eng builds, or on userdebug builds with USB connected.

    void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,
            int callingPid, int callingUid) {
        long timeMillis = System.currentTimeMillis();
        String shortMsg = crashInfo.exceptionClassName;
        String longMsg = crashInfo.exceptionMessage;
        String stackTrace = crashInfo.stackTrace;
        if (shortMsg != null && longMsg != null) {
            longMsg = shortMsg + ": " + longMsg;
        } else if (shortMsg != null) {
            longMsg = shortMsg;
        }    

        // If a persistent app is stuck in a crash loop, the device isn't very
        // usable, so we want to consider sending out a rescue party.
        if (r != null && r.persistent) {
            RescueParty.notePersistentAppCrash(mContext, r.uid);
        } 

    private static boolean isDisabled() {
        // Check if we're explicitly enabled for testing
        if (SystemProperties.getBoolean(PROP_ENABLE_RESCUE, false)) {
            return false;
        }   

        // We're disabled on all engineering devices
        if (Build.IS_ENG) {
            Slog.v(TAG, "Disabled because of eng build");
            return true;
        }   

        // We're disabled on userdebug devices connected over USB, since that's
        // a decent signal that someone is actively trying to debug the device,
        // or that it's in a lab environment.
        if (Build.IS_USERDEBUG && isUsbActive()) {
            Slog.v(TAG, "Disabled because of active USB connection");
            return true;
        }   

        // One last-ditch check
        if (SystemProperties.getBoolean(PROP_DISABLE_RESCUE, false)) {
            Slog.v(TAG, "Disabled because of manual property");
            return true;
        }   

        return false;
    }  
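
To see at a glance which build and USB combinations skip the rescue logic, the same checks can be written as a pure function. A sketch only: RescueGate and its boolean parameters are hypothetical stand-ins for the system properties, Build flags and isUsbActive() used in isDisabled() above.

```java
// Sketch: the same order of checks as isDisabled(), with every input passed
// in explicitly so the outcomes can be tabulated off-device.
class RescueGate {
    static boolean isDisabled(boolean enableProp, boolean isEng,
            boolean isUserdebug, boolean usbActive, boolean disableProp) {
        if (enableProp) return false;               // explicit enable wins
        if (isEng) return true;                      // every eng build
        if (isUserdebug && usbActive) return true;   // userdebug with USB active
        return disableProp;                          // manual opt-out
    }

    public static void main(String[] args) {
        System.out.println("user build:          "
                + isDisabled(false, false, false, false, false));
        System.out.println("eng build:           "
                + isDisabled(false, true, false, false, false));
        System.out.println("userdebug + USB:     "
                + isDisabled(false, false, true, true, false));
        System.out.println("userdebug, no USB:   "
                + isDisabled(false, false, true, false, false));
    }
}
```

The order matters: the enable property is checked first, so it can switch the rescue logic back on even on an eng build.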

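Also, on systems with Google's A/B partitions enabled, the cache partition is removed, yet recovery stores its logs on the cache partition. Stock recovery currently does not support this case.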