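Android Crash Internals and Optimization

I. Java Crash Handling

1. The Thread class declares an interface named UncaughtExceptionHandler.

Its Javadoc explains the mechanism: when a thread terminates abruptly because of an uncaught exception, the JVM queries the thread's UncaughtExceptionHandler through getUncaughtExceptionHandler and calls its uncaughtException method. If no UncaughtExceptionHandler was set on the thread explicitly, the thread's ThreadGroup handles the exception.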

/**
 * Interface for handlers invoked when a <tt>Thread</tt> abruptly
 * terminates due to an uncaught exception.
 * <p>When a thread is about to terminate due to an uncaught exception
 * the Java Virtual Machine will query the thread for its
 * <tt>UncaughtExceptionHandler</tt> using
 * {@link #getUncaughtExceptionHandler} and will invoke the handler's
 * <tt>uncaughtException</tt> method, passing the thread and the
 * exception as arguments.
 * If a thread has not had its <tt>UncaughtExceptionHandler</tt>
 * explicitly set, then its <tt>ThreadGroup</tt> object acts as its
 * <tt>UncaughtExceptionHandler</tt>. If the <tt>ThreadGroup</tt> object
 * has no
 * special requirements for dealing with the exception, it can forward
 * the invocation to the {@linkplain #getDefaultUncaughtExceptionHandler
 * default uncaught exception handler}.
 */
@FunctionalInterface
public interface UncaughtExceptionHandler {
    /**
     * Method invoked when the given thread terminates due to the
     * given uncaught exception.
     * <p>Any exception thrown by this method will be ignored by the
     * Java Virtual Machine.
     * @param t the thread
     * @param e the exception
     */
    void uncaughtException(Thread t, Throwable e);
}

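ThreadGroup.uncaughtException first delegates to the parent group, then looks up the default handler through Thread.getDefaultUncaughtExceptionHandler(). If no default handler is installed, it only prints the stack trace to System.err and does not exit the process. Since an Android app does exit on an uncaught exception, something else must have installed a default UncaughtExceptionHandler.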

/**
 * Called by the Java Virtual Machine when a thread in this
 * thread group stops because of an uncaught exception, and the thread
 * does not have a specific {@link Thread.UncaughtExceptionHandler}
 * installed.
 * <p>
 * The <code>uncaughtException</code> method of
 * <code>ThreadGroup</code> does the following:
 * <ul>
 * <li>If this thread group has a parent thread group, the
 *     <code>uncaughtException</code> method of that parent is called
 *     with the same two arguments.
 * <li>Otherwise, this method checks to see if there is a
 *     {@linkplain Thread#getDefaultUncaughtExceptionHandler default
 *     uncaught exception handler} installed, and if so, its
 *     <code>uncaughtException</code> method is called with the same
 *     two arguments.
 * <li>Otherwise, this method determines if the <code>Throwable</code>
 *     argument is an instance of {@link ThreadDeath}. If so, nothing
 *     special is done. Otherwise, a message containing the
 *     thread's name, as returned from the thread's {@link
 *     Thread#getName getName} method, and a stack backtrace,
 *     using the <code>Throwable</code>'s {@link
 *     Throwable#printStackTrace printStackTrace} method, is
 *     printed to the {@linkplain System#err standard error stream}.
 * </ul>
 * <p>
 * Applications can override this method in subclasses of
 * <code>ThreadGroup</code> to provide alternative handling of
 * uncaught exceptions.
 *
 * @param   t   the thread that is about to exit.
 * @param   e   the uncaught exception.
 * @since   JDK1.0
 */
public void uncaughtException(Thread t, Throwable e) {
    if (parent != null) {
        parent.uncaughtException(t, e);
    } else {
        Thread.UncaughtExceptionHandler ueh =
            Thread.getDefaultUncaughtExceptionHandler();
        if (ueh != null) {
            ueh.uncaughtException(t, e);
        } else if (!(e instanceof ThreadDeath)) {
            System.err.print("Exception in thread \""
                             + t.getName() + "\" ");
            e.printStackTrace(System.err);
        }
    }
}
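The lookup order described above can be observed on a plain JVM. The following is a minimal sketch, not Android code: the class name HandlerDispatchDemo and the thread names are made up for illustration. One thread has no handler of its own and falls through its ThreadGroup to the default handler; the other has its own handler, which takes precedence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class: shows which handler receives an uncaught exception.
class HandlerDispatchDemo {
    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> log.add("default:" + t.getName()));
        try {
            // No per-thread handler: ThreadGroup forwards to the default handler.
            Thread a = new Thread(() -> { throw new IllegalStateException("boom"); }, "worker-a");
            a.start();
            a.join();

            // Per-thread handler set explicitly: it is used instead of the default one.
            Thread b = new Thread(() -> { throw new IllegalStateException("boom"); }, "worker-b");
            b.setUncaughtExceptionHandler((t, e) -> log.add("own:" + t.getName()));
            b.start();
            b.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(ie);
        } finally {
            // Restore whatever default handler was installed before the demo.
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The handler runs on the dying thread before it terminates, so join() returning means the log entry has already been written.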
2. When is the Thread's UncaughtExceptionHandler installed?

From the AMS Activity launch flow we know that starting an app goes roughly through the following steps:

[Figure: Android boot flow]

RuntimeInit.commonInit() installs the default exception handler by calling Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler)).

protected static final void commonInit() {
    if (DEBUG) Slog.d(TAG, "Entered RuntimeInit!");
    /*
     * set handlers; these apply to all threads in the VM. Apps can replace
     * the default handler, but not the pre handler.
     */
    LoggingHandler loggingHandler = new LoggingHandler();
    RuntimeHooks.setUncaughtExceptionPreHandler(loggingHandler);
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));
    /*
     * Install a time zone supplier that uses the Android persistent time zone system property.
     */
    RuntimeHooks.setTimeZoneIdSupplier(() -> SystemProperties.get("persist.sys.timezone"));
    /*
     * Sets handler for java.util.logging to use Android log facilities.
     * The odd "new instance-and-then-throw-away" is a mirror of how
     * the "java.util.logging.config.class" system property works. We
     * can't use the system property here since the logger has almost
     * certainly already been initialized.
     */
    LogManager.getLogManager().reset();
    new AndroidConfig();
    /*
     * Sets the default HTTP User-Agent used by HttpURLConnection.
     */
    String userAgent = getDefaultUserAgent();
    System.setProperty("http.agent", userAgent);
    /*
     * Wire socket tagging to traffic stats.
     */
    NetworkManagementSocketTagger.install();
    initialized = true;
}
3. Where the process exit comes from: KillApplicationHandler

The source shows that KillApplicationHandler kills the process itself in its finally block.

private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
    public void uncaughtException(Thread t, Throwable e) {
        try {
            ensureLogging(t, e);
            if (mCrashing) return;
            mCrashing = true;
            if (ActivityThread.currentActivityThread() != null) {
                ActivityThread.currentActivityThread().stopProfiling();
            }
            ActivityManager.getService().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
        } catch (Throwable t2) {
            ...
        } finally {
            // Try everything to make sure this process goes away.
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
    }
}
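Because the platform handler is the one that terminates the process, an app that installs its own default handler usually keeps a reference to the previous handler and delegates to it after doing its own work. Below is a minimal sketch of that chaining pattern in plain Java. ChainingHandler, the StringBuilder sink and the stand-in "platform" lambda are hypothetical; they only stand in for real crash recording and for KillApplicationHandler.

```java
// Hypothetical handler that records the crash, then always delegates to the next handler.
class ChainingHandler implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler next; // previous default handler, may be null
    private final StringBuilder sink;                   // stand-in for a real crash log

    ChainingHandler(Thread.UncaughtExceptionHandler next, StringBuilder sink) {
        this.next = next;
        this.sink = sink;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            sink.append("recorded ").append(e.getClass().getSimpleName()).append(';');
        } finally {
            // Delegate even if recording failed, so the process is still torn down properly.
            if (next != null) {
                next.uncaughtException(t, e);
            }
        }
    }

    static String demo() {
        StringBuilder sink = new StringBuilder();
        // Stand-in for the handler installed by the platform.
        Thread.UncaughtExceptionHandler platform = (t, e) -> sink.append("platform handler ran;");
        ChainingHandler handler = new ChainingHandler(platform, sink);
        handler.uncaughtException(Thread.currentThread(), new NullPointerException("demo"));
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a real app the previous handler would be obtained with Thread.getDefaultUncaughtExceptionHandler() before calling Thread.setDefaultUncaughtExceptionHandler(...).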
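4. Other work done in KillApplicationHandler

uncaughtException hands further processing to AMS.handleApplicationCrash(). AMS then calls addErrorToDropBox() to record the event in the system log. DropBox can record Java crashes, native crashes, ANRs and so on; the log directory is /data/system/dropbox.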

public void handleApplicationCrash(IBinder app,
        ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
    ProcessRecord r = findAppProcess(app, "Crash");
    final String processName = app == null ? "system_server"
            : (r == null ? "unknown" : r.processName);
    handleApplicationCrashInner("crash", r, processName, crashInfo);
}

void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
        ApplicationErrorReport.CrashInfo crashInfo) {
    ...
    addErrorToDropBox(
            eventType, r, processName, null, null, null, null, null, null, crashInfo,
            new Float(loadingProgress), incrementalMetrics, null);
    mAppErrors.crashApplication(r, crashInfo);
}
5. How Android handles a Java crash: the call flow

Uncaught exception -> invoked by the JVM ->
KillApplicationHandler.uncaughtException {
    try {
        ActivityManager.getService().handleApplicationCrash();  // hand the crash over to AMS
    } finally { // exit the app process
        Process.killProcess(Process.myPid());
        System.exit(10);
    }
}
    -> AMS.handleApplicationCrash
    -> AMS.handleApplicationCrashInner {
        addErrorToDropBox(); // the system records the crash log
        mAppErrors.crashApplication();
    }
        -> AppErrors.crashApplication
        -> AppErrors.crashApplicationInner {
            // handle the crash
            if (!makeAppCrashingLocked()){
                return;
            }

            // show the crash dialog
            final Message msg = Message.obtain();
            msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;
            mService.mUiHandler.sendMessage(msg);

            // handle the dialog result: restart, quit, and so on
            int res = result.get(); // blocks until the user responds
            switch (res) {}
        }

II. Native Crash Handling

[Figure: native crash handling flow]
1. Listening in the Java layer

As covered in "Binder (5): Service Registration Flow, Sending the Registration Request":
After the phone boots, the system_server process is started and SystemServer.main is called. main starts AMS through startBootstrapServices, and startOtherServices later calls AMS's systemReady. In the systemReady callback, mActivityManagerService.startObservingNativeCrashes() registers the native crash listener.

The run method of NativeCrashListener opens a socket and listens on it.

public void startObservingNativeCrashes() {
    final NativeCrashListener ncl = new NativeCrashListener(this);
    ncl.start();
}

final class NativeCrashListener extends Thread {
    public void run() {
        final byte[] ackSignal = new byte[1];
        ...
        try {
            FileDescriptor serverFd = Os.socket(AF_UNIX, SOCK_STREAM, 0);
            final UnixSocketAddress sockAddr = UnixSocketAddress.createFileSystem(DEBUGGERD_SOCKET_PATH);
            Os.bind(serverFd, sockAddr);
            Os.listen(serverFd, 1);
            Os.chmod(DEBUGGERD_SOCKET_PATH, 0777);
            while (true) {
                FileDescriptor peerFd = null;
                try {
                    peerFd = Os.accept(serverFd, null /* peerAddress */);
                    if (peerFd != null) {
                        consumeNativeCrashData(peerFd);
                    }
                } catch (Exception e) {
                    ...
                } finally {
                    ...
                }
            }
        } catch (Exception e) {
            ...
        }
    }
}
2. Reporting from native code

Native programs are dynamically linked and need a linker to run. On Android that linker is linker; see linker_main.cpp. The call chain __linker_init -> __linker_init_post_relocation -> debuggerd_init ends in the debuggerd_init function in debuggerd_handler.cpp.

/* This is the entry point for the linker, called from begin.S. This
 * method is responsible for fixing the linker's own relocations, and
 * then calling __linker_init_post_relocation().
 */
extern "C" ElfW(Addr) __linker_init(void* raw_args) {
    ...
    ElfW(Addr) start_address = __linker_init_post_relocation(args);
    return start_address;
}

static ElfW(Addr) __linker_init_post_relocation(KernelArgumentBlock& args) {
#ifdef __ANDROID__
    debuggerd_callbacks_t callbacks = {
        .get_abort_message = []() {
        return g_abort_message;
        },
        .post_dump = &notify_gdb_of_libraries,
    };
    debuggerd_init(&callbacks);
#endif
}

debuggerd_init registers debuggerd_signal_handler as the handler for crash signals.

void debuggerd_init(debuggerd_callbacks_t* callbacks) {
    ...
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    sigfillset(&action.sa_mask);
    action.sa_sigaction = debuggerd_signal_handler;
    action.sa_flags = SA_RESTART | SA_SIGINFO;

    // Use the alternate signal stack if available so we can catch stack overflows.
    action.sa_flags |= SA_ONSTACK;
    debuggerd_register_handlers(&action);
}

// /system/core/debuggerd/include/debuggerd/handler.h
static void __attribute__((__unused__)) debuggerd_register_handlers(struct sigaction* action) {
    sigaction(SIGABRT, action, nullptr);
    sigaction(SIGBUS, action, nullptr);
    sigaction(SIGFPE, action, nullptr);
    sigaction(SIGILL, action, nullptr);
    sigaction(SIGSEGV, action, nullptr);
#if defined(SIGSTKFLT)
    sigaction(SIGSTKFLT, action, nullptr);
#endif
    sigaction(SIGSYS, action, nullptr);
    sigaction(SIGTRAP, action, nullptr);
    sigaction(DEBUGGER_SIGNAL, action, nullptr);
}

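debuggerd_signal_handler uses clone to create a pseudothread that launches crash_dump, which records the crash log. Once that thread has finished, resend_signal restores the default signal action and re-sends the signal, which kills the current process.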

static void debuggerd_signal_handler(int signal_number, siginfo_t* info, void* context) {
  ...
  // clone a pseudothread that launches crash_dump
  pid_t child_pid =
    clone(debuggerd_dispatch_pseudothread, pseudothread_stack,
          CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID,
          &thread_info, nullptr, nullptr, &thread_info.pseudothread_tid);
  if (child_pid == -1) {
    fatal_errno("failed to spawn debuggerd dispatch thread");
  }

  // Wait for the pseudothread to start.
  futex_wait(&thread_info.pseudothread_tid, -1);

  // Wait for the pseudothread to finish.
  futex_wait(&thread_info.pseudothread_tid, child_pid);

  ...
  if (info->si_signo == DEBUGGER_SIGNAL) {
    ...
  } else {
    // Re-send the signal.
    resend_signal(info);
  }
}

static void resend_signal(siginfo_t* info) {
  // Signals can either be fatal or nonfatal.
  // For fatal signals, crash_dump will send us the signal we crashed with
  // before resuming us, so that processes using waitpid on us will see that we
  // exited with the correct exit status (e.g. so that sh will report
  // "Segmentation fault" instead of "Killed"). For this to work, we need
  // to deregister our signal handler for that signal before continuing.
  if (info->si_signo != DEBUGGER_SIGNAL) {
    signal(info->si_signo, SIG_DFL); // restore the default action, which kills the current process
    int rc = syscall(SYS_rt_tgsigqueueinfo, __getpid(), __gettid(), info->si_signo, info);
    if (rc != 0) {
      fatal_errno("failed to resend signal during crash");
    }
  }
}

In crash_dump's main function, a child process is forked. It communicates with tombstoned to record the crash log and notifies AMS of the native crash.

// /system/core/debuggerd/crash_dump.cpp
int main(int argc, char** argv) {
  ...
  // fork a child process
  pid_t forkpid = fork();
  if (forkpid == -1) {
    PLOG(FATAL) << "fork failed";
  } else if (forkpid == 0) {
    fork_exit_read.reset();
  } else {
    // Wait for the child process to finish.
    fork_exit_write.reset();
    char buf;
    TEMP_FAILURE_RETRY(read(fork_exit_read.get(), &buf, sizeof(buf)));
    _exit(0);
  }
  
  ...
  // Connect to tombstoned and obtain the output fd for the log.
  {
    ATRACE_NAME("tombstoned_connect");
    LOG(INFO) << "obtaining output fd from tombstoned, type: " << dump_type;
    g_tombstoned_connected =
        tombstoned_connect(g_target_thread, &g_tombstoned_socket, &g_output_fd, dump_type);
  }

  if (g_tombstoned_connected) {
    if (TEMP_FAILURE_RETRY(dup2(g_output_fd.get(), STDOUT_FILENO)) == -1) {
      PLOG(ERROR) << "failed to dup2 output fd (" << g_output_fd.get() << ") to STDOUT_FILENO";
    }
  } else {
    unique_fd devnull(TEMP_FAILURE_RETRY(open("/dev/null", O_RDWR)));
    TEMP_FAILURE_RETRY(dup2(devnull.get(), STDOUT_FILENO));
    g_output_fd = std::move(devnull);
  }

  ...
  // Notify AMS.
  if (fatal_signal) {
    // Don't try to notify ActivityManager if it just crashed, or we might hang until timeout.
    if (thread_info[target_process].thread_name != "system_server") {
      activity_manager_notify(target_process, signo, amfd_data);
    }
  }

  ...
  // Tell tombstoned that the dump is complete.
  if (g_tombstoned_connected && !tombstoned_notify_completion(g_tombstoned_socket.get())) {
    LOG(ERROR) << "failed to notify tombstoned of completion";
  }

  return 0;
}

III. Crash Optimization (Java layer)

1. Record diagnostic information:

Record device information, memory information, the crash log, a screenshot, and so on.
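As a rough illustration of such a record, the sketch below builds a text report from the crashing thread, JVM memory figures and the stack trace, using only standard Java APIs. The class name CrashReport and the field labels are assumptions made for this example. Device details and screenshots need Android APIs and are left out.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper that turns a crash into a plain-text report.
class CrashReport {
    static String format(Thread t, Throwable e) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        Runtime rt = Runtime.getRuntime();

        out.println("thread: " + t.getName());
        out.println("freeMemoryBytes: " + rt.freeMemory());
        out.println("maxMemoryBytes: " + rt.maxMemory());
        // printStackTrace writes the exception, its frames and any causes.
        e.printStackTrace(out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(Thread.currentThread(), new IllegalStateException("boom")));
    }
}
```

A handler would typically call format(...) inside uncaughtException and write the result to a file before the process exits.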

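2. Make crashes friendlier:

By default a crash makes the app vanish abruptly. A custom handler can relaunch the app's start page instead, which reduces the cases where the app simply exits.
Note that when relaunching the app, the original process must be terminated to avoid other problems.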

Intent intent = new Intent(BaseApplication.this, MainActivity.class);
intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK
        | Intent.FLAG_ACTIVITY_CLEAR_TASK |
        Intent.FLAG_ACTIVITY_RESET_TASK_IF_NEEDED);
if (intent.getComponent() != null) {
    // Simulate a launch from the Launcher.
    intent.setAction(Intent.ACTION_MAIN);
    intent.addCategory(Intent.CATEGORY_LAUNCHER);
}
BaseApplication.this.startActivity(intent);
android.os.Process.killProcess(android.os.Process.myPid());
System.exit(10);
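3. Do not crash at all:

While handling the crash, restart the looper on the main thread so that the app does not go down.

How it works: an uncaught exception propagates up the call stack frame by frame. The main thread runs a Looper loop, so the exception makes that loop exit, and the JVM eventually calls uncaughtException(). Calling Looper.loop() again on the main thread at that point restarts the loop, and the app can keep processing its events.

Caveat: if the crash happens while an Activity is being displayed, the screen goes black. One workaround is to hook and replace ActivityThread.mH.mCallback, wrap the Activity lifecycle handling in try/catch, and finish the Activity that was about to be shown when an exception occurs.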

public class CrashHandler implements Thread.UncaughtExceptionHandler {
    @Override
    public void uncaughtException(@NonNull Thread thread, @NonNull Throwable ex) {
        handleExceptionReocrd(ex); // record the crash automatically
        try { // let the user-supplied listener record it as well
            if (listener != null) listener.recordException(ex);
        } catch (Throwable e) {
            e.printStackTrace();
        }

        try { // restart the app if requested; restarting requires killing this process
            if (listener != null && listener.restartApp()) return;
        } catch (Exception e) {
            Log.d(TAG, "uncaughtException->handleByUser:" + Log.getStackTraceString(e));
        }

        // Not restarted: enter safe mode if it is enabled
        if (safeModelEnable) {
            enterSafeModel(thread);
        } else if (mDefaultHandler != null) {
            // Delegate to the system handler
            Log.d(TAG, "uncaughtException: delegating to the system handler");
            mDefaultHandler.uncaughtException(thread, ex);
        } else {
            // No system handler available: exit the process directly
            Log.w(TAG, "uncaughtException: exiting the process");
            android.os.Process.killProcess(android.os.Process.myPid());
            System.exit(10);
        }
    }

    public void enterSafeModel(Thread thread) {
        Log.w(CrashHandler.TAG, "setSafe--- thread-----" + thread.getName());
        if (thread == Looper.getMainLooper().getThread()) {
            while (true) { // keep re-entering the message loop
                try {
                    Log.e(TAG, "safeMode: abnormal exit detected, restarting the looper");
                    Looper.loop();
                } catch (Throwable e) {
                    Log.e(TAG, "safeMode: abnormal exit detected: " + Log.getStackTraceString(e));
                }
                }
            }
        }
    }
}
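The restart idea can be tried outside Android with a simplified model. In the sketch below, loop(...) is a hypothetical stand-in for Looper.loop(): it drains a task queue and lets exceptions escape. Unlike the real Looper, it returns when the queue is empty instead of blocking. The surrounding while loop plays the role of enterSafeModel: it catches the exception and re-enters the loop, so later events are still processed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of "restart the loop after a crash"; not the Android Looper.
class LoopRestartDemo {
    // Runs queued tasks in order; an exception thrown by a task escapes the loop.
    static void loop(Queue<Runnable> queue) {
        Runnable task;
        while ((task = queue.poll()) != null) {
            task.run();
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        Queue<Runnable> queue = new ArrayDeque<>();
        queue.add(() -> log.add("event 1"));
        queue.add(() -> { throw new IllegalStateException("bad event"); });
        queue.add(() -> log.add("event 3"));

        while (true) {
            try {
                loop(queue);
                break; // queue drained normally
            } catch (RuntimeException e) {
                // The failing task was already removed from the queue, so it is not retried.
                log.add("caught " + e.getMessage());
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The event that threw is dropped rather than retried, which matches the caveat above: whatever that event was supposed to do, such as finishing an Activity's display, never completes.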