If lifecycleScope is a supervisor, why does its child coroutine's failure crash the app?

Question

I'm new to Kotlin coroutines and I'm trying to understand supervision. As the documentation says:

A failure or cancellation of a child does not cause the supervisor job to fail and does not affect its other children.

OK, so I wrote the following code for the JVM:

import kotlinx.coroutines.*

// @JvmStatic is only allowed inside an object, hence the enclosing object
object SupervisorJobUsage {
    @JvmStatic
    fun main(args: Array<String>) = runBlocking {
        val supervisorScope = CoroutineScope(Dispatchers.Default + SupervisorJob())

        // Coroutine #1
        supervisorScope.launch {
            println("Coroutine #1 start")
            delay(100)
            throw RuntimeException("Coroutine #1 failure")
        }

        // Coroutine #2
        supervisorScope.launch {
            for (i in 0 until 5) {
                println("Coroutine #2: $i")
                delay(100)
            }
        }

        // Wait for every child of the supervisor scope to finish
        supervisorScope.coroutineContext[Job]!!.children.forEach { it.join() }
    }
}

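Here everything is fine: the failure of Coroutine #1 affects neither the parent nor Coroutine #2. That's the purpose of supervision. The output matches the documentation: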

Coroutine #1 start
Coroutine #2: 0
Coroutine #2: 1
Exception in thread "DefaultDispatcher-worker-1" java.lang.RuntimeException: Coroutine #1 failure
    at supervisor.SupervisorJobUsage$main.invokeSuspend(SupervisorJobUsage.kt:16)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:561)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:727)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:667)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:655)
Coroutine #2: 2
Coroutine #2: 3
Coroutine #2: 4

Process finished with exit code 0

But then I wrote almost the same code for Android:

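import android.os.Bundle
import android.util.Log
import androidx.appcompat.app.AppCompatActivity
import androidx.lifecycle.lifecycleScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

class CoroutineJobActivity : AppCompatActivity() {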

    private val TAG = "CoroutineJobActivity"

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        testSupervisorScope()
    }

    private fun testSupervisorScope() {
        // Coroutine #1
        lifecycleScope.launch(Dispatchers.Default) {
            Log.d(TAG, "testSupervisorScope: Coroutine #1 start")
            delay(100)
            throw RuntimeException("Coroutine #1 failure")
        }

        // Coroutine #2
        lifecycleScope.launch(Dispatchers.Default) {
            for (i in 0 until 5) {
                Log.d(TAG, "testSupervisorScope: Coroutine #2: $i")
                delay(100)
            }
        }
    }
}

The output is unexpected: Coroutine #2 doesn't finish its work because the app crashes.

testSupervisorScope: Coroutine #1 start
testSupervisorScope: Coroutine #2: 0
testSupervisorScope: Coroutine #2: 1
testSupervisorScope: Coroutine #2: 2
FATAL EXCEPTION: DefaultDispatcher-worker-2
    Process: jp.neechan.kotlin_coroutines_android, PID: 23561
    java.lang.RuntimeException: Coroutine #1 failure
        at jp.neechan.kotlin_coroutines_android.coroutinejob.CoroutineJobActivity$testSupervisorScope.invokeSuspend(CoroutineJobActivity.kt:25)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:561)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:727)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:667)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:655)

Although lifecycleScope.coroutineContext is SupervisorJob() + Dispatchers.Main.immediate, here I see that the failure of a child coroutine affected the parent and the other children.

So what is the purpose of lifecycleScope being a supervisor?

Answer 1

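The issue is that SupervisorJob doesn't work the way you expect. The idea of a supervisor scope is that when one of its children throws an exception, the other children keep running. However, if the exception is not a CancellationException, it is still propagated to the system, and if you don't catch it, the app crashes. Another way to manage exceptions is to pass a CoroutineExceptionHandler to the scope; it then has to handle the exceptions thrown by the children.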
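A minimal JVM sketch of the CoroutineExceptionHandler approach, assuming kotlinx-coroutines-core is on the classpath. `runWithHandler`, `failures`, and `iterations` are illustrative names, not part of the question. The handler receives Coroutine #1's exception, so it never reaches the thread's uncaught exception handler, and Coroutine #2 still completes all five iterations.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

// Hypothetical helper: runs the question's two coroutines in a supervisor scope
// that also carries a CoroutineExceptionHandler, and reports what happened.
fun runWithHandler(): Pair<List<String>, Int> {
    val failures = Collections.synchronizedList(mutableListOf<String>())
    var iterations = 0 // written only by Coroutine #2, read after join()

    // Invoked for uncaught exceptions of root coroutines launched in this scope
    val handler = CoroutineExceptionHandler { _, e -> failures.add(e.message ?: "unknown") }
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)

    runBlocking {
        // Coroutine #1: fails after a short delay
        val first = scope.launch {
            delay(100)
            throw RuntimeException("Coroutine #1 failure")
        }
        // Coroutine #2: not affected by its sibling's failure
        val second = scope.launch {
            repeat(5) {
                iterations++
                delay(100)
            }
        }
        first.join()
        second.join()
    }
    return failures.toList() to iterations
}

fun main() {
    val (failures, iterations) = runWithHandler()
    println("handled failures: $failures")
    println("Coroutine #2 iterations: $iterations")
}
```

On Android the same idea would mean passing a handler to each `lifecycleScope.launch(...)` call, since lifecycleScope itself is created without one.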

Answer 2

A few things play an important role in your use case:

Here everything is fine, Coroutine #1 failure doesn't affect nor the parent, neither the Coroutine #2. That's the purpose of supervision

CoroutineExceptionHandler vs Thread.uncaughtExceptionHandler

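CoroutineExceptionHandler is the default handler; it prints the exception details once a coroutine throws an exception. Using launch with join forces the coroutine to wait for the jobs to complete, which is why you can see the output of both coroutines.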

Now, if you join a coroutine that has crashed, join throws a CancellationException:

In particular, it means that a parent coroutine invoking join on a child coroutine that was started using launch(coroutineContext) { ... } builder throws CancellationException if the child had crashed, unless a non-standard CoroutineExceptionHandler is installed in the context.
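A small JVM sketch of the situation the quoted documentation describes, assuming kotlinx-coroutines-core. The child is launched inside the parent coroutine with a plain `Job()` scope (no supervisor), so its failure cancels the parent and the parent's `join()` throws CancellationException. `observeJoin` and the recorded strings are hypothetical; the handler is there only so the parent's final failure is reported somewhere visible.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

// Hypothetical helper: records what join() does when a genuine child crashes.
fun observeJoin(): List<String> {
    val events = Collections.synchronizedList(mutableListOf<String>())
    // Receives the parent's final failure (the child's exception)
    val handler = CoroutineExceptionHandler { _, e -> events.add("handler: ${e.message}") }
    val scope = CoroutineScope(Job() + Dispatchers.Default + handler)

    runBlocking {
        val parent = scope.launch {
            // A regular (non-supervised) child of this coroutine
            val child = launch { throw RuntimeException("child crashed") }
            try {
                child.join()
                events.add("join returned normally")
            } catch (e: CancellationException) {
                // The child's failure cancelled this parent, so join() throws
                events.add("join threw CancellationException")
            }
        }
        parent.join()
    }
    return events.toList()
}

fun main() {
    observeJoin().forEach(::println)
}
```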

CoroutineExceptionHandler without join: by default, CoroutineExceptionHandler ignores CancellationException, so if you don't use join, it won't print anything.

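CoroutineExceptionHandler with join: if you call join on the coroutine, the builder throws a CancellationException, and since the job hasn't completed yet (the other coroutines are still running), it prints the error and continues with the other jobs.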

supervisorScope.coroutineContext[Job]!!.children.forEach { it.join() }

This follows the same behavior described in the Exception propagation documentation, where GlobalScope has no associated Job object.

On Android, Thread.uncaughtExceptionHandler is the default handler: on an uncaught exception, it terminates the app and shows the crash dialog.

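That is how exception handling, with and without join, differs between ecosystems, and it is why your Kotlin test that uses join (which doesn't run inside an Android app) is not terminated.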

Although lifecycleScope.coroutineContext is SupervisorJob() + Dispatchers.Main.immediate, here I see that failure of child coroutine affected the parent and other children.

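No, the child doesn't affect the parent coroutine, because there is no child at all. Both of your coroutines run on the same thread as their respective parents and have no parent-child relationship (call Thread.currentThread()?.name inside a coroutine to see the thread name). So when an exception occurs, the parent delegates it to Android's uncaughtExceptionHandler, which kills the app (see point 1).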

So you can use withContext:

lifecycleScope.launch(Dispatchers.Default) {
    for (i in 0 until 5) {
        Log.d(TAG, "testSupervisorScope: Coroutine #1: $i")
        delay(100)
    }

    try {
        // can use another context to change thread, e.g. Dispatchers.IO
        withContext(lifecycleScope.coroutineContext) {
            Log.d(TAG, "testSupervisorScope: Coroutine withContext start")
            delay(100)
            throw RuntimeException("Coroutine sub-task failure")
        }
    } catch (e: RuntimeException) {
        e.printStackTrace()
    }
}
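A JVM sketch of the same try/catch around withContext, assuming kotlinx-coroutines-core. `scope` (a `CoroutineScope(SupervisorJob() + Dispatchers.Default)`) is a hypothetical stand-in for lifecycleScope, an in-memory list replaces Log.d, and the block switches to Dispatchers.IO, as the comment in the snippet suggests, instead of passing lifecycleScope.coroutineContext. withContext rethrows the block's exception to its caller, so the catch clause handles it and the launched coroutine completes normally.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

// Hypothetical helper mirroring the Android snippet on the JVM.
fun runWithContextDemo(): Pair<List<String>, Boolean> {
    val log = Collections.synchronizedList(mutableListOf<String>())
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default) // stands in for lifecycleScope

    val job = scope.launch {
        try {
            // withContext suspends until the block finishes and rethrows its exception here
            withContext(Dispatchers.IO) {
                delay(100)
                throw RuntimeException("Coroutine sub-task failure")
            }
        } catch (e: RuntimeException) {
            log.add("caught: ${e.message}")
        }
        log.add("launch body finished")
    }
    runBlocking { job.join() }
    return log.toList() to job.isCancelled
}

fun main() {
    val (log, cancelled) = runWithContextDemo()
    log.forEach(::println)
    println("launched job cancelled: $cancelled")
}
```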

Or, to establish a parent-child relationship, use the same scope to launch the child coroutine:

private fun testSupervisorScope() = runBlocking {
    // Coroutine #1
    lifecycleScope.launch(Dispatchers.Default) {
        for (i in 0 until 5) {
            Log.d(TAG, "testSupervisorScope: Coroutine #1: $i")
            delay(100)
        }

        // Coroutine child #1
        try {
            childCoroutineWithException().await()
        } catch (e: Exception) {
            Log.d(TAG, "caught exception")
            e.printStackTrace()
        }
    }
}

// Note: use the same scope `lifecycleScope` to create the child coroutine and establish a parent-child relation
fun childCoroutineWithException(): Deferred<String> = lifecycleScope.async {
    Log.d(TAG, "testSupervisorScope: Coroutine child #1 start")
    delay(100)
    throw RuntimeException("Coroutine child #1 failure")
}

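Once the parent-child relationship is established, the code above handles the exception in the catch block without affecting the execution of the other child coroutines.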

Output of the child coroutine:

CoroutineJobActivity: testSupervisorScope: Coroutine #1: 1
CoroutineJobActivity: testSupervisorScope: Coroutine #1: 2
CoroutineJobActivity: testSupervisorScope: Coroutine #1: 3
CoroutineJobActivity: testSupervisorScope: Coroutine #1: 4
CoroutineJobActivity: testSupervisorScope: Coroutine #1: 5
CoroutineJobActivity: testSupervisorScope: Coroutine child #1 start
CoroutineJobActivity: Coroutine child #1 failure

You can simplify the example further by removing runBlocking:

private fun testSupervisorScope() {
    // Coroutine #1
    lifecycleScope.launch(Dispatchers.Default) {
        for (i in 0 until 5) {
            Log.d(TAG, "testSupervisorScope: Coroutine #1: $i")
            try {
                childCoroutineWithException().await()
            } catch (e: Exception) {
                Log.d(TAG, "caught exception")
                e.printStackTrace()
            }
            delay(100)
        }
    }
}

// Note: use the same scope `lifecycleScope` to create the child coroutine and establish a parent-child relation
fun childCoroutineWithException(): Deferred<String> = lifecycleScope.async {
    Log.d(TAG, "testSupervisorScope: Coroutine child #1 start")
    delay(100)
    throw RuntimeException("Coroutine child #1 failure")
}
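A JVM sketch of the async/await variant, assuming kotlinx-coroutines-core. `scope` is a hypothetical stand-in for lifecycleScope (a SupervisorJob plus Dispatchers.Default) and `runAsyncDemo` is an illustrative name. Because the scope's Job is a SupervisorJob, the failed async cancels neither the scope nor the launched coroutine; its exception is kept in the Deferred and rethrown by await(), where the try/catch handles it.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections

// Hypothetical helper reproducing the async/await pattern on the JVM.
fun runAsyncDemo(): Triple<List<String>, Boolean, Boolean> {
    val log = Collections.synchronizedList(mutableListOf<String>())
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default) // stands in for lifecycleScope

    // Same shape as childCoroutineWithException(): an async started in the same scope
    fun childCoroutineWithException(): Deferred<String> = scope.async {
        delay(100)
        throw RuntimeException("Coroutine child #1 failure")
    }

    val job = scope.launch {
        try {
            childCoroutineWithException().await() // rethrows the async's exception
        } catch (e: RuntimeException) {
            log.add("caught: ${e.message}")
        }
    }
    runBlocking { job.join() }
    return Triple(log.toList(), job.isCancelled, scope.isActive)
}

fun main() {
    val (log, cancelled, scopeActive) = runAsyncDemo()
    println("$log, launched job cancelled: $cancelled, scope active: $scopeActive")
}
```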

You can implement your own handler for uncaught exceptions to avoid crashing the app (don't do this unless you really need it: it's bad practice and leads to technical debt).

See also: Need to handle uncaught exception and send log file
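If you do need a process-wide handler (for example, to write a log file before the crash), a common pattern is to record the error and then delegate to the previously installed default handler, so the normal crash behavior is preserved. A stdlib-only sketch; `installLoggingHandler`, `demoLoggingHandler`, and the in-memory `log` are hypothetical stand-ins for real log-file code.

```kotlin
import java.util.Collections

// Hypothetical: installs a default handler that records the error, then delegates
// to whichever default handler was installed before it (on Android, the one that kills the app).
fun installLoggingHandler(log: MutableList<String>): Thread.UncaughtExceptionHandler? {
    val previous = Thread.getDefaultUncaughtExceptionHandler()
    Thread.setDefaultUncaughtExceptionHandler { thread, e ->
        log.add("${thread.name}: ${e.message}") // a real app might write a log file here
        previous?.uncaughtException(thread, e)  // keep the original behavior
    }
    return previous
}

// Demo: a thread without its own handler falls back to the default handler.
fun demoLoggingHandler(): List<String> {
    val log = Collections.synchronizedList(mutableListOf<String>())
    val previous = installLoggingHandler(log)
    try {
        val worker = Thread(Runnable { throw IllegalStateException("boom") }, "worker")
        worker.start()
        worker.join() // the handler runs on the worker before it terminates
    } finally {
        Thread.setDefaultUncaughtExceptionHandler(previous) // restore for the rest of the process
    }
    return log.toList()
}

fun main() {
    println(demoLoggingHandler())
}
```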

Answer 3

If you look closely at your output:

Exception in thread "DefaultDispatcher-worker-1" java.lang.RuntimeException: Coroutine #1 failure
    at supervisor.SupervisorJobUsage$main.invokeSuspend(SupervisorJobUsage.kt:16)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:561)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:727)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:667)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:655)

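This is a report from the JVM-level uncaught exception handler. It means that, even though the exception didn't cancel the scope's job, it killed the Java thread. The executor can easily recover from such an error, but Android uses a different uncaught exception handler, one that immediately kills the whole app. The coroutine scope doesn't change that behavior.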
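The plain-JVM side of this can be seen without coroutines at all. In this stdlib-only sketch (`runFailingThread` is a hypothetical name), an exception escaping a thread's Runnable is passed to that thread's uncaught exception handler, the thread terminates, and the calling thread carries on.

```kotlin
import java.util.Collections

// Hypothetical helper: runs a plain thread whose body throws and reports the outcome.
fun runFailingThread(): Pair<List<String>, Boolean> {
    val reports = Collections.synchronizedList(mutableListOf<String>())
    val worker = Thread(Runnable { throw RuntimeException("Coroutine #1 failure") }, "worker-1")
    // Per-thread handler; without it, the JVM default prints "Exception in thread ..."
    worker.setUncaughtExceptionHandler { thread, e -> reports.add("${thread.name}: ${e.message}") }
    worker.start()
    worker.join() // returns once the thread has terminated; the handler has already run
    return reports.toList() to worker.isAlive
}

fun main() {
    val (reports, alive) = runFailingThread()
    println(reports)
    println("worker alive: $alive")
    println("main thread keeps running")
}
```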

You can try this code to see the mechanism at work:

GlobalScope.launch(Dispatchers.Default) {
    Thread.currentThread().setUncaughtExceptionHandler { thread, exception ->
        Log.e("MyTag", "We got an error on ${thread.name}: $exception")
    }
    throw RuntimeException("Dead")
}

If I comment out the setUncaughtExceptionHandler call, I get an app crash just like you. With it in place, I just get one line in the log.

Of course, you wouldn't write this in production, but adding a coroutine exception handler to the scope has the same effect.
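A JVM sketch of that "same effect", assuming kotlinx-coroutines-core. A single-thread executor (wrapped with `asCoroutineDispatcher()`) gives the worker thread its own uncaught exception handler, standing in for the thread-level handler above; `whoHandled` is a hypothetical name. Without a CoroutineExceptionHandler, the failure of a root coroutine is handed to the current thread's uncaught exception handler; with one in the scope, only the coroutine handler sees it.

```kotlin
import kotlinx.coroutines.*
import java.util.Collections
import java.util.concurrent.Executors
import kotlin.coroutines.CoroutineContext

// Hypothetical helper: reports which handler saw a root coroutine's failure.
fun whoHandled(withCoroutineHandler: Boolean): List<String> {
    val seen = Collections.synchronizedList(mutableListOf<String>())
    // Worker thread with its own uncaught exception handler
    val executor = Executors.newSingleThreadExecutor { r ->
        Thread(r, "worker").apply {
            setUncaughtExceptionHandler { _, e -> seen.add("thread handler: ${e.message}") }
        }
    }
    val dispatcher = executor.asCoroutineDispatcher()
    var context: CoroutineContext = SupervisorJob() + dispatcher
    if (withCoroutineHandler) {
        context += CoroutineExceptionHandler { _, e -> seen.add("coroutine handler: ${e.message}") }
    }
    try {
        runBlocking {
            CoroutineScope(context).launch { throw RuntimeException("Dead") }.join()
        }
    } finally {
        dispatcher.close() // shuts the executor down
    }
    return seen.toList()
}

fun main() {
    println(whoHandled(withCoroutineHandler = false))
    println(whoHandled(withCoroutineHandler = true))
}
```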

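Still, the whole story doesn't make much sense to me, and I think exception handling in general is an area of Kotlin coroutines that still needs polishing.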


android

kotlin

kotlin-coroutines