A useful metric for determining when the JVM is about to get into memory/GC trouble

Question

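I have a Scala data processing application that, 95% of the time, can handle the data thrown at it in memory. The remaining 5%, if left unchecked, doesn't usually hit an OutOfMemoryError. Instead it gets into a cycle of major GCs that spikes the CPU, prevents background threads from executing and, if it finishes at all, takes 10x to 50x as long as it would with enough memory.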

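I've implemented a system that can flush data to disk and treat the disk stream as if it were an in-memory iterator. It's usually an order of magnitude slower than memory, but good enough for these 5% cases. The flush is currently triggered by a heuristic: a maximum size on a collection context that tracks the sizes of the various collections involved in the data processing. This works, but it is really just an ad hoc empirical threshold.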

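I would much rather react to the JVM getting near the bad state described above and flush to disk at that point. I've tried watching memory, but can't find the right combination of eden, old gen, etc. to reliably predict the death spiral. I've also tried just watching the frequency of major GCs, but that also seems to swing widely between "too conservative" and "too late".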

Any resources for judging JVM health and detecting trouble states would be greatly appreciated.

Answer 1

Maybe this link will help you: http://www.javaspecialists.eu/archive/Issue092.html

In my MemoryWarningSystem you add listeners that implement the MemoryWarningSystem.Listener interface, with one method memoryUsageLow(long usedMemory, long maxMemory) that will be called when the threshold is reached. In my experiments, the memory bean notifies us quite soon after the usage threshold has been exceeded, but I could not determine the granularity. Something to note is that the listener is being called by a special thread, called the Low Memory Detector thread, that is now part of the standard JVM.

What is the threshold? And which of the many pools should we monitor? The only sensible pool to monitor is the Tenured Generation (Old Space). When you set the size of the memory with -Xmx256m, you are setting the maximum memory to be used in the Tenured Generation.
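For readers who don't want to follow the link, here is a rough sketch of the mechanism the quoted passage describes, using only the standard `java.lang.management` API. The class name `TenuredThresholdWatcher`, the `thresholdBytes` helper and the `onLowMemory` callback are made up for illustration. It also assumes that the first heap pool supporting a usage threshold is the tenured/old generation, which is a HotSpot-flavoured assumption and may not hold for every collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import javax.management.Notification;
import javax.management.NotificationEmitter;

final class TenuredThresholdWatcher {

    /** Assumption: the first heap pool that supports a usage threshold is the old generation. */
    static MemoryPoolMXBean findTenuredPool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()) {
                return pool;
            }
        }
        return null; // no suitable pool on this JVM
    }

    /** Converts a fraction of the pool maximum (0 < fraction <= 1) into a byte threshold. */
    static long thresholdBytes(long maxBytes, double fraction) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("pool has no defined maximum");
        }
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        return (long) (maxBytes * fraction);
    }

    /** Sets the usage threshold and registers the callback; returns false if unsupported. */
    static boolean install(double fraction, Runnable onLowMemory) {
        MemoryPoolMXBean pool = findTenuredPool();
        if (pool == null || pool.getUsage().getMax() < 0) {
            return false;
        }
        pool.setUsageThreshold(thresholdBytes(pool.getUsage().getMax(), fraction));

        // The platform MemoryMXBean emits the threshold notifications.
        NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener((Notification n, Object handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
                onLowMemory.run(); // runs on a JVM-internal notification thread
            }
        }, null, null);
        return true;
    }
}
```

Something like `TenuredThresholdWatcher.install(0.8, () -> startFlushingToDisk())` would then be the hook, where `startFlushingToDisk` stands in for whatever your application does. Keep the callback short, since it is not running on one of your own threads.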

Answer 2

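In addition to the MemoryMXBean notification mechanism described in @Alla's link, you can use a combination of weak references and reference queues. This old but valid article has a good description of weak, soft, and phantom references and reference queues.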

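The basic idea is to create a large array (to reserve memory), create a weak or soft reference to it, and register that reference with a reference queue. When memory pressure triggers collection of the weakly referenced array, you get the reserved memory back (hopefully breathing life into your application and buying it some time). Have a thread poll the reference queue to determine when your reserve has been collected. You can then trigger your application's file streaming behavior to finish the job. SoftReferences are more resilient to memory pressure than WeakReferences and might serve your purposes better.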
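A minimal sketch of that idea, assuming a soft reference and a daemon thread that blocks on the queue rather than polling in a loop. The names `MemoryReserve`, `watch` and `onReserveCollected` are invented for this example, and the right reserve size is something you would have to find by experiment.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

final class MemoryReserve {
    private final ReferenceQueue<byte[]> queue = new ReferenceQueue<>();
    // The array is reachable only through this soft reference,
    // so the GC is free to reclaim it when memory gets tight.
    private final SoftReference<byte[]> reserve;

    MemoryReserve(int reserveBytes) {
        this.reserve = new SoftReference<>(new byte[reserveBytes], queue);
    }

    /** Non-blocking check: true once the GC has cleared the reserve. */
    boolean isReleased() {
        return reserve.get() == null;
    }

    /** Starts a daemon thread that runs the callback once the reserve has been collected. */
    Thread watch(Runnable onReserveCollected) {
        Thread watcher = new Thread(() -> {
            try {
                queue.remove(); // blocks until the GC enqueues the cleared reference
                onReserveCollected.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop watching quietly
            }
        }, "memory-reserve-watcher");
        watcher.setDaemon(true);
        watcher.start();
        return watcher;
    }
}
```

Usage would be along the lines of `new MemoryReserve(64 * 1024 * 1024).watch(() -> switchToDiskStreaming())`, with `switchToDiskStreaming` being your own hook. Note that the signal fires only once per reserve, so you would need to allocate a new one if you want to be warned again.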

Answer 3

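A reliable way is to register a notification listener on GC events and check the memory health after every Full GC event. Directly after a full GC, the memory in use is your actual live data set. If you are low on free memory at that point, it is probably time to start flushing to disk.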

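This way you avoid the false positives that often occur when you check memory without knowing when a full GC last happened, for example when using the MEMORY_THRESHOLD_EXCEEDED notification type.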

You can register a notification listener and handle Full GC events with code along these lines:

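import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;
import com.sun.management.GarbageCollectionNotificationInfo;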

public static void installGCMonitoring() {
    List<GarbageCollectorMXBean> gcBeans = ManagementFactory.getGarbageCollectorMXBeans();
    for (GarbageCollectorMXBean gcBean : gcBeans) {
        NotificationEmitter emitter = (NotificationEmitter) gcBean;
        NotificationListener listener = notificationListener();
        emitter.addNotificationListener(listener, null, null);
    }
}

private static NotificationListener notificationListener() {
    return new NotificationListener() {
        @Override
        public void handleNotification(Notification notification, Object handback) {
            if (notification.getType()
                    .equals(GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION)) {
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                String gctype = info.getGcAction();
                if (gctype.contains("major")) {
                    // We are only interested in full (major) GCs
                    Map<String, MemoryUsage> mem = info.getGcInfo().getMemoryUsageAfterGc();
                    for (Entry<String, MemoryUsage> entry : mem.entrySet()) {
                        String memoryPoolName = entry.getKey();
                        MemoryUsage memdetail = entry.getValue();
                        long memMax = memdetail.getMax();
                        long memUsed = memdetail.getUsed();
                        // Use the memMax/memUsed of the pool you are interested in (probably old gen)
                        // to determine memory health. Note that getMax() returns -1 if the pool
                        // has no defined maximum.
                    }
                }
            }
        }
    };
}

Credit to this article, where we first got the idea.
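The "determine memory health" step that the listener leaves as a comment could be filled in roughly like this. It is only a sketch: `PostGcHealth`, `oldGenOccupancy` and `shouldFlushToDisk` are hypothetical names, the 0.8 cutoff in the usage note is an arbitrary starting point, and matching pools by name ("PS Old Gen", "G1 Old Gen", "CMS Old Gen", "Tenured Gen") assumes HotSpot's naming, which other JVMs and collectors may not share.

```java
import java.lang.management.MemoryUsage;
import java.util.Map;

final class PostGcHealth {

    /** Assumption: HotSpot-style pool names identify the old generation. */
    static boolean isOldGenPool(String poolName) {
        return poolName.endsWith("Old Gen") || poolName.equals("Tenured Gen");
    }

    /**
     * Fraction of the old generation still in use after the GC,
     * or -1 if no old gen pool with a defined maximum was found.
     */
    static double oldGenOccupancy(Map<String, MemoryUsage> usageAfterGc) {
        for (Map.Entry<String, MemoryUsage> entry : usageAfterGc.entrySet()) {
            MemoryUsage usage = entry.getValue();
            if (isOldGenPool(entry.getKey()) && usage.getMax() > 0) {
                return (double) usage.getUsed() / usage.getMax();
            }
        }
        return -1.0;
    }

    /** True if the live set after a full GC exceeds the given fraction of the old gen. */
    static boolean shouldFlushToDisk(Map<String, MemoryUsage> usageAfterGc, double maxOccupancy) {
        return oldGenOccupancy(usageAfterGc) > maxOccupancy;
    }
}
```

Inside the listener you would call something like `PostGcHealth.shouldFlushToDisk(info.getGcInfo().getMemoryUsageAfterGc(), 0.8)` in the major GC branch. When the occupancy cannot be determined, the sketch returns -1 and therefore never asks for a flush, which you may or may not want as the default.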


java

garbage-collection

jvm

scala