OutOfMemoryException loading data via JPA: Need help analyzing

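I wrote an application (Spring Boot + Data JPA + Data REST) that keeps running into out-of-memory errors while it starts up. I can skip the code that runs at application start, but then the error may occur later instead. It is probably best to show you what happens at startup, because it is actually very simple and, IMHO, should not cause any problems: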

@SpringBootApplication
@EnableAsync
@EnableJpaAuditing
public class ScraperApplication {
    public static void main(String[] args) {
        SpringApplication.run(ScraperApplication.class, args);
    }
}

@Component
@RequiredArgsConstructor(onConstructor = @__(@Autowired))
public class DefaultDataLoader {
    private final @NonNull LuceneService luceneService;

    @Transactional
    @EventListener(ApplicationReadyEvent.class)
    public void load() {
        luceneService.reindexData();
    }
}


@Service
@RequiredArgsConstructor(onConstructor = @__(@Autowired))
public class LuceneService {

    private static final Log LOG = LogFactory.getLog(LuceneService.class);

    private final @NonNull TrainingRepo trainingRepo;

    private final @NonNull EntityManager entityManager;

    public void reindexData() {
        LOG.info("Reindexing triggered");

        FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
        fullTextEntityManager.purgeAll(Training.class);

        LOG.info("Index purged");

        int page = 0;
        int size = 100;
        boolean morePages = true;
        Page<Training> pageData;

        while (morePages) {
            pageData = trainingRepo.findAll(PageRequest.of(page, size));
            LOG.info("Loading page " + (page + 1) + "/" + pageData.getTotalPages());
            pageData.getContent().stream().forEach(t -> fullTextEntityManager.index(t));
            fullTextEntityManager.flushToIndexes(); // flush regularly to keep memory footprint low
            morePages = pageData.getTotalPages() > ++page;
        }

        fullTextEntityManager.flushToIndexes();
        LOG.info("Index flushed");
    }

}

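As you can see, all I do is purge the index, read all Training entities from TrainingRepo page by page (100 at a time), and write them to the index. Not much is going on, really. A few minutes after the "Index purged" message I get this, and only this: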

java.lang.OutOfMemoryError: Java heap space

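In the log I see "Index purged" but never any "Loading page ..." message, so it must be stuck in the findAll() call.

I had the JVM write a heap dump, loaded it into Eclipse Memory Analyzer, and got the complete stack trace: https://gist.github.com/mathias-ewald/2fddb9762427374bb04d332bd0b6b499

I also browsed through the report, but I need help interpreting the information, which is why I attached some Eclipse Memory Analyzer screenshots.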

Edit:

I just enabled "show-sql" and saw this before everything hung:

Hibernate: select training0_.id as id1_9_, training0_.created_date as created_2_9_, training0_.description as descript3_9_, training0_.duration_days as duration4_9_, training0_.execution_id as executi14_9_, training0_.level as level5_9_, training0_.modified_date as modified6_9_, training0_.name as name7_9_, training0_.price as price8_9_, training0_.product as product9_9_, training0_.quality as quality10_9_, training0_.raw as raw11_9_, training0_.url as url12_9_, training0_.vendor as vendor13_9_ from training training0_ where  not (exists (select 1 from training training1_ where training0_.url=training1_.url and training0_.created_date<training1_.created_date)) limit ?
Hibernate: select execution0_.id as id1_1_0_, execution0_.created_date as created_2_1_0_, execution0_.duration_millis as duration3_1_0_, execution0_.message as message4_1_0_, execution0_.modified_date as modified5_1_0_, execution0_.scraper as scraper6_1_0_, execution0_.stats_id as stats_id8_1_0_, execution0_.status as status7_1_0_, properties1_.execution_id as executio1_2_1_, properties1_.properties as properti2_2_1_, properties1_.properties_key as properti3_1_, stats2_.id as id1_5_2_, stats2_.avg_quality as avg_qual2_5_2_, stats2_.max_quality as max_qual3_5_2_, stats2_.min_quality as min_qual4_5_2_, stats2_.null_products as null_pro5_5_2_, stats2_.null_vendors as null_ven6_5_2_, stats2_.products as products7_5_2_, stats2_.tags as tags8_5_2_, stats2_.trainings as training9_5_2_, stats2_.vendors as vendors10_5_2_, producthis3_.stats_id as stats_id1_6_3_, producthis3_.product_histogram as product_2_6_3_, producthis3_.product_histogram_key as product_3_3_, taghistogr4_.stats_id as stats_id1_7_4_, taghistogr4_.tag_histogram as tag_hist2_7_4_, taghistogr4_.tag_histogram_key as tag_hist3_4_, vendorhist5_.stats_id as stats_id1_8_5_, vendorhist5_.vendor_histogram as vendor_h2_8_5_, vendorhist5_.vendor_histogram_key as vendor_h3_5_ from execution execution0_ left outer join execution_properties properties1_ on execution0_.id=properties1_.execution_id left outer join stats stats2_ on execution0_.stats_id=stats2_.id left outer join stats_product_histogram producthis3_ on stats2_.id=producthis3_.stats_id left outer join stats_tag_histogram taghistogr4_ on stats2_.id=taghistogr4_.stats_id left outer join stats_vendor_histogram vendorhist5_ on stats2_.id=vendorhist5_.stats_id where execution0_.id=?

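Apparently it issues the statement that fetches all Training entities, but the Execution statement is the last one it manages to run.

I changed the relationship from Training to Execution from @ManyToOne to @ManyToOne(fetch = FetchType.LAZY), and suddenly my code was able to load data into the index again. So I suspect something is wrong with the mapping of my Execution entity. Let me share the code with you: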

@Entity
@Data
@EntityListeners(AuditingEntityListener.class)
public class Execution {

    public enum Status { SCHEDULED, RUNNING, SUCCESS, FAILURE };

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @ToString.Include
    private Long id;

    @Column(updatable = false)
    private String scraper;

    @CreatedDate
    private LocalDateTime createdDate;

    @LastModifiedDate
    private LocalDateTime modifiedDate;

    @Min(0)
    @JsonProperty(access = Access.READ_ONLY)
    private Long durationMillis;

    @ElementCollection(fetch = FetchType.EAGER)
    private Map<String, String> properties;

    @NotNull
    @Enumerated(EnumType.STRING)
    private Status status;

    @Column(length = 9999999)
    private String message;

    @EqualsAndHashCode.Exclude
    @OneToOne(cascade = CascadeType.ALL)
    private Stats stats;

}

Since it is a relationship of Execution, here is the Stats entity as well:

@Entity
@Data
public class Stats {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @ToString.Include
    private Long id;

    private Long trainings;

    private Long vendors;

    private Long products;

    private Long tags;

    private Long nullVendors;

    private Long nullProducts;

    private Double minQuality;

    private Double avgQuality;

    private Double maxQuality;

    @ElementCollection(fetch = FetchType.EAGER)
    private Map<String, Long> vendorHistogram;

    @ElementCollection(fetch = FetchType.EAGER)
    private Map<String, Long> productHistogram;

    @ElementCollection(fetch = FetchType.EAGER)
    private Map<String, Long> tagHistogram;

}
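
A side note on the second statement in the log above, offered as a plausible reading rather than a confirmed diagnosis: it left-joins four collection tables (execution_properties and the three stats histograms) in a single query. When several collections are joined this way, the number of result rows for one Execution is the product of the collection sizes, and every row repeats the scalar columns, including message. The sketch below only does that arithmetic; the class name and the sizes used with it are hypothetical and do not come from the post.

```java
/** Back-of-the-envelope estimate for a query that left-joins several collections at once. */
final class JoinRowEstimate {

    private JoinRowEstimate() {
    }

    /**
     * Returns how many result rows one parent row expands to when each of the
     * given collections is LEFT OUTER JOINed in the same statement.
     */
    static long rowsPerParent(long... collectionSizes) {
        long rows = 1;
        for (long size : collectionSizes) {
            // An empty collection still contributes one row (filled with NULLs).
            rows = Math.multiplyExact(rows, Math.max(1, size));
        }
        return rows;
    }
}
```

With assumed sizes of 10 properties, 50 products, 200 tags and 30 vendors, JoinRowEstimate.rowsPerParent(10, 50, 200, 30) gives 3,000,000 rows for a single Execution. Whether the real data is anywhere near that would have to be checked against the tables.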

I think this has to do with your FullTextEntityManager not finding enough memory. You have to configure your queryPlanCache. Go through this thread and this one too.

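All of this runs in a single transaction, and I don't see a clear() anywhere, so the EntityManager that loads all this data keeps referencing it.

To fix this, inject the EntityManager and invoke clear(). Alternatively, scope the transaction to the processing of a single page.

For this I recommend TransactionTemplate.

I'm not familiar with FullTextEntityManager, but it might have a similar problem.

For more background, you might want to read up on the JPA entity lifecycle.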
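
The per-page cleanup suggested in this answer can be sketched roughly as follows. To keep the example self-contained it uses only the JDK: PagedBatch and its parameter names are hypothetical, and the JPA and Hibernate Search calls are left to the caller as callbacks. It is an illustration of the idea, not the poster's code.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

/** Page-by-page processing with a hook that runs after every page. */
final class PagedBatch {

    private PagedBatch() {
    }

    /**
     * Loads pages 0, 1, 2, ... until a page comes back empty or shorter than
     * pageSize, and returns the number of items handled.
     */
    static <T> long forEachPage(int pageSize,
                                IntFunction<List<T>> loadPage,
                                Consumer<T> handleItem,
                                Runnable afterPage) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long processed = 0;
        for (int page = 0; ; page++) {
            List<T> items = loadPage.apply(page);
            if (items.isEmpty()) {
                break;
            }
            items.forEach(handleItem);
            processed += items.size();
            // In the JPA case this hook is where flushToIndexes() and
            // entityManager.clear() would go, so the entities of the finished
            // page are no longer referenced by the persistence context.
            afterPage.run();
            if (items.size() < pageSize) {
                break; // short page: nothing left to load
            }
        }
        return processed;
    }
}
```

In reindexData() the three callbacks would presumably be p -> trainingRepo.findAll(PageRequest.of(p, 100)).getContent(), fullTextEntityManager::index, and a hook that calls fullTextEntityManager.flushToIndexes() followed by entityManager.clear(). For the second variant from the answer, each page could instead be wrapped in its own TransactionTemplate.execute(...) call, with @Transactional removed from load(), so that every page gets a fresh persistence context.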