OutOfMemoryError loading data via JPA: need help analyzing
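I wrote an application (Spring Boot + Data JPA + Data REST) that keeps throwing an OutOfMemoryError at me while it starts up. I can skip the code that runs at startup, but then the error may still occur later. It is probably best to show you what happens at startup, because it is actually very simple and, in my opinion, should not cause any problems: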
@SpringBootApplication
@EnableAsync
@EnableJpaAuditing
public class ScraperApplication {

    public static void main(String[] args) {
        SpringApplication.run(ScraperApplication.class, args);
    }
}

@Component
@RequiredArgsConstructor(onConstructor = @__(@Autowired))
public class DefaultDataLoader {

    private final @NonNull LuceneService luceneService;

    @Transactional
    @EventListener(ApplicationReadyEvent.class)
    public void load() {
        luceneService.reindexData();
    }
}

@Service
@RequiredArgsConstructor(onConstructor = @__(@Autowired))
public class LuceneService {

    private static final Log LOG = LogFactory.getLog(LuceneService.class);

    private final @NonNull TrainingRepo trainingRepo;
    private final @NonNull EntityManager entityManager;

    public void reindexData() {
        LOG.info("Reindexing triggered");
        FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
        fullTextEntityManager.purgeAll(Training.class);
        LOG.info("Index purged");
        int page = 0;
        int size = 100;
        boolean morePages = true;
        Page<Training> pageData;
        while (morePages) {
            pageData = trainingRepo.findAll(PageRequest.of(page, size));
            LOG.info("Loading page " + (page + 1) + "/" + pageData.getTotalPages());
            pageData.getContent().stream().forEach(t -> fullTextEntityManager.index(t));
            fullTextEntityManager.flushToIndexes(); // flush regularly to keep memory footprint low
            morePages = pageData.getTotalPages() > ++page;
        }
        fullTextEntityManager.flushToIndexes();
        LOG.info("Index flushed");
    }
}
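As you can see, all I do is purge the index, read all trainings from TrainingRepo page by page (100 at a time), and write them to the index. Not much is happening, really. A few minutes after the "Index purged" message I get this, and only this: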
java.lang.OutOfMemoryError: Java heap space
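In the log I see "Index purged" but never any "Loading page ..." message, so it must be stuck in the findAll() call.
I had the JVM write a heap dump, loaded it into Eclipse Memory Analyzer, and got the full stack trace: https://gist.github.com/mathias-ewald/2fddb9762427374bb04d332bd0b6b499
I also skimmed the report, but I need help interpreting the information, which is why I attached some Eclipse Memory Analyzer screenshots.
EDIT:
I just enabled "show-sql" and saw this right before everything hung: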
Hibernate: select training0_.id as id1_9_, training0_.created_date as created_2_9_, training0_.description as descript3_9_, training0_.duration_days as duration4_9_, training0_.execution_id as executi14_9_, training0_.level as level5_9_, training0_.modified_date as modified6_9_, training0_.name as name7_9_, training0_.price as price8_9_, training0_.product as product9_9_, training0_.quality as quality10_9_, training0_.raw as raw11_9_, training0_.url as url12_9_, training0_.vendor as vendor13_9_ from training training0_ where not (exists (select 1 from training training1_ where training0_.url=training1_.url and training0_.created_date<training1_.created_date)) limit ?
Hibernate: select execution0_.id as id1_1_0_, execution0_.created_date as created_2_1_0_, execution0_.duration_millis as duration3_1_0_, execution0_.message as message4_1_0_, execution0_.modified_date as modified5_1_0_, execution0_.scraper as scraper6_1_0_, execution0_.stats_id as stats_id8_1_0_, execution0_.status as status7_1_0_, properties1_.execution_id as executio1_2_1_, properties1_.properties as properti2_2_1_, properties1_.properties_key as properti3_1_, stats2_.id as id1_5_2_, stats2_.avg_quality as avg_qual2_5_2_, stats2_.max_quality as max_qual3_5_2_, stats2_.min_quality as min_qual4_5_2_, stats2_.null_products as null_pro5_5_2_, stats2_.null_vendors as null_ven6_5_2_, stats2_.products as products7_5_2_, stats2_.tags as tags8_5_2_, stats2_.trainings as training9_5_2_, stats2_.vendors as vendors10_5_2_, producthis3_.stats_id as stats_id1_6_3_, producthis3_.product_histogram as product_2_6_3_, producthis3_.product_histogram_key as product_3_3_, taghistogr4_.stats_id as stats_id1_7_4_, taghistogr4_.tag_histogram as tag_hist2_7_4_, taghistogr4_.tag_histogram_key as tag_hist3_4_, vendorhist5_.stats_id as stats_id1_8_5_, vendorhist5_.vendor_histogram as vendor_h2_8_5_, vendorhist5_.vendor_histogram_key as vendor_h3_5_ from execution execution0_ left outer join execution_properties properties1_ on execution0_.id=properties1_.execution_id left outer join stats stats2_ on execution0_.stats_id=stats2_.id left outer join stats_product_histogram producthis3_ on stats2_.id=producthis3_.stats_id left outer join stats_tag_histogram taghistogr4_ on stats2_.id=taghistogr4_.stats_id left outer join stats_vendor_histogram vendorhist5_ on stats2_.id=vendorhist5_.stats_id where execution0_.id=?
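Apparently it creates the statement that fetches the Training entities, but the Execution statement is the last one it manages to execute.
I changed the relation from Training to Execution from @ManyToOne to @ManyToOne(fetch = FetchType.LAZY), and suddenly my code was able to load data into the index again. So I suspect something is wrong with my Execution entity mapping. Let me share the code with you: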
@Entity
@Data
@EntityListeners(AuditingEntityListener.class)
public class Execution {

    public enum Status { SCHEDULED, RUNNING, SUCCESS, FAILURE };

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @ToString.Include
    private Long id;

    @Column(updatable = false)
    private String scraper;

    @CreatedDate
    private LocalDateTime createdDate;

    @LastModifiedDate
    private LocalDateTime modifiedDate;

    @Min(0)
    @JsonProperty(access = Access.READ_ONLY)
    private Long durationMillis;

    @ElementCollection(fetch = FetchType.EAGER)
    private Map<String, String> properties;

    @NotNull
    @Enumerated(EnumType.STRING)
    private Status status;

    @Column(length = 9999999)
    private String message;

    @EqualsAndHashCode.Exclude
    @OneToOne(cascade = CascadeType.ALL)
    private Stats stats;
}
Since it is a relation of Execution, here is the Stats entity as well:
@Entity
@Data
public class Stats {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @ToString.Include
    private Long id;

    private Long trainings;
    private Long vendors;
    private Long products;
    private Long tags;
    private Long nullVendors;
    private Long nullProducts;
    private Double minQuality;
    private Double avgQuality;
    private Double maxQuality;

    @ElementCollection(fetch = FetchType.EAGER)
    private Map<String, Long> vendorHistogram;

    @ElementCollection(fetch = FetchType.EAGER)
    private Map<String, Long> productHistogram;

    @ElementCollection(fetch = FetchType.EAGER)
    private Map<String, Long> tagHistogram;
}
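A note on the second statement in the SQL log above: it left-joins four element-collection tables (execution_properties and the three stats histogram tables) in one query, so the result set for a single Execution can grow to the product of the four collection sizes. Whether that is what exhausted the heap here is an assumption; nothing in the question confirms it, although it would fit the observation that switching to FetchType.LAZY made the problem go away. The sketch below only does the arithmetic. The class name, method name, and collection sizes are made up for illustration.

```java
/** JDK-only arithmetic sketch; the sizes used in main are hypothetical. */
final class JoinRowEstimate {

    /**
     * Estimates the rows returned when several independent collections of
     * one parent row are fetched with LEFT OUTER JOINs in a single query.
     * An empty collection still contributes one (all-NULL) row.
     */
    static long rowsFor(long... collectionSizes) {
        long rows = 1;
        for (long size : collectionSizes) {
            // multiplyExact throws ArithmeticException instead of silently overflowing
            rows = Math.multiplyExact(rows, Math.max(1, size));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sizes for: properties, vendorHistogram, productHistogram, tagHistogram
        System.out.println(rowsFor(10, 50, 300, 2000)); // prints 300000000
    }
}
```

With sizes like these, a single eagerly fetched Execution would make the JDBC driver and Hibernate process hundreds of millions of rows, even though only 100 trainings were requested.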
I think this is related to your FullTextEntityManager not having enough memory. You have to configure your query plan cache; go through this thread on how to do that, and this one too.
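All of this runs in a single transaction and I don't see a clear anywhere, so the EntityManager that loads all this data still holds references to it.
To fix this, inject the EntityManager and invoke clear. Or scope the transaction to the processing of a single page; for that I recommend TransactionTemplate.
I'm not familiar with FullTextEntityManager, but it may have a similar problem.
For more background, you may want to read up on the JPA entity lifecycle.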
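To illustrate why the missing clear matters, here is a minimal JDK-only model of a persistence context. It is not the JPA API: the class and its methods are hypothetical stand-ins, where loadPage plays the role of trainingRepo.findAll(PageRequest.of(page, size)) and clear plays the role of entityManager.clear(). In the real reindexData() loop the equivalent change would presumably be a clear call right after flushToIndexes(); I have not tested that against the code in the question.

```java
import java.util.ArrayList;
import java.util.List;

/** JDK-only model of a first-level cache; a sketch, NOT the JPA API. */
final class PersistenceContextSketch {

    /** Hypothetical stand-in for the entities an EntityManager keeps managed. */
    private final List<long[]> managed = new ArrayList<>();
    private int peak = 0;

    /** Models loading one page: every loaded row becomes a managed entity. */
    void loadPage(int size) {
        for (int i = 0; i < size; i++) {
            managed.add(new long[16]); // placeholder payload for one entity
        }
        peak = Math.max(peak, managed.size());
    }

    /** Models entityManager.clear(): drops all references held so far. */
    void clear() {
        managed.clear();
    }

    /** Processes totalRows page by page and returns the peak number of managed entities. */
    static int reindex(int totalRows, int pageSize, boolean clearAfterEachPage) {
        PersistenceContextSketch context = new PersistenceContextSketch();
        for (int offset = 0; offset < totalRows; offset += pageSize) {
            context.loadPage(Math.min(pageSize, totalRows - offset));
            // Real code would index the page and flush to the index here.
            if (clearAfterEachPage) {
                context.clear();
            }
        }
        return context.peak;
    }

    public static void main(String[] args) {
        System.out.println("peak without clear: " + reindex(10_000, 100, false)); // 10000
        System.out.println("peak with clear: " + reindex(10_000, 100, true));     // 100
    }
}
```

Without the clear, the number of referenced entities grows with the table; with it, the peak stays at one page. Running each page in its own transaction has a similar effect, because a transaction-scoped persistence context is discarded when the transaction ends.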