How to increase the performance of inserting data into the database?
I am using PostgreSQL 9.5 (with the latest JDBC driver, 9.4.1209), JPA 2.1 (Hibernate), EJB 3.2, CDI, JSF 2.2 and WildFly 10. I have to insert a lot of data into the database (about 1 million to 170 million entities). The number of entities depends on the file the user adds to the form on the page.
What is the problem?
The problem is that inserting the data into the database is very slow, and the execution time grows with every call to the flush() method. I added a println(...) statement to see how fast the flush method executes. For the first four flushes (400,000 entities) I got the println(...) output about every 20 seconds. After that, the flush method became very slow and kept getting slower.
Of course, if I remove the flush() and clear() calls, I get the println(...) output every second, but when I get close to 3 million entities I also get the exception:
java.lang.OutOfMemoryError: GC overhead limit exceeded
What have I done so far?
- I have tried both container-managed and bean-managed transactions (see the code below).
- I do not use the auto_increment feature for the PK id; I assign the IDs manually in the bean code.
- I have also tried changing the number of entities to flush (currently 100,000).
- I tried setting the same number of entities in the hibernate.jdbc.batch_size property. It did not help; the execution time was much slower.
- I tried experimenting with the properties in the persistence.xml file. For example, I added the reWriteBatchedInserts property, but honestly I do not know whether it helps (see the sketch after this list).
- PostgreSQL runs on an SSD, but the data is stored on an HDD because the data may become very large in the future. However, I tried moving my PostgreSQL data to the SSD and the result was the same; nothing changed.
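For reference, this is roughly how the property would be passed directly to the driver. As far as I understand, reWriteBatchedInserts is a setting of the PostgreSQL JDBC driver itself, not a Hibernate property, so it has to reach the driver's connection configuration. This is a minimal, hypothetical sketch; host, database name and credentials are placeholders:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ReWriteBatchedInsertsExample {
    public static void main(String[] args) throws SQLException {
        // reWriteBatchedInserts is read by the PostgreSQL JDBC driver (9.4.1209 or newer),
        // so it belongs in the connection URL (or the datasource's connection properties),
        // not in persistence.xml. Host, database, user and password are placeholders.
        String url = "jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true";
        try (Connection con = DriverManager.getConnection(url, "user", "password")) {
            System.out.println("Connected with batched-insert rewriting enabled.");
        }
    }
}
With a container-managed datasource in WildFly, the same parameter would presumably be appended to the connection URL of the datasource definition instead.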
The question is: how can I increase the performance of inserting data into the database?
Here is the structure of my table:
column_name | udt_name | length | is_nullable | key
---------------+-------------+--------+-------------+--------
id | int8 | | NO | PK
id_user_table | int4 | | NO | FK
starttime | timestamptz | | NO |
time | float8 | | NO |
sip | varchar | 100 | NO |
dip | varchar | 100 | NO |
sport | int4 | | YES |
dport | int4 | | YES |
proto | varchar | 50 | NO |
totbytes | int8 | | YES |
info | text | | YES |
label | varchar | 10 | NO |
Here is part of the EJB bean (first version) that inserts the data into the database:
@Stateless
public class DataDaoImpl extends GenericDaoImpl<Data> implements DataDao {
/**
* This's the first method which is executed.
* The CDI bean (controller) calls this method.
* @param list - data from the file.
* @param idFK - foreign key.
*/
public void send(List<String> list, int idFK) {
if(handleCSV(list,idFK)){
//...
}
else{
//...
}
}
/**
* The method inserts data into the database.
*/
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
private boolean handleCSV(List<String> list, int idFK){
try{
long start=0;
Pattern patternRow=Pattern.compile(",");
for (String s : list) {
if(start!=0){
String[] data=patternRow.split(s);
//Preparing data...
DataStoreAll dataStore=new DataStoreAll();
DataStoreAllId dataId=new DataStoreAllId(start++, idFK);
dataStore.setId(dataId);
//Setting the other object fields...
entityManager.persist(dataStore);
if(start%100000==0){
System.out.println("Number of entities: "+start);
entityManager.flush();
entityManager.clear();
}
}
else start++;
}
} catch(Throwable t){
CustomExceptionHandler exception=new CustomExceptionHandler(t);
return exception.persist("DDI", "handleCSV");
}
return true;
}
@Inject
private EntityManager entityManager;
}
Instead of container-managed transactions, I also tried bean-managed transactions (second version):
@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {
/**
* This's the first method which is executed.
* The CDI bean (controller) calls this method.
* @param list - data from the file.
* @param idFK - foreign key.
*/
public void send(List<String> list, int idFK) {
if(handleCSV(list,idFK)){
//...
}
else{
//...
}
}
/**
* The method inserts data into the linkedList collection.
*/
private boolean handleCSV(List<String> list, int idFK){
try{
long start=0;
Pattern patternRow=Pattern.compile(",");
List<DataStoreAll> entitiesAll=new LinkedList<>();
for (String s : list) {
if(start!=0){
String[] data=patternRow.split(s);
//Preparing data...
DataStoreAll dataStore=new DataStoreAll();
DataStoreAllId dataId=new DataStoreAllId(start++, idFK);
dataStore.setId(dataId);
//Setting the other object fields...
entitiesAll.add(dataStore);
if(start%100000==0){
System.out.println("Number of entities: "+start);
saveDataStoreAll(entitiesAll);
}
}
else start++;
}
} catch(Throwable t){
CustomExceptionHandler exception=new CustomExceptionHandler(t);
return exception.persist("DDI", "handleCSV");
}
return true;
}
/**
* The method commits the transaction.
*/
private void saveDataStoreAll(List<DataStoreAll> entities) throws EntityExistsException,IllegalArgumentException,TransactionRequiredException,PersistenceException,Throwable {
Iterator<DataStoreAll> iter=entities.iterator();
ut.begin();
while(iter.hasNext()){
entityManager.persist(iter.next());
iter.remove();
entityManager.flush();
entityManager.clear();
}
ut.commit();
}
@Inject
private EntityManager entityManager;
@Inject
private UserTransaction ut;
}
Here is my persistence.xml:
<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1"
xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://xmlns.jcp.org/xml/ns/persistence
http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
<persistence-unit name="primary">
<jta-data-source>java:/PostgresDS</jta-data-source>
<properties>
<property name="hibernate.show_sql" value="false" />
<property name="hibernate.jdbc.batch_size" value="50" />
<property name="hibernate.order_inserts" value="true" />
<property name="hibernate.order_updates" value="true" />
<property name="hibernate.jdbc.batch_versioned_data" value="true"/>
<property name="reWriteBatchedInserts" value="true"/>
</properties>
</persistence-unit>
</persistence>
If I forgot to add something, let me know and I will update my post.
Update
Here is the controller that calls DataDaoImpl#send(...):
@Named
@ViewScoped
public class DataController implements Serializable {
@PostConstruct
private void init(){
//...
}
/**
* Handle of the uploaded file.
*/
public void handleFileUpload(FileUploadEvent event){
uploadFile=event.getFile();
try(InputStream input = uploadFile.getInputstream()){
Path folder=Paths.get(System.getProperty("jboss.server.data.dir"),"upload");
if(!folder.toFile().exists()){
if(!folder.toFile().mkdirs()){
folder=Paths.get(System.getProperty("jboss.server.data.dir"));
}
}
String filename = FilenameUtils.getBaseName(uploadFile.getFileName());
String extension = FilenameUtils.getExtension(uploadFile.getFileName());
filePath = Files.createTempFile(folder, filename + "-", "." + extension);
//Save the file on the server.
Files.copy(input, filePath, StandardCopyOption.REPLACE_EXISTING);
//Add reference to the unconfirmed uploaded files list.
userFileManager.addUnconfirmedUploadedFile(filePath.toFile());
FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "Success", uploadFile.getFileName() + " was uploaded."));
} catch (IOException e) {
//...
}
}
/**
* Sending data from file to the database.
*/
public void send(){
//int idFK=...
//The model includes the data from the file and other things which I transfer to the EJB bean.
AddDataModel addDataModel=new AddDataModel();
//Setting the addDataModel fields...
try{
if(uploadFile!=null){
//Each row of the file == 1 entity.
List<String> list=new ArrayList<String>();
Stream<String> stream=Files.lines(filePath);
list=stream.collect(Collectors.toList());
addDataModel.setList(list);
}
} catch (IOException e) {
//...
}
//Sending data to the DataDaoImpl EJB bean.
if(dataDao.send(addDataModel,idFK)){
userFileManager.confirmUploadedFile(filePath.toFile());
FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "The data was saved in the database.", ""));
}
}
private static final long serialVersionUID = -7202741739427929050L;
@Inject
private DataDao dataDao;
private UserFileManager userFileManager;
private UploadedFile uploadFile;
private Path filePath;
}
Update 2
Here is my updated EJB bean that inserts the data into the database:
@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {
/**
* This's the first method which is executed.
* The CDI bean (controller) calls this method.
* @param addDataModel - object which includes path to the uploaded file and other things which are needed.
*/
public void send(AddDataModel addDataModel){
if(handleCSV(addDataModel)){
//...
}
else{
//...
}
}
/**
* The method inserts data into the database.
*/
private boolean handleCSV(AddDataModel addDataModel){
PreparedStatement ps=null;
Connection con=null;
FileInputStream fileInputStream=null;
Scanner scanner=null;
try{
con=ds.getConnection();
con.setAutoCommit(false);
ps=con.prepareStatement("insert into data_store_all "
+ "(id,id_user_table,startTime,time,sIP,dIP,sPort,dPort,proto,totBytes,info) "
+ "values(?,?,?,?,?,?,?,?,?,?,?)");
long start=0;
fileInputStream=new FileInputStream(addDataModel.getPath().toFile());
scanner=new Scanner(fileInputStream, "UTF-8");
Pattern patternRow=Pattern.compile(",");
Pattern patternPort=Pattern.compile("\\d+");
while(scanner.hasNextLine()) {
if(start!=0){
//Loading a row from the file into table.
String[] data=patternRow.split(scanner.nextLine().replaceAll("[\"]",""));
//Preparing datetime.
SimpleDateFormat simpleDateFormat=new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
GregorianCalendar calendar=new GregorianCalendar();
calendar.setTime(simpleDateFormat.parse(data[1]));
calendar.set(Calendar.MILLISECOND, Integer.parseInt(Pattern.compile("\\.").split(data[1])[1])/1000);
//Preparing an entity
ps.setLong(1, start++); //id PK
ps.setInt(2, addDataModel.getIdFk()); //id FK
ps.setTimestamp(3, new Timestamp(calendar.getTime().getTime())); //datetime
ps.setDouble(4, Double.parseDouble(data[2])); //time
ps.setString(5, data[3]); //sip
ps.setString(6, data[4]); //dip
if(!data[5].equals("") && patternPort.matcher(data[5]).matches()) ps.setInt(7, Integer.parseInt(data[5])); //sport
else ps.setNull(7, java.sql.Types.INTEGER);
if(!data[6].equals("") && patternPort.matcher(data[6]).matches()) ps.setInt(8, Integer.parseInt(data[6])); //dport
else ps.setNull(8, java.sql.Types.INTEGER);
ps.setString(9, data[7]); //proto
if(!data[8].trim().equals("")) ps.setLong(10, Long.parseLong(data[8])); //len
else ps.setObject(10, null);
if(data.length==10 && !data[9].trim().equals("")) ps.setString(11, data[9]); //info
else ps.setString(11, null);
ps.addBatch();
if(start%100000==0){
System.out.println("Number of entity: "+start);
ps.executeBatch();
ps.clearParameters();
ps.clearBatch();
con.commit();
}
}
else{
start++;
scanner.nextLine();
}
}
if (scanner.ioException() != null) throw scanner.ioException();
} catch(Throwable t){
CustomExceptionHandler exception=new CustomExceptionHandler(t);
return exception.persist("DDI", "handleCSV");
} finally{
if (fileInputStream!=null)
try {
fileInputStream.close();
} catch (Throwable t2) {
CustomExceptionHandler exception=new CustomExceptionHandler(t2);
return exception.persist("DDI", "handleCSV.Finally");
}
if (scanner != null) scanner.close();
}
return true;
}
@Inject
private EntityManager entityManager;
@Resource(mappedName="java:/PostgresDS")
private DataSource ds;
}
Your problem is not necessarily the database or Hibernate, but that you are loading too much data into memory at once. That is why you get the out-of-memory error and why you see the JVM struggling along the way.
You read the file from a stream, but then push all of it into memory when you create the list of strings. You then map that list of strings into a linked list of some kind of entity!
Instead, use a stream to process the file in small chunks and insert those chunks into the database. A Scanner-based approach would look something like this:
FileInputStream inputStream = null;
Scanner sc = null;
try {
inputStream = new FileInputStream(path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
// Talk to your database here!
}
// note that Scanner suppresses exceptions
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (sc != null) {
sc.close();
}
}
You may find that Hibernate/EJB works well enough after this change, but I think you will find that plain JDBC is noticeably faster. It is often said that you can expect a 3x to 4x slowdown with the ORM, depending on the circumstances. That makes a significant difference with this much data.
If you are talking about really large amounts of data, you should look at CopyManager, which lets you load a stream directly into the database. You can use the streaming API to transform the data.
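To give an idea of what that looks like, here is a minimal sketch based on the pgjdbc CopyManager API, assuming a CSV file whose columns already match the target table. The table and column names are taken from the question; the datasource handling and the file path are placeholders.
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;

import javax.sql.DataSource;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyManagerSketch {

    // Rough sketch: stream a CSV file straight into PostgreSQL with COPY.
    // Assumes the file's columns already match the column list below.
    public long load(DataSource ds, String csvPath) throws Exception {
        try (Connection con = ds.getConnection();
             BufferedReader reader = new BufferedReader(new FileReader(csvPath))) {
            // With a container-managed datasource the connection may need an
            // extra unwrapping step before it exposes the PostgreSQL API.
            CopyManager copyManager = con.unwrap(PGConnection.class).getCopyAPI();
            // COPY ... FROM STDIN reads the rows from the supplied Reader.
            return copyManager.copyIn(
                "COPY data_store_all (id, id_user_table, starttime, time, sip, dip, "
                + "sport, dport, proto, totbytes, info) FROM STDIN WITH (FORMAT csv)",
                reader);
        }
    }
}
If the raw file does not match the table layout, you would transform the rows first, for example by writing the converted lines into a Reader (such as a PipedReader) that feeds copyIn.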
Since you are using WildFly 10, you are in a Java EE 7 environment.
Therefore you should consider using JSR-352 Batch Processing for your file import.
Take a look at An Overview of Batch Processing in Java EE 7.0.
This should solve all your memory-consumption and transaction problems.
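As an illustration, here is a minimal, hypothetical sketch of JSR-352 chunk artifacts for the CSV import. The class names, the file path and the job name are made up, and the job XML (e.g. META-INF/batch-jobs/csv-import.xml) that wires the reader and writer into a chunk step is omitted.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.Serializable;
import java.util.List;

import javax.batch.api.chunk.AbstractItemReader;
import javax.batch.api.chunk.AbstractItemWriter;
import javax.inject.Named;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

// In a real project each batch artifact lives in its own file and is wired
// together by the job XML, which defines a chunk step with a reader, an
// optional processor and a writer plus the chunk (commit) size.

// Reads the uploaded CSV line by line, so only one chunk is held in memory at a time.
@Named
class CsvLineReader extends AbstractItemReader {

    private BufferedReader reader;

    @Override
    public void open(Serializable checkpoint) throws Exception {
        // Placeholder path; in practice it would come from a job parameter.
        reader = new BufferedReader(new FileReader("/path/to/uploaded/file.csv"));
    }

    @Override
    public Object readItem() throws Exception {
        return reader.readLine(); // returning null ends the step
    }

    @Override
    public void close() throws Exception {
        reader.close();
    }
}

// Persists one chunk of items; the container commits a transaction per chunk.
@Named
class CsvLineWriter extends AbstractItemWriter {

    @PersistenceContext
    private EntityManager em;

    @Override
    public void writeItems(List<Object> items) throws Exception {
        for (Object item : items) {
            // Parse the CSV line into an entity and persist it here.
            // em.persist(entity);
        }
    }
}
The job would then be started from the controller with something like BatchRuntime.getJobOperator().start("csv-import", new Properties()).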