Read 20GB text file with Java
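I have a 20 GB text file that I want to read and store in a database. The problem is that when I try to load it, the program is killed before it can print anything that would show what it is doing, apparently because of the file's size. If anyone has suggestions on how to read this file efficiently, please let me know.
From another post, Read large files in Java: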
First, if your file contains binary data, then using BufferedReader would be a big mistake (because you would be converting the data to String, which is unnecessary and could easily corrupt the data); you should use a BufferedInputStream instead. If it's text data and you need to split it along line breaks, then using BufferedReader is OK (assuming the file contains lines of a sensible length).
Regarding memory, there shouldn't be any problem if you use a decently sized buffer (I'd use at least 1MB to make sure the HD is doing mostly sequential reading and writing).
If speed turns out to be a problem, you could have a look at the java.nio packages; those are supposedly faster than java.io.
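To make the advice above concrete, here is a minimal sketch of reading a large text file line by line through a BufferedReader with a 1 MB buffer, so that only the current line is held in memory. The class name `LargeFileReader`, the method `forEachLine`, and the choice of UTF-8 in `main` are assumptions made for illustration; they are not from the quoted post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only.
final class LargeFileReader {
    // BufferedReader sizes its buffer in chars; 1 << 20 chars is roughly the
    // "at least 1MB" suggested above.
    private static final int BUFFER_SIZE = 1 << 20;

    // Streams the file one line at a time and hands each line to the handler.
    // Returns the number of lines read. The whole file is never in memory.
    static long forEachLine(Path file, Charset charset, Consumer<String> handler)
            throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset), BUFFER_SIZE)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LargeFileReader <file>");
            return;
        }
        // Assumes the file is UTF-8; pass a different Charset if it is not.
        long lines = forEachLine(Paths.get(args[0]), StandardCharsets.UTF_8, line -> { });
        System.out.println("Read " + lines + " lines");
    }
}
```

The handler is where each line would be parsed and queued for the database. As long as it does not collect lines into a growing list or StringBuilder, heap usage stays flat regardless of file size.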
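As for loading it into the database, make sure you use some kind of bulk-load API, otherwise it will take a very long time.
Here is an example of a bulk-load routine I use for Netezza: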
import java.io.File;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Loads a tab-delimited file into schema.tableName via a Netezza external table.
private static void executeBulkLoad(
        Connection connection,
        String schema,
        String tableName,
        File file,
        String filename,
        String encoding) throws SQLException {
    String filePath = file.getAbsolutePath();
    // The load logs are written to the folder that contains the data file.
    String logFolderPath = filePath.replace(filename, "");
    String SQLString = "INSERT INTO " + schema + "." + tableName + "\n";
    SQLString += "SELECT * FROM\n";
    SQLString += "EXTERNAL '" + filePath + "'\n";
    SQLString += "USING\n";
    SQLString += "(\n";
    SQLString += "    ENCODING '" + encoding + "'\n";
    SQLString += "    QUOTEDVALUE 'NO'\n";
    SQLString += "    FILLRECORD 'TRUE'\n";
    SQLString += "    NULLVALUE 'NULL'\n";
    SQLString += "    SKIPROWS 1\n";
    SQLString += "    DELIMITER '\t'\n";
    SQLString += "    LOGDIR '" + logFolderPath + "'\n";
    SQLString += "    REMOTESOURCE 'JDBC'\n";
    SQLString += "    CTRLCHARS 'TRUE'\n";
    SQLString += "    IGNOREZERO 'TRUE'\n";
    // The backslash must be doubled in Java source; '\' alone would be read as
    // an escaped quote and produce ESCAPECHAR ''.
    SQLString += "    ESCAPECHAR '\\'\n";
    SQLString += ");";
    // try-with-resources closes the statement even if execute() throws.
    try (Statement statement = connection.createStatement()) {
        statement.execute(SQLString);
    }
}
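If you need to load the information into a database, you could use Spring Batch.
With it you read your file, manage transactions, process the records, save the rows to the database, and control how many records go into each commit. I think this is the better option, because reading the large file is only the first problem; your next problem will be managing the database transactions, controlling commits, and so on. I hope this helps.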
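The commit-interval idea can be illustrated without any framework. The sketch below is not Spring Batch code; it is a plain-Java approximation of chunk-oriented processing, where lines are collected into fixed-size chunks and each chunk is handed to a writer callback. The names `ChunkedLoader`, `loadInChunks`, and `chunkSize` are hypothetical. In a real loader the callback would presumably add each row to a JDBC PreparedStatement batch, execute it, and commit once per chunk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only.
final class ChunkedLoader {
    // Reads lines and passes them to the writer in chunks of at most chunkSize.
    // Returns the number of chunks written. At most one chunk is held in memory.
    static int loadInChunks(BufferedReader reader, int chunkSize,
                            Consumer<List<String>> writer) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunk = new ArrayList<>(chunkSize);
        int chunks = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            chunk.add(line);
            if (chunk.size() == chunkSize) {
                writer.accept(chunk);              // one "commit" per full chunk
                chunk = new ArrayList<>(chunkSize);
                chunks++;
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(chunk);                  // final, partially filled chunk
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // A small in-memory input stands in for the 20 GB file.
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd\ne\n"));
        int chunks = loadInChunks(reader, 2, chunk -> System.out.println("commit " + chunk));
        System.out.println(chunks + " chunks written");
    }
}
```

A larger chunk size means fewer commits but more rows to redo if one chunk fails, which is the trade-off that a configurable commit interval exposes.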
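If you are reading a very large file, always prefer streaming it through an InputStream rather than loading it all at once.
For example: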
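import java.io.BufferedReader;
import java.io.InputStreamReader;

// `conn` is not defined in this snippet; it is presumably an open connection
// such as a java.net.URLConnection. A FileInputStream can be wrapped the same way.
try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
    String line;
    while ((line = in.readLine()) != null) {
        // process line here; do not accumulate lines in memory
    }
}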