Does a Java InputStream help or hurt memory usage with large files?
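I've seen some conflicting posts on Stack Overflow, and I'd like a definitive answer.
I started out assuming that a Java InputStream lets me stream bytes out of a file and so save memory, because I don't have to consume the whole file at once. That is exactly what I read here: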
Loading all bytes to memory is not a good practice. Consider returning the file and opening an input stream to read it, so your application won't crash when handling large files. – andrucz
But then I used an InputStream to read a very large Microsoft Excel file (with the Apache POI library), and I ran into this error:
java.lang.outofmemory exception while reading excel file (xlsx) using POI
I got an OutOfMemoryError.
This key piece of advice saved me:
One thing that'll make a small difference is when opening the file to start with. If you have a file, then pass that in! Using an InputStream requires buffering of everything into memory, which eats up space. Since you don't need to do that buffering, don't!
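I got rid of the InputStream and just used a plain java.io.File, and the OutOfMemoryError went away.
So when it comes to memory usage, a java.io.File is better than an InputStream? That makes no sense.
What is the real answer?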
So you are saying that an InputStream would typically help?
It depends entirely on how the application (or library) >>uses<< the InputStream.
With what kind of follow-up code? Could you offer an example of memory-efficient Java?
For example:
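    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.InputStreamReader;

    // largeFileName is the path of the file to read; the enclosing method
    // must declare or handle IOException.

    // Efficient use of memory: only the current line is held at any time
    try (InputStream is = new FileInputStream(largeFileName);
         BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
        String line;
        while ((line = br.readLine()) != null) {
            // process one line
        }
    }

    // Inefficient use of memory: the whole file is accumulated into one String
    try (InputStream is = new FileInputStream(largeFileName);
         BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append("\n");
        }
        String everything = sb.toString();
        // process the entire string
    }

    // Very inefficient use of memory: each += copies everything read so far
    // into a new String, so the total copying grows quadratically
    try (InputStream is = new FileInputStream(largeFileName);
         BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
        String everything = "";
        String line;
        while ((line = br.readLine()) != null) {
            everything += line + "\n";
        }
        // process the entire string
    }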
(Note that there are more efficient ways to read a file into memory; the examples above are purely to illustrate the principle.)
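When the whole file genuinely has to be in memory, one such way is `Files.readString`. This is a minimal sketch assuming Java 11+ and UTF-8 text; `ReadWholeFile` and `readAll` are illustrative names. It reads all the bytes in one go and decodes them once, avoiding both the per-line `String` objects of the `StringBuilder` loop and the repeated copying of the `+=` loop. Memory use is still proportional to the file size.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWholeFile {

    // Reads the entire file into one String. Memory use is still proportional
    // to the file size, so this only makes sense when the file is known to fit.
    static String readAll(Path file) throws IOException {
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        try {
            Files.writeString(tmp, "line 1\nline 2\n", StandardCharsets.UTF_8);
            String everything = readAll(tmp);
            System.out.println(everything.length()); // prints 14
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Unlike the line-by-line loops, this keeps the file's original line terminators instead of normalizing them to "\n".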
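The general principles here are:
- Avoid holding the entire file in memory at once.
- If you must hold the entire file in memory, be careful how you "accumulate" the characters.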
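The first principle in its purest form looks like this: a minimal sketch (class and method names are illustrative) that reads an InputStream through one fixed-size byte buffer, so peak memory is the buffer size no matter how large the file is. It counts newline bytes, which assumes an ASCII-compatible encoding such as UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingCount {

    // Counts newline bytes using a single 8 KiB buffer that is reused for
    // every read, so memory use does not depend on the size of the input.
    static long countNewlines(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long newlines = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // -1 signals end of stream
            for (int i = 0; i < n; i++) {     // only the first n bytes are valid
                if (buffer[i] == '\n') {
                    newlines++;
                }
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args[0]); // path of the (possibly huge) file to scan
        try (InputStream in = Files.newInputStream(file)) {
            System.out.println(countNewlines(in) + " newline(s)");
        }
    }
}
```

The same loop shape works for any per-chunk processing (checksums, copying, parsing records); what matters is that nothing outside the buffer accumulates.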
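As for the posts you linked to above:
The first one is not about memory efficiency. Rather, it is about a limitation of the AWS client library: apparently the API doesn't offer a simple way to stream an object while reading it, so you have to save the object to a file and then open the file as a stream. Whether that is memory-efficient depends on what the application does with the stream; see above.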
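The save-to-file-then-stream workaround can be sketched as follows. This assumes only that the client library hands you some InputStream; a ByteArrayInputStream stands in for it here, and no AWS SDK calls are shown or implied. `SpoolThenStream` and its method are hypothetical names. `Files.copy` moves the data through a small buffer, and the file is then read back one line at a time, so neither step holds the whole object in memory.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpoolThenStream {

    // Copies the source stream to a temporary file, then reads that file back
    // line by line and returns the number of non-empty lines.
    static long spoolAndCountNonEmptyLines(InputStream source) throws IOException {
        Path tmp = Files.createTempFile("download", ".tmp");
        try {
            // REPLACE_EXISTING is needed because createTempFile already created the file
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
            long count = 0;
            try (BufferedReader br = Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
                String line;
                while ((line = br.readLine()) != null) {
                    if (!line.isEmpty()) {
                        count++; // only the current line is held in memory
                    }
                }
            }
            return count;
        } finally {
            Files.deleteIfExists(tmp); // clean up the spooled copy
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the content stream a client library returns
        InputStream remote = new ByteArrayInputStream(
                "a\n\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(spoolAndCountNonEmptyLines(remote)); // prints 3
    }
}
```

If the code that reads the spooled file accumulated every line instead, the detour through a file would save nothing; the memory profile is decided by the reading loop.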
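The second one is specific to the POI APIs. Apparently, if you give it a stream, the POI library itself reads the stream contents into memory. That would be an implementation limitation of that particular library.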
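Why a library might behave like that can be illustrated with a made-up file format. This is a sketch, not POI's actual code; POI's internals may differ. An .xlsx file is a ZIP archive, and ZIP keeps its central directory at the end of the file, so a parser wants to jump to the end and then back to earlier offsets. Given a file, it can seek. Given only an InputStream, which cannot go backwards, the simplest correct approach is to buffer everything. In the hypothetical format below, the last 4 bytes hold the offset of a 4-byte value stored earlier in the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TrailerLookup {

    // Given a file, we can seek: only a few bytes are read into memory.
    static int readFromFile(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int offset = readIntAt(ch, ch.size() - 4); // jump straight to the trailer
            return readIntAt(ch, offset);              // then jump back to the value
        }
    }

    // Reads a big-endian int at an absolute position; this positional read
    // does not change the channel's own position.
    private static int readIntAt(FileChannel ch, long position) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4); // big-endian by default
        while (buf.hasRemaining()) {
            if (ch.read(buf, position + buf.position()) == -1) {
                throw new IOException("unexpected end of file");
            }
        }
        buf.flip();
        return buf.getInt();
    }

    // Given only a stream, we cannot go back, so the whole input is buffered:
    // memory use grows with the size of the input.
    static int readFromStream(InputStream in) throws IOException {
        ByteBuffer all = ByteBuffer.wrap(in.readAllBytes());
        int offset = all.getInt(all.limit() - 4);
        return all.getInt(offset);
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer data = ByteBuffer.allocate(12);
        data.putInt(7);  // filler at offset 0
        data.putInt(42); // the value we want, at offset 4
        data.putInt(4);  // trailer: the offset of the value
        byte[] bytes = data.array();

        Path tmp = Files.createTempFile("trailer", ".bin");
        try {
            Files.write(tmp, bytes);
            System.out.println(readFromFile(tmp));                               // prints 42
            System.out.println(readFromStream(new ByteArrayInputStream(bytes))); // prints 42
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Both methods return the same answer; the difference is that the stream version needs memory proportional to the input, which matches the behaviour described for POI when it is handed an InputStream instead of a File.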