Run a Java program on multiple files in a directory, output with unique names
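My directory structure is as follows: base_directory/level_one_a, level_one_b, level_one_c/.
Each of these level_one_x directories contains many further directories, i.e. /level_one_a_1, level_one_a_2, level_one_a_3 ..., and likewise for level_one_b and level_one_c.
Inside level_one_a_1 there are more still, i.e. level_one_a_1_I, level_one_a_1_II, level_one_a_1_III, level_one_a_1_IV ...
Finally, inside level_one_a_1_IV, and in every directory on that same level, are the files I want to operate on.
I guess the shorter way to say it is start/one/two/three/*files*.
There are a lot of files, and I want to process them with a simple Java program I wrote: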
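// br is presumably a BufferedReader already opened on the input file
try
{
    // read the whole file into memory, line by line
    StringBuilder sb = new StringBuilder();
    String line = br.readLine();
    while (line != null)
    {
        sb.append(line);
        sb.append(System.lineSeparator());
        line = br.readLine();
    }
    String everything = sb.toString();
    // parse with jsoup and print the combined text of all elements matching block.full_text
    Document doc = Jsoup.parse(everything);
    String link = doc.select("block.full_text").text();
    System.out.println(link);
}
finally
{
    br.close();
}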
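It uses jsoup.
I want to build this out so that the program can navigate the directory structure on its own, grab each file, and process it with the code above, using a buffered reader and a file reader, I guess. How can I do that? I tried implementing this solution but couldn't get it to work.
Ideally I would also like to write the result for each processed file under a unique name, i.e. if the file is named 00001.txt it might be saved as 00001_output.txt. But that is a horse of a different color.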
Just use java.io.File and its method listFiles.
See the javadoc File API.
A similar question was posted here on SO: Recursively list files in Java
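As a rough illustration of the listFiles approach, here is a minimal sketch that recursively collects every regular file below a start directory. The class name ListFilesRecursively, the method collectFiles and the default path "base_directory" are made up for this example; they are not from the linked question.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Sketch: recursively collects regular files below a start directory with java.io.File. */
final class ListFilesRecursively {

    /** Adds every regular file found under dir (at any depth) to result. */
    static void collectFiles(File dir, List<File> result) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            // listFiles returns null if dir is not a directory or cannot be read
            return;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                collectFiles(entry, result); // descend into the sub-directory
            } else if (entry.isFile()) {
                result.add(entry);
            }
        }
    }

    public static void main(String[] args) {
        // "base_directory" is only a placeholder; pass the real start path as the first argument
        File start = new File(args.length > 0 ? args[0] : "base_directory");
        List<File> files = new ArrayList<>();
        collectFiles(start, files);
        for (File f : files) {
            System.out.println(f.getPath());
        }
    }
}
```

Each File in the resulting list could then be handed to a FileReader wrapped in a BufferedReader, as in the question's snippet. Note that this collects files at every level, not only at the deepest one.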
You could also use the Java NIO 2 API.
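import static java.nio.file.FileVisitResult.CONTINUE;

import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashSet;
import java.util.Set;

public class ProcessFiles extends SimpleFileVisitor<Path> {
    static final String OUT_FORMAT = "%-17s: %s%n";
    // number of path name elements down to the target directories,
    // also used as the maximum depth of the walk
    static final int MAX_DEPTH = 4;
    static final Path baseDirectory = Paths.get("R:/base_directory");

    public static void main(String[] args) throws IOException {
        Set<FileVisitOption> visitOptions = new HashSet<>();
        visitOptions.add(FileVisitOption.FOLLOW_LINKS);
        Files.walkFileTree(baseDirectory, visitOptions, MAX_DEPTH, new ProcessFiles());
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attr) {
        if (file.getNameCount() <= MAX_DEPTH) {
            // the file sits above the target level
            System.out.printf(OUT_FORMAT, "skip wrong level", file);
            // SKIP_SUBTREE is only meaningful in preVisitDirectory; here it equals CONTINUE
            return CONTINUE;
        } else {
            // add probably a file name check
            System.out.printf(OUT_FORMAT, "process file", file);
            return CONTINUE;
        }
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attr) {
        if (dir.getNameCount() < MAX_DEPTH) {
            // not yet at the target level, keep descending
            System.out.printf(OUT_FORMAT, "walk into dir", dir);
            return CONTINUE;
        }
        // at the target level only the directory with the expected name is entered
        if (dir.getName(MAX_DEPTH - 1).toString().equals("level_one_a_1_IV")) {
            System.out.printf(OUT_FORMAT, "destination dir", dir);
            return CONTINUE;
        } else {
            System.out.printf(OUT_FORMAT, "skip dir name", dir);
            return FileVisitResult.SKIP_SUBTREE;
        }
    }
}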
Assuming the following directory/file structure
base_directory
base_directory/base_directory.file
base_directory/level_one_a
base_directory/level_one_a/level_one_a.file
base_directory/level_one_a/level_one_a_1
base_directory/level_one_a/level_one_a_1/level_one_a_1.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_I
base_directory/level_one_a/level_one_a_1/level_one_a_1_I/level_one_a_1_I.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_II
base_directory/level_one_a/level_one_a_1/level_one_a_1_II/level_one_a_1_II.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_III
base_directory/level_one_a/level_one_a_1/level_one_a_1_III/level_one_a_1_III.file
base_directory/level_one_a/level_one_a_1/level_one_a_1_IV
base_directory/level_one_a/level_one_a_1/level_one_a_1_IV/level_one_a_1_IV.file
base_directory/someother_a
base_directory/someother_a/someother_a.file
base_directory/someother_a/someother_a_1
base_directory/someother_a/someother_a_1/someother_a_1.file
base_directory/someother_a/someother_a_1/someother_a_1_I
base_directory/someother_a/someother_a_1/someother_a_1_I/someother_a_1_I.file
base_directory/someother_a/someother_a_1/someother_a_1_II
base_directory/someother_a/someother_a_1/someother_a_1_II/someother_a_1_II.file
base_directory/someother_a/someother_a_1/someother_a_1_III
base_directory/someother_a/someother_a_1/someother_a_1_III/someother_a_1_III.file
base_directory/someother_a/someother_a_1/someother_a_1_IV
base_directory/someother_a/someother_a_1/someother_a_1_IV/someother_a_1_IV.file
you would get the following output (for demonstration)
walk into dir    : R:\base_directory
skip wrong level : R:\base_directory\base_directory.file
walk into dir    : R:\base_directory\level_one_a
skip wrong level : R:\base_directory\level_one_a\level_one_a.file
walk into dir    : R:\base_directory\level_one_a\level_one_a_1
skip wrong level : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1.file
skip dir name    : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_I
skip dir name    : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_II
skip dir name    : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_III
destination dir  : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_IV
process file     : R:\base_directory\level_one_a\level_one_a_1\level_one_a_1_IV\level_one_a_1_IV.file
walk into dir    : R:\base_directory\someother_a
skip wrong level : R:\base_directory\someother_a\someother_a.file
walk into dir    : R:\base_directory\someother_a\someother_a_1
skip wrong level : R:\base_directory\someother_a\someother_a_1\someother_a_1.file
skip dir name    : R:\base_directory\someother_a\someother_a_1\someother_a_1_I
skip dir name    : R:\base_directory\someother_a\someother_a_1\someother_a_1_II
skip dir name    : R:\base_directory\someother_a\someother_a_1\someother_a_1_III
skip dir name    : R:\base_directory\someother_a\someother_a_1\someother_a_1_IV
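For the second part of the question (writing each result under a unique name such as 00001_output.txt), the following is one possible sketch, not the visitor shown above. It swaps in Files.walk with a depth filter to pick up every file exactly four levels below the base directory, regardless of directory name. The names ProcessAtDepth, outputPathFor, filesAtDepth and process are invented for this example, and process is only a stand-in for the jsoup extraction from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: process every file at a fixed depth and write NAME_output.EXT next to it. */
final class ProcessAtDepth {

    /** Maps e.g. 00001.txt to 00001_output.txt in the same directory. */
    static Path outputPathFor(Path input) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String outName = (dot > 0)
                ? name.substring(0, dot) + "_output" + name.substring(dot)
                : name + "_output";
        return input.resolveSibling(outName);
    }

    /** True if the file name (without extension) already ends in _output. */
    static boolean isOutputFile(Path p) {
        String name = p.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String stem = (dot > 0) ? name.substring(0, dot) : name;
        return stem.endsWith("_output");
    }

    /** Returns the regular files exactly depth levels below base, skipping earlier outputs. */
    static List<Path> filesAtDepth(Path base, int depth) throws IOException {
        try (Stream<Path> paths = Files.walk(base, depth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> base.relativize(p).getNameCount() == depth)
                    .filter(p -> !isOutputFile(p))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Placeholder for the real work; the jsoup extraction from the question would go here. */
    static String process(String content) {
        return content.trim();
    }

    public static void main(String[] args) throws IOException {
        // "base_directory" is only a placeholder; pass the real start path as the first argument
        Path base = Paths.get(args.length > 0 ? args[0] : "base_directory");
        // depth 4 matches start/one/two/three/*files* from the question
        for (Path input : filesAtDepth(base, 4)) {
            String text = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
            Path output = outputPathFor(input);
            Files.write(output, process(text).getBytes(StandardCharsets.UTF_8));
            System.out.println(input + " -> " + output);
        }
    }
}
```

The list of inputs is collected before anything is written, and files whose name already ends in _output are skipped, so running the program twice should not process its own results. UTF-8 is assumed for both reading and writing.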
Some links to the Oracle tutorials for further reading
Walking the File Tree
Finding Files