Java: how to make BufferedReader loop through an input file and run code blocks once
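...Sorry for the length... ...and for the stupid code/text entries. I'm still a noob.
I need to parse Trimble differential correction log files (example below) to extract specific values and put them into a .csv file for quality checking before upload to Oracle. I chose Java for professional development, since it is the language used to develop our other in-house software.
I am pointing at a single file until I get the correct output; then I will move on to walking the file structure.
My requirement: read the log (.txt, UTF-16LE) and pull specific values out of several mostly similar blocks of text; then find other values in the blocks that follow, which are mostly similar to each other but different from the first kind. Put these values into a .csv for import into a spreadsheet, so each log file can be quality checked.
The values in the text blocks can vary, but all possible variations are known.
I only care about the first block of text, ATM. The regexes for all the values I'm interested in are below.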
import java.io.*;
import java.nio.*;
import java.util.*;
import java.util.regex.*;
public class LogParser
{
public static void main (String[] args)throws IOException
{
//log file Reader init:
File corrFile = new File("D:\\Utilities\\Development\\Java\\HPGPSLogParser\\Correct_2015-10-13_10-51.txt");
BufferedReader corrReader = new BufferedReader(new InputStreamReader(new FileInputStream(corrFile),"UTF-16LE"));
String corrText = "";
String corrLine = "";
/*
NOTE: PFO diffcorr log files are encoded in UTF-16 LE
*/
//Writer init:
File stateCSV = new File("D:\\Utilities\\Development\\Java\\HPGPSLogParser\\MH.csv");
BufferedWriter corrWriter = new BufferedWriter(new FileWriter(stateCSV, true));
String fileText = "";
//output reader variables:
String corrOutput = "";
String outputLine = "";
//Management variables: ID location & specify actions:
String roverFile = "Rover file: ";
String procRoverFile = "Processing rover file, ";
String carrProcess = "";
String codeProces = "";
//regex variables:
Pattern fileName1 = Pattern.compile("Rover file: (?<fileName1>[A-Z]{2}-\\d{3}-\\d{5}-SP\\d\\.SSF)+");
Pattern noBase = Pattern.compile("(?<noBase>No matching base data found)");
Pattern totalCoverage = Pattern.compile("(?<totalCoverage>[\\d]{1,3})\\% total coverage");
Pattern coverageBy = Pattern.compile("(?<coverageBy>[\\d]{1,3})+\\% coverage by (?<baseStation>\\b\\w+\\b\\.[zZ].*)+", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Pattern carrierProcessing = Pattern.compile("Carrier processing\\.\\.\\.");
Pattern fileName2 = Pattern.compile("Processing rover file, (?<fileName2>[A-Z]{2}-\\d{3}-\\d{5}-SP\\d\\.SSF)+ \\.\\.\\.");
Pattern noProc = Pattern.compile("(?<noProc>No processing performed as base data does not have carrier data)");
Pattern noCarDat = Pattern.compile("(?<noCarDat>No carrier processing performed as file has no carrier data)");
Pattern selectedPositions = Pattern.compile("Selected (?<selectedPositions>\\d{1,6}) positions for post-processing");
Pattern correctedPositions = Pattern.compile("Corrected (?<correctedPositions>\\d{1,6}) positions");
Pattern correctFailed = Pattern.compile("Failed to correct (?<correctFailed>\\d{1,6}) positions");
Pattern carrierMissingBase = Pattern.compile("(?<carrierMissingBase>\\d{1,6}) of these were due to missing base data");
Pattern carrierInsuffSat = Pattern.compile("(?<carrierInsuffSat>\\d{1,6}) of these were due to insufficient satellites for position fix");
Pattern codeProcessing = Pattern.compile("Code processing\\.\\.\\.");
Pattern refGap = Pattern.compile("(?<refGap>Reference station data gap encountered: )");
Pattern codeChose = Pattern.compile("Chose (?<codeChose>\\d{1,6}) code solutions over the carrier solutions");
Pattern codeHighQual = Pattern.compile("(?<codeHighQual>\\d{1,6}) code solutions were of higher quality");
Pattern filtered = Pattern.compile("Filtered out (?<filtered>\\d{1,6}) uncorrected positions");
try(corrReader)
{
while ((corrLine = corrReader.readLine())!=null)
{
corrText = corrLine.trim();
Matcher carrProcMatcher = carrierProcessing.matcher(corrText);
if (corrText.contains(roverFile))
{
Matcher file1Matcher = fileName1.matcher(corrText); //first order variable based on 'Rover file: fileName1'
if(file1Matcher.find())
{
String firstFileName = file1Matcher.group("fileName1");
if (corrOutput.equals(""))
{
corrOutput += firstFileName+",";
} else {
corrOutput += "\n"+firstFileName+",";
} //end else
Matcher baseMatcher = noBase.matcher(corrText);
if(baseMatcher.find())
{
String noBaseText = baseMatcher.group("noBase");
if(noBaseText.equals("No matching base data found"))
{
corrOutput += "TRUE"+",";
} else {
corrOutput += ",";
} //end else
}
Matcher totCovMatcher = totalCoverage.matcher(corrText);
if(totCovMatcher.find())
{
String totalCovText = totCovMatcher.group("totalCoverage");
corrOutput += totalCovText+",";
}
Matcher covByMatcher = coverageBy.matcher(corrText);
if(covByMatcher.find())
{
String covByPct = covByMatcher.group("coverageBy");
String covByProvider = covByMatcher.group("baseStation");
corrOutput += covByPct+","+covByProvider+",";
}
corrWriter.write(corrOutput);
corrWriter.flush();
} // end file1Matcher if
} //end corrText.contains if
} //end while loop
// corrWriter.write(corrOutput);
corrWriter.close();
corrReader.close();
} //end try corrReader
} //end main method
} //end class
The part of the log file I'm interested in looks like this:
--------Coverage Details:--------------------
Rover file: AA-123-12345-SP1.SSF
Local time: 2/11/2014 8:06:30 PM to 2/11/2014 8:37:15 PM
100% total coverage
100% coverage by guug04314054.zip
Rover file: AA-321-54321-SP1.SSF
Local time: 2/3/2015 4:06:14 PM to 2/3/2015 4:06:44 PM
0% total coverage. No matching base data found.
Rover file: AA-132-12354-SP2.SSF
Local time: 2/17/2014 5:51:01 PM to 2/17/2014 6:18:57 PM
100% total coverage
4% coverage by guug04914003.zip
100% coverage by guug04914022.zip
I need my output to look like:
AA-123-12345-SP1.SSF,100,100,guug04914003.zip,
or
AA-312-12435-SP1.SSF,TRUE,0,
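My code loops through the input file multiple times, generating duplicate entries. How do I get a single output entry for each 'Rover file:' block of text?
Thanks!!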
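Loop through the file only once. Read and collect data as you go. Don't build the output until you have all the data for an entry. When you see a new "Rover file" line, write the output for the previous entry (unless this is the first entry) and clear the values. When you reach the end of the file, write the output for the last entry (if there is one).
Isolating the code in a class may make it easier to reuse the printing logic.
Example: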
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class LogEntry {
    // One alternative per line type. The trailing ".*" on the second alternative lets
    // matches() accept "0% total coverage. No matching base data found." as well.
    private final Pattern pattern = Pattern.compile("Rover file: (.*)" +
                                                    "|(\\d+)% total coverage.*" +
                                                    "|(\\d+)% coverage by (.*)");
    private String roverFile;
    private Integer totalCoverage;
    private Map<String, Integer> fileCoverage = new LinkedHashMap<>();

    // Single pass over the input: collect values, flush when the next entry starts.
    public void process(BufferedReader in) throws IOException {
        for (String line; (line = in.readLine()) != null; ) {
            Matcher m = this.pattern.matcher(line);
            if (! m.matches())
                continue;
            if (m.start(1) != -1) {
                // New "Rover file" line: print the previous entry, then start over.
                print();
                clear();
                this.roverFile = m.group(1);
            } else if (m.start(2) != -1) {
                this.totalCoverage = Integer.valueOf(m.group(2));
            } else if (m.start(3) != -1) {
                this.fileCoverage.put(m.group(4), Integer.valueOf(m.group(3)));
            }
        }
        print(); // last entry
    }

    private void clear() {
        this.roverFile = null;
        this.totalCoverage = null;
        this.fileCoverage.clear();
    }

    private void print() {
        if (this.roverFile == null)
            return; // nothing collected yet
        if (this.fileCoverage.isEmpty()) {
            System.out.println(this.roverFile + "," + this.totalCoverage);
        } else {
            for (Entry<String, Integer> entry : this.fileCoverage.entrySet()) {
                System.out.println(this.roverFile + "," + this.totalCoverage + "," + entry.getValue() + "," + entry.getKey());
            }
        }
    }
}
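The question reads a UTF-16LE log and appends to a CSV. As a rough sketch of how that file handling could be arranged around a single-pass loop, the snippet below opens both the reader and the writer in one try-with-resources, so each is closed exactly once. The class name `Utf16LogCopy`, the method `extractRoverNames`, and the temporary files in `main` are made up for illustration; the loop only copies the rover file names, to keep the I/O part in focus.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf16LogCopy {
    private static final String PREFIX = "Rover file: ";

    // Reads the UTF-16LE log once and writes each rover file name on its own CSV line.
    // Returns the number of names written. (Hypothetical helper, not part of the answer.)
    static int extractRoverNames(Path log, Path csv) throws IOException {
        int count = 0;
        try (BufferedReader in = Files.newBufferedReader(log, StandardCharsets.UTF_16LE);
             BufferedWriter out = Files.newBufferedWriter(csv, StandardCharsets.UTF_8)) {
            for (String line; (line = in.readLine()) != null; ) {
                line = line.trim();
                if (line.startsWith(PREFIX)) {
                    out.write(line.substring(PREFIX.length()));
                    out.newLine();
                    count++;
                }
            }
        } // both streams are closed here, even if an exception is thrown
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in files so the sketch runs anywhere; a real run would pass the log path.
        Path log = Files.createTempFile("correct", ".txt");
        Path csv = Files.createTempFile("rover", ".csv");
        String sample = "Rover file: AA-123-12345-SP1.SSF\r\n100% total coverage\r\n";
        Files.write(log, sample.getBytes(StandardCharsets.UTF_16LE));
        System.out.println(extractRoverNames(log, csv));
        System.out.println(Files.readAllLines(csv, StandardCharsets.UTF_8));
    }
}
```

A `BufferedReader` obtained this way can be handed to `process` unchanged, since `process` only calls `readLine()`.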
Test
String input = "Rover file: AA-123-12345-SP1.SSF\n" +
"Local time: 2/11/2014 8:06:30 PM to 2/11/2014 8:37:15 PM\n" +
"100% total coverage\n" +
"100% coverage by guug04314054.zip\n" +
"Rover file: AA-321-54321-SP1.SSF\n" +
"Local time: 2/3/2015 4:06:14 PM to 2/3/2015 4:06:44 PM\n" +
"0% total coverage. No matching base data found.\n" +
"Rover file: AA-132-12354-SP2.SSF\n" +
"Local time: 2/17/2014 5:51:01 PM to 2/17/2014 6:18:57 PM\n" +
"100% total coverage\n" +
"4% coverage by guug04914003.zip\n" +
"100% coverage by guug04914022.zip\n";
try (BufferedReader in = new BufferedReader(new StringReader(input))) {
new LogEntry().process(in);
}
Output
AA-123-12345-SP1.SSF,100,100,guug04314054.zip
AA-321-54321-SP1.SSF,0
AA-132-12354-SP2.SSF,100,4,guug04914003.zip
AA-132-12354-SP2.SSF,100,100,guug04914022.zip
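The output above has one line per base file, while the question asks for a single entry per "Rover file:" block. A possible variation, sketched here under assumptions of my own, keeps the same single-pass idea but emits one row per block: rover file, a `TRUE` flag when the log says no base data was found (empty otherwise), the total coverage, then a percentage and base file pair for each coverage line. The class name `RoverCsv`, the method `toCsvRows`, and this column layout are hypothetical; the layout in the question is ragged, so adjust to whatever the spreadsheet expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RoverCsv {
    // Group 1: rover file name. Group 2: total coverage, group 3: optional "no base data" note.
    // Groups 4 and 5: coverage percentage and base file name.
    private static final Pattern LINE = Pattern.compile(
            "Rover file: (.*)"
            + "|(\\d+)% total coverage(\\. No matching base data found\\.)?"
            + "|(\\d+)% coverage by (.*)");

    // Builds one CSV row per "Rover file:" block in a single pass over the lines.
    static List<String> toCsvRows(List<String> lines) {
        List<String> rows = new ArrayList<>();
        String roverFile = null;
        String noBase = "";
        String total = "";
        StringBuilder bases = new StringBuilder();
        for (String raw : lines) {
            Matcher m = LINE.matcher(raw.trim());
            if (!m.matches()) {
                continue; // e.g. "Local time: ..." or the section header
            }
            if (m.group(1) != null) {
                // A new block starts: flush the previous one, then reset the collected values.
                if (roverFile != null) {
                    rows.add(row(roverFile, noBase, total, bases));
                }
                roverFile = m.group(1);
                noBase = "";
                total = "";
                bases.setLength(0);
            } else if (m.group(2) != null) {
                total = m.group(2);
                if (m.group(3) != null) {
                    noBase = "TRUE";
                }
            } else {
                bases.append(',').append(m.group(4)).append(',').append(m.group(5));
            }
        }
        if (roverFile != null) {
            rows.add(row(roverFile, noBase, total, bases)); // last block
        }
        return rows;
    }

    private static String row(String roverFile, String noBase, String total, CharSequence bases) {
        return roverFile + "," + noBase + "," + total + bases;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Rover file: AA-321-54321-SP1.SSF",
                "0% total coverage. No matching base data found.",
                "Rover file: AA-132-12354-SP2.SSF",
                "100% total coverage",
                "4% coverage by guug04914003.zip",
                "100% coverage by guug04914022.zip");
        toCsvRows(sample).forEach(System.out::println);
    }
}
```

For the two sample blocks in `main`, this prints `AA-321-54321-SP1.SSF,TRUE,0` and `AA-132-12354-SP2.SSF,,100,4,guug04914003.zip,100,guug04914022.zip`. Returning the rows instead of printing them leaves it to the caller to decide where they are written.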