Java extract multiline values from a file
I'm reading a file line by line. Some lines have multiline values like the ones below, so my loop breaks and returns unexpected results.
TSNK/Metadata/tk.filename=PZSIIF-anefnsadual-rasdfepdasdort.pdf
TSNK/Metadata/tk_ISIN=LU0291600822,LU0871812862,LU0327774492,LU0291601986,LU0291605201
,LU0291595725,LU0291599800,LU0726995649,LU0726996290,LU0726995995,LU0726995136,LU0726995482,LU0726995219,LU0855227368
TSNK/Metadata/tk_GroupCode=PZSIIF
TSNK/Metadata/tk_GroupCode/PZSIIF=y
TSNK/Metadata/tk_oneTISNumber=16244,17007,16243,11520,19298,18247,20755
TSNK/Metadata/tk_oneTISNumber_TEXT=Neo Emerging Market Corporate Debt
Neo Emerging Market Debt Opportunities II
Neo Emerging Market Investment Grade Debt
Neo Floating Rate II
Neo Upper Tier Floating Rate
Global Balanced Regulation 28
Neo Multi-Sector Credit Income
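Here TSNK/Metadata/tk_ISIN and TSNK/Metadata/tk_oneTISNumber_TEXT have multiline values. While reading the file line by line, how can I read each of these fields as a single line?
I tried the following logic, but it didn't produce the expected result: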
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// FILENAME is a String constant defined elsewhere.
try (BufferedReader br = new BufferedReader(new FileReader(FILENAME))) {
    String sCurrentLine;
    int i = 1;
    CharSequence oneTIS = "TSNK/Metadata/tk_oneTISNumber_TEXT";
    StringBuilder builder = new StringBuilder();
    while ((sCurrentLine = br.readLine()) != null) {
        // Only the line containing the key is appended. The continuation lines
        // of the value do not contain the key, so they fall into the else branch.
        if (sCurrentLine.contains(oneTIS)) {
            System.out.println("Line number here -> " + i);
            builder.append(sCurrentLine);
            builder.append(",");
        } else {
            System.out.println("else --->");
        }
        //System.out.println("Line number" + i + " Value is---->>>> " + sCurrentLine);
        i++;
    }
    System.out.println("Line number" + i + " Value is---->>>> " + builder);
} catch (IOException e) {
    e.printStackTrace();
}
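A minimal sketch of repairing the line-by-line approach without a regex, assuming every entry starts with TSNK/Metadata/ and any other non-blank line continues the previous value. The names FoldContinuationLines and foldEntries, and the rule of joining continuation lines with a comma (dropping a leading comma and surrounding whitespace), are illustrative assumptions rather than anything from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class FoldContinuationLines {
    // Assumption: every entry starts with this prefix; any other line continues the previous entry.
    static final String PREFIX = "TSNK/Metadata/";

    // Hypothetical helper: folds each entry and its continuation lines into a single line.
    static List<String> foldEntries(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (line.startsWith(PREFIX)) {
                if (current != null) {
                    entries.add(current.toString());
                }
                current = new StringBuilder(line.trim());
            } else if (current != null && !line.trim().isEmpty()) {
                // Drop a leading comma so "...\n,LU..." does not create an empty value.
                String cont = line.startsWith(",") ? line.substring(1) : line;
                current.append(',').append(cont.trim());
            }
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        String data = "TSNK/Metadata/tk_ISIN=LU0291600822,LU0291605201\n"
                + ",LU0291595725\n"
                + "TSNK/Metadata/tk_oneTISNumber_TEXT=Neo Floating Rate II\n"
                + "Neo Upper Tier Floating Rate\n";
        // With a real file, use new BufferedReader(new FileReader(FILENAME)) instead.
        try (BufferedReader br = new BufferedReader(new StringReader(data))) {
            List<String> lines = br.lines().collect(Collectors.toList());
            for (String entry : foldEntries(lines)) {
                System.out.println(entry);
            }
        }
    }
}
```

For the sample string, main prints one line per entry, e.g. `TSNK/Metadata/tk_oneTISNumber_TEXT=Neo Floating Rate II,Neo Upper Tier Floating Rate`. Buffering the current entry until the next prefix appears is what lets a plain readLine loop handle values of any length.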
A solution using Scanner and a multiline regular expression.
It assumes every key line starts with TSNK/Metadata/ (the continuation lines do not).
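import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each token is the text between two occurrences of "TSNK/Metadata/",
// so a value that spans several lines stays inside one token.
try (Scanner scanner = new Scanner(new File("file.txt"))) {
    scanner.useDelimiter("TSNK/Metadata/");
    // Key = text before the first '='. DOTALL lets '.' match line terminators,
    // so group(2) captures the whole multiline value.
    Pattern p = Pattern.compile("([^=]*)=(.*)", Pattern.DOTALL | Pattern.MULTILINE);
    while (scanner.hasNext()) {
        String s = scanner.next();
        Matcher matcher = p.matcher(s);
        if (matcher.find()) {
            System.out.println("key = '" + matcher.group(1) + "'");
            // Split the value on commas and newlines.
            String[] values = matcher.group(2).split("[,\n]");
            int i = 0;
            for (String value : values) {
                System.out.println(String.format(" val(%d)='%s',", (i++), value));
            }
        }
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
}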
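The same Scanner technique, sketched against an in-memory string so it runs without file.txt, and collecting the values into a map instead of printing them. MetadataScanner and parse are hypothetical names; the key pattern stops at the first '=', and the split still assumes '\n' line endings.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetadataScanner {
    // Key = text before the first '='; value = everything after it, newlines included (DOTALL).
    private static final Pattern ENTRY = Pattern.compile("([^=]*)=(.*)", Pattern.DOTALL);

    // Hypothetical helper: maps each key to its comma/newline-separated values, in file order.
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("TSNK/Metadata/");
            while (scanner.hasNext()) {
                Matcher m = ENTRY.matcher(scanner.next());
                if (m.find()) {
                    result.put(m.group(1), Arrays.asList(m.group(2).split("[,\n]")));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "TSNK/Metadata/tk_GroupCode=PZSIIF\n"
                + "TSNK/Metadata/tk_oneTISNumber_TEXT=Neo Floating Rate II\n"
                + "Neo Upper Tier Floating Rate\n";
        parse(text).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

A LinkedHashMap keeps the keys in the order they appear in the file; use a plain HashMap if order does not matter.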
The above produces the following output:
key = 'tk.filename'
val(0)='PZSIIF-anefnsadual-rasdfepdasdort.pdf',
key = 'tk_ISIN'
val(0)='LU0291600822',
val(1)='LU0871812862',
val(2)='LU0327774492',
val(3)='LU0291601986',
val(4)='LU0291605201',
val(5)='',
val(6)='LU0291595725',
val(7)='LU0291599800',
val(8)='LU0726995649',
val(9)='LU0726996290',
val(10)='LU0726995995',
val(11)='LU0726995136',
val(12)='LU0726995482',
val(13)='LU0726995219',
val(14)='LU0855227368',
key = 'tk_GroupCode'
val(0)='PZSIIF',
key = 'tk_GroupCode/PZSIIF'
val(0)='y',
key = 'tk_oneTISNumber'
val(0)='16244',
val(1)='17007',
val(2)='16243',
val(3)='11520',
val(4)='19298',
val(5)='18247',
val(6)='20755',
key = 'tk_oneTISNumber_TEXT'
val(0)='Neo Emerging Market Corporate Debt ',
val(1)='Neo Emerging Market Debt Opportunities II ',
val(2)='Neo Emerging Market Investment Grade Debt ',
val(3)='Neo Floating Rate II ',
val(4)='Neo Upper Tier Floating Rate ',
val(5)='Global Balanced Regulation 28 ',
val(6)='Neo Multi-Sector Credit Income',
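Note the empty entry (val(5) for key tk_ISIN): it appears because that value contains a newline followed by a comma. It is easy to sort out by rejecting empty strings or by adjusting the split pattern.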
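Both fixes from the note can be sketched as follows; SplitValues and its two methods are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SplitValues {
    // Option 1: treat a run of commas/newlines (e.g. "\n,") as a single separator.
    static List<String> splitCollapsing(String value) {
        return Arrays.asList(value.split("[,\n]+"));
    }

    // Option 2: keep the original pattern but reject empty strings afterwards.
    static List<String> splitFiltering(String value) {
        return Arrays.stream(value.split("[,\n]"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String isin = "LU0291600822,LU0291605201\n,LU0291595725";
        System.out.println(splitCollapsing(isin)); // [LU0291600822, LU0291605201, LU0291595725]
        System.out.println(splitFiltering(isin));  // [LU0291600822, LU0291605201, LU0291595725]
    }
}
```

Collapsing the delimiters is shorter, but it still yields an empty first element if a value begins with a comma or newline; filtering handles that case as well.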
Hope this helps!