Java extract multiline values from a file

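I am reading a file line by line. Some entries have values that span multiple lines, as shown below, so my loop breaks and returns unexpected results.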

TSNK/Metadata/tk.filename=PZSIIF-anefnsadual-rasdfepdasdort.pdf
TSNK/Metadata/tk_ISIN=LU0291600822,LU0871812862,LU0327774492,LU0291601986,LU0291605201
,LU0291595725,LU0291599800,LU0726995649,LU0726996290,LU0726995995,LU0726995136,LU0726995482,LU0726995219,LU0855227368
TSNK/Metadata/tk_GroupCode=PZSIIF
TSNK/Metadata/tk_GroupCode/PZSIIF=y
TSNK/Metadata/tk_oneTISNumber=16244,17007,16243,11520,19298,18247,20755
TSNK/Metadata/tk_oneTISNumber_TEXT=Neo Emerging Market Corporate Debt 
Neo Emerging Market Debt Opportunities II 
Neo Emerging Market Investment Grade Debt 
Neo Floating Rate II 
Neo Upper Tier Floating Rate 
Global Balanced Regulation 28 
Neo Multi-Sector Credit Income

Here, TSNK/Metadata/tk_ISIN and TSNK/Metadata/tk_oneTISNumber_TEXT have multi-line values. When reading the file line by line, how can I read each of these fields as a single line?

I tried the following logic, but it did not produce the expected result:

try {
    br = new BufferedReader(new FileReader(FILENAME));

    String sCurrentLine;
    int i = 1;
    CharSequence OneTIS = "TSNK/Metadata/tk_oneTISNumber_TEXT";
    StringBuilder builder = new StringBuilder();
    while ((sCurrentLine = br.readLine()) != null) {
        // Only the first physical line of a value contains the key,
        // so its continuation lines end up in the else branch.
        if (sCurrentLine.contains(OneTIS)) {
            System.out.println("Line number here -> " + i);
            builder.append(sCurrentLine);
            builder.append(",");
        } else {
            System.out.println("else --->");
        }
        //System.out.println("Line number"+i+" Value is---->>>> "+sCurrentLine);
        i++;
    }
    System.out.println("Line number" + i + " Value is---->>>> " + builder);
} catch (IOException e) {
    e.printStackTrace();
}

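One solution uses a Scanner together with a regular expression that matches across line breaks.

This assumes that every entry starts with TSNK/Metadata/:
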
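import java.io.File;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Scanner scanner = new Scanner(new File("file.txt"));
// Each token is everything between two occurrences of the common prefix
scanner.useDelimiter("TSNK/Metadata/");

// DOTALL lets '.' match line breaks, so a value may span several lines;
// the reluctant (.*?) stops the key at the first '='
Pattern p = Pattern.compile("(.*?)=(.*)", Pattern.DOTALL);

// Loop on hasNext() so that the loop ends when the input is exhausted
while (scanner.hasNext()) {
    String s = scanner.next();
    Matcher matcher = p.matcher(s);
    if (matcher.find()) {
        System.out.println("key = '" + matcher.group(1) + "'");
        String[] values = matcher.group(2).split("[,\n]");
        int i = 0; // zero-based, matching the output below
        for (String value : values) {
            System.out.println(String.format(" val(%d)='%s',", i++, value));
        }
    }
}
scanner.close();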

The above produces this output:

key = 'tk.filename'
 val(0)='PZSIIF-anefnsadual-rasdfepdasdort.pdf',
key = 'tk_ISIN'
 val(0)='LU0291600822',
 val(1)='LU0871812862',
 val(2)='LU0327774492',
 val(3)='LU0291601986',
 val(4)='LU0291605201',
 val(5)='',
 val(6)='LU0291595725',
 val(7)='LU0291599800',
 val(8)='LU0726995649',
 val(9)='LU0726996290',
 val(10)='LU0726995995',
 val(11)='LU0726995136',
 val(12)='LU0726995482',
 val(13)='LU0726995219',
 val(14)='LU0855227368',
key = 'tk_GroupCode'
 val(0)='PZSIIF',
key = 'tk_GroupCode/PZSIIF'
 val(0)='y',
key = 'tk_oneTISNumber'
 val(0)='16244',
 val(1)='17007',
 val(2)='16243',
 val(3)='11520',
 val(4)='19298',
 val(5)='18247',
 val(6)='20755',
key = 'tk_oneTISNumber_TEXT'
 val(0)='Neo Emerging Market Corporate Debt ',
 val(1)='Neo Emerging Market Debt Opportunities II ',
 val(2)='Neo Emerging Market Investment Grade Debt ',
 val(3)='Neo Floating Rate II ',
 val(4)='Neo Upper Tier Floating Rate ',
 val(5)='Global Balanced Regulation 28 ',
 val(6)='Neo Multi-Sector Credit Income',

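Note the empty entry (val(5) of tk_ISIN): it appears because the value contains a newline immediately followed by a comma. This is easy to handle, either by rejecting empty strings or by adjusting the split pattern.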
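As a minimal sketch of both fixes combined, the helper below splits on any run of commas and line breaks and skips blank entries. The class name ValueSplitter and the method splitValues are hypothetical names chosen for illustration, and the sketch assumes that leading and trailing whitespace in a value is not significant, so each entry is trimmed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class ValueSplitter {

    // Splits a raw value on commas and line breaks, dropping empty entries.
    static List<String> splitValues(String raw) {
        List<String> values = new ArrayList<>();
        // "[,\r\n]+" treats any run of commas and line breaks as one separator,
        // so "\n," no longer produces an empty string between the two characters.
        for (String part : raw.split("[,\\r\\n]+")) {
            String trimmed = part.trim(); // assumption: surrounding whitespace is noise
            if (!trimmed.isEmpty()) {     // guards against a leading separator
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String raw = "LU0291601986,LU0291605201\n,LU0291595725,LU0291599800\n";
        // prints [LU0291601986, LU0291605201, LU0291595725, LU0291599800]
        System.out.println(splitValues(raw));
    }
}
```

In the loop above, matcher.group(2).split("[,\n]") could then be replaced by a call to splitValues(matcher.group(2)). If the trailing spaces in the tk_oneTISNumber_TEXT values matter, drop the trim() call and test part.isEmpty() instead.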

Hope this helps!