Java, split a String by various tags and store it into a Map

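I have a requirement to create a kind of markdown-style tag, [N] for bold and [C] for italic, to format the text of a given String when creating PDFs with iText.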

So, given this String:

String toCheck = "Example [N]bold text[N] other example [C]italic text[C]";

The result should be (with "bold text" rendered in bold and "italic text" in italics):

Example bold text other example italic text


OK, here we go:

I have a font type enum:

private enum FontType {
    BOLD, ITALIC, NORMAL
}

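For this, I want to create a LinkedHashMap<String, Enum> that holds the String pieces with their corresponding font type (each piece will later be converted to a com.itextpdf.text.Chunk and added to a single com.itextpdf.text.Paragraph).

So how can I get a LinkedHashMap like this?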

pos String            enum
0   "Example "        NORMAL
1   "bold text"       BOLD
2   " other example " NORMAL
3   "italic text"     ITALIC

I created a custom Iterator that gives me the tag positions:

public class OwnIterator implements Iterator<Integer> 
{
    private Iterator<Integer> occurrencesItr;

    public OwnIterator(String toCheck, String[] validPair) {
        // build regex to search for every item in validPair
        Matcher[] matchValidPair = new Matcher[validPair.length];
        for (int i = 0 ; i < validPair.length ; i++) {
            String regex = 
                    "(" +    // start capturing group
                    "\\Q" +  // quote entire input string so it is not interpreted as regex
                    validPair[i] +  // this is what we are looking for, duhh 
                    "\\E" +  // end quote
                    ")" ;    // end capturing group
            Pattern p = Pattern.compile(regex);
            matchValidPair[i] = p.matcher(toCheck);
        }
        // do the search, saving found occurrences in list
        List<Integer> occurrences = new ArrayList<>();
        for (int i = 0 ; i < matchValidPair.length ; i++) {
            while (matchValidPair[i].find()) {
                occurrences.add(matchValidPair[i].start(0)+1);  // +1 if you want index to start at 1 
            }
        }
        // sort the list 
        Collections.sort(occurrences);
        occurrencesItr = occurrences.iterator();
    }

    @Override
    public boolean hasNext()  {
        return occurrencesItr.hasNext();
    }

    @Override
    public Integer next() {
        return occurrencesItr.next();
    }

    @Override
    public void remove() {
        occurrencesItr.remove();
    }

}

I have already checked that the tags are balanced, and I can get all the tag positions:

String[] validPair = {"[N]", "[C]" };
OwnIterator itr = new OwnIterator(toCheck, validPair);
while (itr.hasNext()) {
    System.out.println(itr.next());
}

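But once I have all the positions, I can't figure out how to separate each part and assign the correct enum value.

Any ideas? Maybe my approach is wrong, or someone can suggest a better one?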

How about this?

...
String toCheck = "Example [N]bold text[N] other example [C]italic text[C]";
toCheck = replacePairs(toCheck , "[N]","<b>", "</b>");
toCheck = replacePairs(toCheck , "[C]","<i>", "</i>");

OutputStream file = new FileOutputStream(new File("Test.pdf"));
Document document = new Document();
PdfWriter.getInstance(document, file);
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
htmlWorker.parse(new StringReader(toCheck));
document.close();
file.close();
...

private String replacePairs(String input, String tag, String openTag, String closeTag) {
    // replaceFirst expects a regex, so quote the tag; "[N]" would otherwise be a character class
    String quotedTag = java.util.regex.Pattern.quote(tag);
    String output = input;
    while (output.contains(tag)) {
        output = output.replaceFirst(quotedTag, openTag);
        if (!output.contains(tag)) {
            throw new IllegalArgumentException("Missing closing tag: " + tag);
        }
        output = output.replaceFirst(quotedTag, closeTag);
    }
    return output;
}

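Disclaimer: this code has not been compiled, so it is untested. You will want to handle exceptions and close resources properly in a finally block (or use try-with-resources).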
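One detail worth checking in replacePairs: String.replaceFirst treats its first argument as a regular expression, so a tag such as "[N]" is read as a character class that matches the single letter N. Here is a minimal sketch of the difference; the class and method names are made up for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the code above.
class RegexTagDemo {

    // "[N]" is interpreted as a regex character class: it matches the first letter 'N'.
    static String unquoted(String s) {
        return s.replaceFirst("[N]", "<b>");
    }

    // Pattern.quote makes the tag match literally; quoteReplacement protects '$' and '\'
    // in the replacement text (not strictly needed for "<b>", but harmless).
    static String quoted(String s) {
        return s.replaceFirst(Pattern.quote("[N]"), Matcher.quoteReplacement("<b>"));
    }

    public static void main(String[] args) {
        String s = "No [N]bold[N]";
        System.out.println(unquoted(s)); // <b>o [N]bold[N]
        System.out.println(quoted(s));   // No <b>bold[N]
    }
}
```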

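The following piece of code will give you the desired LinkedHashMap (TAG_NEGRITA and TAG_CURSIVA are presumably constants holding "[N]" and "[C]"):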

private Map<String, FontType> getMapFromTags(String toCheck) {
    Map<String, FontType> chunksMap = new LinkedHashMap<>();
    boolean openTag = false;

    while (toCheck.contains(TAG_NEGRITA) || toCheck.contains(TAG_CURSIVA)) {
        final int indexOfBold = toCheck.indexOf(TAG_NEGRITA);
        final int indexOfItalics = toCheck.indexOf(TAG_CURSIVA);

        final int indexToUse = getValidIndexToUse(indexOfBold, indexOfItalics);

        final String substring = toCheck.substring(0, indexToUse);
        toCheck = toCheck.substring(indexToUse + 3, toCheck.length());

        if (!substring.isEmpty()) {
            if (!openTag) {
                chunksMap.put(substring, FontType.NORMAL);
            } else if (indexToUse == indexOfBold) {
                chunksMap.put(substring, FontType.BOLD);
            } else {
                chunksMap.put(substring, FontType.ITALIC);
            }
        }

        openTag = !openTag;
    }
    // check if there is some NORMAL text at the end
    if (!toCheck.isEmpty())
        chunksMap.put(toCheck, FontType.NORMAL);

    return chunksMap;
}

private int getValidIndexToUse(int indexOfBold, int indexOfItalics) {
    if (indexOfBold > -1 && indexOfItalics == -1)
        return indexOfBold;
    else if (indexOfItalics > -1 && indexOfBold == -1)
        return indexOfItalics;
    else 
        return indexOfBold > -1 && indexOfBold < indexOfItalics ? indexOfBold : indexOfItalics;
}

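But a problem arises when two or more pieces of text are identical: they hash to the same key, so the Map keeps only one entry for them and the later font type overwrites the earlier one.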
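One way around the duplicate-key problem might be to keep the pieces in an ordered List instead of a Map. The sketch below is only an illustration under a few assumptions: the tags are "[N]" and "[C]", they are balanced, and they are not nested. The names TagSplitter, Segment and split are hypothetical and do not come from the code above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits tagged text into an ordered list of (text, font type) pieces.
class TagSplitter {

    enum FontType { BOLD, ITALIC, NORMAL }

    /** One piece of text together with the font type it should be rendered in. */
    static final class Segment {
        final String text;
        final FontType type;

        Segment(String text, FontType type) {
            this.text = text;
            this.type = type;
        }

        @Override
        public String toString() {
            return "\"" + text + "\" " + type;
        }
    }

    // Assumed tag values, matching the question.
    private static final String TAG_BOLD = "[N]";
    private static final String TAG_ITALIC = "[C]";

    static List<Segment> split(String input) {
        List<Segment> segments = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bold = input.indexOf(TAG_BOLD, pos);
            int italic = input.indexOf(TAG_ITALIC, pos);
            if (bold < 0 && italic < 0) {
                break; // no more tags; the rest is handled after the loop
            }
            // Pick whichever opening tag comes first.
            boolean isBold = italic < 0 || (bold >= 0 && bold < italic);
            String tag = isBold ? TAG_BOLD : TAG_ITALIC;
            int open = isBold ? bold : italic;
            int close = input.indexOf(tag, open + tag.length());
            if (close < 0) {
                throw new IllegalArgumentException("Missing closing tag: " + tag);
            }
            if (open > pos) { // plain text before the opening tag
                segments.add(new Segment(input.substring(pos, open), FontType.NORMAL));
            }
            if (close > open + tag.length()) { // text between the pair of tags
                segments.add(new Segment(input.substring(open + tag.length(), close),
                        isBold ? FontType.BOLD : FontType.ITALIC));
            }
            pos = close + tag.length();
        }
        if (pos < input.length()) { // trailing plain text
            segments.add(new Segment(input.substring(pos), FontType.NORMAL));
        }
        return segments;
    }

    public static void main(String[] args) {
        for (Segment s : split("Example [N]bold text[N] other example [C]italic text[C]")) {
            System.out.println(s);
        }
    }
}
```

Because a List preserves both order and duplicates, an input like "[N]a[N]a" yields two separate pieces ("a" BOLD, then "a" NORMAL), whereas a Map keyed by the text would collapse them into one entry. Each Segment could then be turned into a Chunk in the same order.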