Conflict when generating two HashMaps in Java

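I have two texts. I tokenize each one, remove the stop words, lemmatize the result, and then build a hash map that holds each term and its frequency in the text.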

When I apply these steps to a single text, everything works, as shown below:

    String train = "hamza was studying hamza studied yesterday";
    String test = "hamza is swimming today";

    sportBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(train)));

    //testBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(test)));
    System.out.println("hashmap for train");

    for (String name : sportBag.keySet()) {
        String key = name;
        int value = sportBag.get(name);
        System.out.println(key + " " + value);
    }

    System.out.println("hashmap for test");
    for (String name : testBag.keySet()) {
        String key = name;
        int value = testBag.get(name);
        System.out.println(key + " " + value);
    }

and the output is as expected:

hashmap for train
yesterday 1
study 2
hamza 2

The problem appears when I build two hash maps, as follows:

    String train = "hamza was studying hamza studied yesterday";
    String test = "hamza is swimming today";

    sportBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(train)));

    testBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(test)));
    System.out.println("hashmap for train");

    for (String name : sportBag.keySet()) {
        String key = name;
        int value = sportBag.get(name);
        System.out.println(key + " " + value);
    }

    System.out.println("hashmap for test");
    for (String name : testBag.keySet()) {
        String key = name;
        int value = testBag.get(name);
        System.out.println(key + " " + value);
    }

This is where the problem shows up: both maps print the same merged contents.

hashmap for train
yesterday 1
swimming 1
study 2
today 1
hamza 2
hashmap for test
yesterday 1
swimming 1
study 2
today 1
hamza 2

Here is the Bag method:

public Map<String, Integer> words = new HashMap<>();

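/**
 * Builds a bag of words: maps each word to its frequency in the list.
 *
 * @param wordsList the tokens to count
 * @return the word-to-frequency map (the {@code words} field itself)
 */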
public Map<String, Integer> Bag(List<String> wordsList) {
    for (int i = 0; i < wordsList.size(); i++) {
        int freq = 0;
        for (int j = 0; j < wordsList.size(); j++) {
            if (wordsList.get(j).equals(wordsList.get(i))) {
                freq++;
            }
        }
        if (!words.containsKey(wordsList.get(i))) {
            words.put(wordsList.get(i), freq);
        }
    }
    return words;
}

Why is this happening?

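You are using the same bagOfWords instance for both sportBag and testBag. Because your .Bag method never clears the map, the second call adds its entries to a map that already holds the values from the first call. The method also returns that same map object each time, so sportBag and testBag end up referring to a single map.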
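The shared state can be reproduced in isolation. The following is a minimal sketch, not the asker's full class: `SharedBagDemo` and `SharedBag` are hypothetical names, the token lists are assumed from the printed output of the lemmatizer, and `List.of` assumes Java 9 or later. The counting keeps the original behaviour (frequency within the current list, stored only if the key is absent).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedBagDemo {
    // Hypothetical stand-in for the asker's class: one map field reused by every call.
    static class SharedBag {
        private final Map<String, Integer> words = new HashMap<>();

        Map<String, Integer> bag(List<String> wordsList) {
            for (String word : wordsList) {
                // Same logic as the original: count within this list, store only new keys.
                words.putIfAbsent(word, Collections.frequency(wordsList, word));
            }
            return words; // returns the field itself, not a copy
        }
    }

    public static void main(String[] args) {
        SharedBag bagOfWords = new SharedBag();
        // Token lists assumed from the output shown in the question.
        Map<String, Integer> sportBag =
                bagOfWords.bag(List.of("hamza", "study", "hamza", "study", "yesterday"));
        Map<String, Integer> testBag =
                bagOfWords.bag(List.of("hamza", "swimming", "today"));

        System.out.println(sportBag == testBag); // prints true: one object, two names
        System.out.println(sportBag.size());     // prints 5: words from both texts
    }
}
```

Since both variables point at the same object, printing either one shows all five words, which matches the output in the question.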

There are two options here:

  • Clear the map at the beginning of the .Bag() method.
  • Create a new bagOfWords instance every time you need to generate a HashMap.
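One possible shape for the fix is sketched below. It is a variant of the first option: instead of clearing a shared field, it builds a fresh local map on every call. It also swaps the nested loops for single-pass counting with `Map.merge`, which is a substitution and not the asker's original logic. The class name `BagOfWordsFixed`, the method name `bag`, and the token lists are assumptions for illustration; `List.of` assumes Java 9 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BagOfWordsFixed {
    /** Counts how often each word occurs; builds a new map on every call. */
    public static Map<String, Integer> bag(List<String> wordsList) {
        Map<String, Integer> words = new HashMap<>(); // local, so calls cannot interfere
        for (String word : wordsList) {
            words.merge(word, 1, Integer::sum); // store 1, or add 1 to the existing count
        }
        return words;
    }

    public static void main(String[] args) {
        // Token lists assumed from the output shown in the question.
        Map<String, Integer> sportBag =
                bag(List.of("hamza", "study", "hamza", "study", "yesterday"));
        Map<String, Integer> testBag =
                bag(List.of("hamza", "swimming", "today"));

        System.out.println("hashmap for train: " + sportBag);
        System.out.println("hashmap for test: " + testBag);
    }
}
```

A design note: calling `clear()` on the shared field would not be enough on its own if the caller keeps the map returned by the first call, because both variables would still refer to the same object and the first result would be wiped. Returning a new map per call (or a copy of the field) avoids that.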