Conflict when generating two HashMaps in Java
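I have two texts. For each one I tokenize it, remove stop words, lemmatize the result, and then build a HashMap that maps each term to its frequency in the text.
When I apply these steps to just one text, everything works fine, as shown below: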
String train = "hamza was studying hamza studied yesterday";
String test = "hamza is swimming today";
sportBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(train)));
//testBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(test)));
System.out.println("hashmap for train");
for (String name : sportBag.keySet()) {
String key = name;
int value = sportBag.get(name);
System.out.println(key + " " + value);
}
System.out.println("hashmap for test");
for (String name : testBag.keySet()) {
String key = name;
int value = testBag.get(name);
System.out.println(key + " " + value);
}
The output is as expected:
hashmap for train
yesterday 1
study 2
hamza 2
The problem appears when I generate two HashMaps, as follows:
String train = "hamza was studying hamza studied yesterday";
String test = "hamza is swimming today";
sportBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(train)));
testBag = bagOfWords.Bag(lemmatize.Lemmatize(tokenizeAndRemoveStopWords.RemoveStopWords(test)));
System.out.println("hashmap for train");
for (String name : sportBag.keySet()) {
String key = name;
int value = sportBag.get(name);
System.out.println(key + " " + value);
}
System.out.println("hashmap for test");
for (String name : testBag.keySet()) {
String key = name;
int value = testBag.get(name);
System.out.println(key + " " + value);
}
This is where it goes wrong. Both maps contain the words from both texts:
hashmap for train
yesterday 1
swimming 1
study 2
today 1
hamza 2
hashmap for test
yesterday 1
swimming 1
study 2
today 1
hamza 2
Here is the Bag method:
public Map<String, Integer> words = new HashMap<>();
/**
* Constructor.
*
* @param wordsList
* @return
*/
public Map<String, Integer> Bag(List<String> wordsList) {
for (int i = 0; i < wordsList.size(); i++) {
int freq = 0;
for (int j = 0; j < wordsList.size(); j++) {
if (wordsList.get(j).equals(wordsList.get(i))) {
freq++;
}
}
if (!words.containsKey(wordsList.get(i))) {
words.put(wordsList.get(i), freq);
}
}
return words;
}
Why does this happen?
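You are using the same bagOfWords instance for both sportBag and testBag. Because your .Bag method never clears the words map, the entries from the first call are still there when the second call adds its own. The method also returns that same map object each time, so sportBag and testBag refer to one shared map, which is why both loops print identical contents.
You have 2 options here:
- Clear the map at the start of the .Bag() method.
- Create a new instance of bagOfWords every time you need to generate a HashMap.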
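A closely related way to get the same effect is to build the map inside the method, so every call returns its own map. The following is only a minimal sketch: the class name BagOfWords, the lowercase method name bag, and the hand-written token lists are assumptions for illustration, and the tokenizing and lemmatizing steps from the question are not reproduced.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the asker's bagOfWords class.
public class BagOfWords {

    // The map is a local variable, so each call starts empty and
    // returns a separate object that later calls cannot modify.
    public Map<String, Integer> bag(List<String> wordsList) {
        Map<String, Integer> words = new HashMap<>();
        for (String word : wordsList) {
            // merge stores 1 for a new word, otherwise adds 1 to the current count
            words.merge(word, 1, Integer::sum);
        }
        return words;
    }

    public static void main(String[] args) {
        BagOfWords bagOfWords = new BagOfWords();

        // Assumed lemmatized tokens, written by hand for this sketch.
        List<String> trainTokens = Arrays.asList("hamza", "study", "hamza", "study", "yesterday");
        List<String> testTokens = Arrays.asList("hamza", "swimming", "today");

        Map<String, Integer> sportBag = bagOfWords.bag(trainTokens);
        Map<String, Integer> testBag = bagOfWords.bag(testTokens);

        System.out.println("hashmap for train");
        for (Map.Entry<String, Integer> entry : sportBag.entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
        System.out.println("hashmap for test");
        for (Map.Entry<String, Integer> entry : testBag.entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

One thing to keep in mind with the first option: if the method clears the shared field and returns it, sportBag and testBag still point at the same map, so sportBag would show the test counts after the second call. A local map (or a new instance per text) avoids that. Counting with merge also replaces the nested loop with a single pass over the list.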