How to use exception handling in pandas while using a function
I have the following dataframe:
     a     b   x      y language
0  id1  id_2   3  text1
1  id2  id_4   6  text2
2  id3  id_6   9  text3
3  id4  id_8  12  text4
I am trying to use langdetect to detect the language of the text elements in column y.
This is the code I am using for that purpose:
for i,row in df.iterrows():
    df.loc[i].at["language"] = detect(df.loc[i].at["y"])
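Unfortunately, this column also contains non-text elements (whitespace, symbols, digits, and combinations of these), so I get the following traceback: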
LangDetectException Traceback (most recent call last)
<ipython-input-40-3b2637554e5f> in <module>
1 df["language"]=""
2 for i,row in df.iterrows():
----> 3 df.loc[i].at["language"] = detect(df.loc[i].at["y"])
4 df.head()
C:\Anaconda\lib\site-packages\langdetect\detector_factory.py in detect(text)
128 detector = _factory.create()
129 detector.append(text)
--> 130 return detector.detect()
131
132
C:\Anaconda\lib\site-packages\langdetect\detector.py in detect(self)
134 which has the highest probability.
135 '''
--> 136 probabilities = self.get_probabilities()
137 if probabilities:
138 return probabilities[0].lang
C:\Anaconda\lib\site-packages\langdetect\detector.py in get_probabilities(self)
141 def get_probabilities(self):
142 if self.langprob is None:
--> 143 self._detect_block()
144 return self._sort_probability(self.langprob)
145
C:\Anaconda\lib\site-packages\langdetect\detector.py in _detect_block(self)
148 ngrams = self._extract_ngrams()
149 if not ngrams:
--> 150 raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')
151
152 self.langprob = [0.0] * len(self.langlist)
LangDetectException: No features in text.
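Is there a way to use exception handling so that the detect function from the langdetect library is applied only to the text elements it can handle?
So, given the following dataframe: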
import pandas as pd

df = pd.DataFrame(
    {
        "a": {0: "id1", 1: "id2", 2: "id3", 3: "id4"},
        "b": {0: "id_2", 1: "id_4", 2: "id_6", 3: "id_8"},
        "x": {0: 3, 1: 6, 2: 9, 3: 12},
        "y": {0: "text1", 1: "text2", 2: "text3", 3: "text4"},
        "language": {0: "", 1: "", 2: "", 3: ""},
    }
)
And, for the purposes of this answer, the following mock exception and function:
class LangDetectException(Exception):
    pass


def detect(x):
    if x == "text2":
        raise LangDetectException
    else:
        return "english"
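With the real library, the mocks would be replaced by imports. The following is only a sketch, assuming langdetect is installed and exposes `detect` and `langdetect.lang_detect_exception.LangDetectException` as in its published releases; the fallback branch simply repeats the mocks so the rest of the answer runs either way.

```python
try:
    # Real library (assumes `pip install langdetect` has been run).
    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException
except ImportError:
    # Fallback: the mock exception and function used in this answer.
    class LangDetectException(Exception):
        pass

    def detect(x):
        if x == "text2":
            raise LangDetectException
        return "english"
```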
You can skip the rows where "y" contains non-text elements (here, row 1) like this:
for i, row in df.iterrows():
    try:
        df.loc[i, "language"] = detect(row["y"])
    except LangDetectException:
        # Detection failed for this row: leave "language" empty and move on.
        continue
The result:
print(df)
# Outputs
     a     b   x      y language
0  id1  id_2   3  text1  english
1  id2  id_4   6  text2
2  id3  id_6   9  text3  english
3  id4  id_8  12  text4  english
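As a possible alternative to the explicit loop, the try/except can be moved into a small wrapper and applied to the whole column with `Series.apply`. This is a minimal sketch rather than part of the original answer: `safe_detect` and its `default` parameter are hypothetical names, and the mock `detect` and `LangDetectException` are repeated here only to keep the block self-contained.

```python
import pandas as pd


class LangDetectException(Exception):
    """Mock of the langdetect exception, as in the answer above."""


def detect(text):
    """Mock detector: fails on "text2", otherwise reports English."""
    if text == "text2":
        raise LangDetectException
    return "english"


def safe_detect(text, default=""):
    """Hypothetical helper: return `default` when detection fails."""
    try:
        return detect(text)
    except LangDetectException:
        return default


df = pd.DataFrame({"y": ["text1", "text2", "text3", "text4"]})

# Series.apply calls safe_detect once per element of column "y".
df["language"] = df["y"].apply(safe_detect)

print(df["language"].tolist())
```

Rows where detection fails keep the empty default, matching the output of the loop. Passing a different `default` (for example `"unknown"`) would need a small lambda or `functools.partial` around `safe_detect`.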