Pattern matching in data and creating CSVs that satisfy the pattern condition in Python

Question

I am working with a dataset like the one shown in the image below.

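I have imported the dataset, which is in CSV format, into Python using pandas. I want to split it in two: all rows whose PATR column holds values like "a;b;c" or "lp;kl;jj" (that is, data separated by semicolons) should go into one CSV, and rows with other values such as ";" and "250" should go into a separate CSV. I tried splitting the values on the semicolon and separating them by the resulting length, but I did not get an exact match.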

Actual dataset:

Expected output 1 (all rows where the PATR column is "ANY_DATA and a semicolon")

Expected output 2 (all rows where the PATR column is "only semi colon or only data")

Thanks in advance.

Answer 1

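# True for PATR values made of words separated by semicolons, e.g. "a;b;c"
mask = df['PATR'].str.contains(r'^\w+(?:;\w+)+$', na=False)
df1 = df[mask]   # expected output 1
df2 = df[~mask]  # expected output 2: ";" , "250", missing values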

This will work for your test data. I took the regular expression from here.
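For completeness, here is a minimal end-to-end sketch of the mask approach, including the CSV step the question asks about. The sample rows and the `ID` column are made up for illustration; only the `PATR` column name comes from the question. The pattern used here requires at least one `;word` group, which is an assumption about what "data and a semicolon" should mean.

```python
import pandas as pd

# Hypothetical sample data; only the PATR column name comes from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "PATR": ["a;b;c", "lp;kl;jj", ";", "250", None],
})

# A word, followed by one or more ";word" groups, anchored at both ends.
# The non-capturing group (?:...) avoids the pandas warning about match groups.
pattern = r"^\w+(?:;\w+)+$"
mask = df["PATR"].str.contains(pattern, na=False)  # missing values count as no match

with_semicolon_data = df[mask]
everything_else = df[~mask]

# With no path argument, to_csv returns the CSV text as a string.
# Pass a file name instead (e.g. "output1.csv") to write to disk.
csv1 = with_semicolon_data.to_csv(index=False)
csv2 = everything_else.to_csv(index=False)

print(csv1)
print(csv2)
```

Note that a value such as "a;" or ";b" does not match this pattern and ends up in the second file. Whether that is what you want depends on your data.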

Answer 2

Try this, based on your requirements:

def pattern_matcher(y):
    # True for values that belong in expected output 2:
    # no ';' at all (e.g. "250"), or the value is exactly ';'.
    if y.count(';') < 1 or y == ';':
        return True
    # Anything else contains a ';' together with other text (expected output 1).
    # To also treat values like ';;' or ' ; ' as "only semicolons", return
    # all(not x.strip() for x in y.split(';')) here instead.
    return False

Then apply it to your dataframe column:

out2 = df2.loc[(df2.part.apply(pattern_matcher))]


part
2   ;
3   250

and

out1 = df2.loc[~(df2.part.apply(pattern_matcher))]

    part
0   A;B;C
1   ip;KL;JH


python

csv

python-3.x

pandas

data-cleaning