Using pandas in Python to pull a specific string from a column

Question

I have a CSV file with the following columns:

"Date","Time","TimeZone","Name","Type","Status","Currency","Gross","Fee","Net","From Email Address","To Email Address","Transaction ID","Shipping Address","Address Status","Item Title","Item ID","Shipping and Handling Amount","Insurance Amount","Sales Tax","Option 1 Name","Option 1 Value","Option 2 Name","Option 2 Value","Reference Txn ID","Invoice Number","Custom Number","Quantity","Receipt ID","Balance","Address Line 1","Address Line 2/District/Neighborhood","Town/City","State/Province/Region/County/Territory/Prefecture/Republic","Zip/Postal Code","Country","Contact Phone Number","Subject","Note","Country Code","Balance Impact"

I am trying to grab only the rows whose Item Title column contains the string Chain × Jewelry × Necklace.

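The text under Item Title differs from row to row. For example, one row might contain Chain × Jewelry × Necklace Popcorn Necklace, while others are blank, but I only want the rows that contain Chain × Jewelry × Necklace.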

How can I use pandas to extract the rows that contain this string? I'm having trouble with it. Any help is much appreciated.

Answer 1

You can use a regular expression:

df[df["Item Title"].str.contains(r"^(?=.*\bChain\b)(?=.*\bJewelry\b)(?=.*\bNecklace\b).+", regex=True, na=False)]
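To illustrate how the lookahead pattern behaves, here is a minimal sketch on a small made-up DataFrame (the sample titles are hypothetical and not from the asker's file). Each `(?=...)` lookahead requires one whole word to appear somewhere in the title, so the words can occur in any order. The sketch passes `na=False` on the assumption that blank cells should count as non-matches.

```python
import pandas as pd

# Hypothetical sample data; the real file has many more columns.
df = pd.DataFrame({
    "Item Title": [
        "Chain × Jewelry × Necklace Popcorn Necklace",
        "Necklace with Jewelry Chain",   # same words, different order
        "Chain Bracelet",                # missing two of the words
        None,                            # blank cell
    ]
})

# One lookahead per required word; \b keeps "Chain" from matching "Chains".
pattern = r"^(?=.*\bChain\b)(?=.*\bJewelry\b)(?=.*\bNecklace\b)"

# na=False turns missing titles into False instead of NaN,
# so the result is a clean boolean mask.
mask = df["Item Title"].str.contains(pattern, regex=True, na=False)
matches = df[mask]
print(matches)
```

Note that this pattern also keeps the second row, because it only checks that all three words are present. If the exact phrase with the × separators is required, a literal substring match is the stricter choice.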

Answer 2

Try this:

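import pandas as pd
df = pd.read_csv('path/to/your/file.csv')
# Treat blank cells as empty strings, then keep rows whose Item Title contains the text and whose Name is non-empty.
df = df[df['Item Title'].fillna('').str.contains('Chain × Jewelry × Necklace') & df['Name'].fillna('').str.len().gt(0)]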
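As a self-contained variation on the same idea, the sketch below reads a tiny in-memory CSV (the rows and names are invented for illustration) and matches the phrase literally. It assumes only the Item Title condition matters, so it drops the Name check, and it uses `regex=False` so the phrase is treated as plain text rather than a pattern.

```python
import io

import pandas as pd

# Hypothetical two-column stand-in for the real export.
csv_text = (
    '"Name","Item Title"\n'
    '"Alice","Chain × Jewelry × Necklace Popcorn Necklace"\n'
    '"Bob",""\n'
    '"Carol","Chain × Jewelry × Necklace Rope Necklace"\n'
    '"Dave","Ring"\n'
)
df = pd.read_csv(io.StringIO(csv_text))

needle = "Chain × Jewelry × Necklace"

# regex=False does a plain substring search; na=False makes blank
# titles evaluate to False rather than NaN.
mask = df["Item Title"].str.contains(needle, regex=False, na=False)
result = df.loc[mask]
print(result)
```

Using `na=False` avoids the separate `fillna('')` step, since missing titles are excluded by the mask itself.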

python

pandas

data-science