How to modify multiple values in one column, but skip others in pandas python

Question

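I've been working with Python for two months, and I'm now trying to focus on Pandas. In my current position I use VBA on data frames, so learning this will let me gradually replace it and advance my career. As of now, I think my real problem is a lack of understanding of key concepts. Any help would be greatly appreciated.

Here is my problem:

Where can I learn more about doing things like this, for more precise filtering? I'm very close, but I'm missing one key piece.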

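Goal

My main goal is to skip certain values in the ID column. The code below removes the dashes "-" and keeps only the first 9 digits. However, I need to skip certain IDs because they are unique.

After that I will start comparing multiple sheets.

The main DataFrame's IDs are formatted as 000-000-000-000.
The other DataFrames I will compare against have the IDs without the dashes, as 000000000: the last three 000 are dropped, leaving nine digits in total.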

The unique IDs I need to skip are the same in both DataFrames, but their format is completely different, for example 000-000-000_#12, 000-000-000_35 or 000-000-000_z.
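Since every example of a unique ID above carries an underscore suffix while the regular IDs all share the 000-000-000-000 shape, one option is to detect the regular IDs with a regular expression instead of maintaining a hand-typed list. This is a minimal sketch under an assumption the question does not confirm: that every regular ID is exactly four dash-separated groups of three digits, and anything else counts as unique.

```python
import pandas as pd

# Sample IDs taken from the tables in this question.
ids = pd.Series([
    "004-330-002-000",
    "021-521-410-000_128",
    "001-243-313-000",
    "002-730-032-000",
    "000-000-000_a",
])

# Assumption: a "regular" ID is exactly ddd-ddd-ddd-ddd; anything else
# (e.g. an "_#12", "_35" or "_z" suffix) is treated as unique and skipped.
is_regular = ids.str.fullmatch(r"\d{3}-\d{3}-\d{3}-\d{3}", na=False)

# Transformed version of every ID: dashes removed, first nine characters kept.
transformed = ids.str.replace("-", "", regex=False).str[:9]

# Series.mask replaces values only where the condition is True,
# so unique IDs pass through unchanged.
result = ids.mask(is_regular, transformed)
print(result.tolist())
```

If the real data has other regular formats, the pattern would need adjusting; the list-based `isin` approach in the answers does not depend on this assumption.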

The code I will use on every ID except the unique ones:

dfSS["ID"] = dfSS["ID"].str.replace("-", "").str[:9]
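To see what this chain does, it helps to run it on two of the sample IDs from this question: `.str.replace` removes every dash, and `.str[:9]` keeps the first nine characters. Because it runs on every row, the unique IDs get cut as well, which is exactly what needs to be avoided. A small sketch (`regex=False` is added here so "-" is always treated as a literal character):

```python
import pandas as pd

sample = pd.Series(["004-330-002-000", "000-000-000_a"])

# Step 1: drop every "-" (regex=False treats "-" as a literal string).
no_dashes = sample.str.replace("-", "", regex=False)

# Step 2: keep only the first nine characters of each value.
first_nine = no_dashes.str[:9]

print(no_dashes.tolist())   # ['004330002000', '000000000_a']
print(first_nine.tolist())  # ['004330002', '000000000']
```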

But I want to use an if statement (this doesn't work):

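lst = ["000-000-000_#69B", "000-000-000_a", "etc.. random IDs", ]

# Note: .any() collapses the per-row mask into one True/False value,
# so this branch runs for the whole column at once, not row by row.
if ~dfSS["ID"].isin(lst).any():
    dfSS["ID"] = dfSS["ID"].str.replace("-", "").str[:9]
else:
    pass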
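The `if` version cannot treat rows differently because `.any()` reduces the boolean mask to a single value, so the `if` decides once for the whole column. A sketch of the per-row alternative: keep the mask as a boolean Series and let `Series.mask` replace only the rows where it is True. The `lst` here holds sample unique IDs from this question as a stand-in for the real list.

```python
import pandas as pd

dfSS = pd.DataFrame({"ID": ["004-330-002-000", "021-521-410-000_128", "000-000-000_a"]})
lst = ["021-521-410-000_128", "000-000-000_a"]  # stand-in for the real skip list

# One True/False per row: True where the ID is NOT in the skip list.
to_transform = ~dfSS["ID"].isin(lst)

# .any() reduces the whole mask to a single boolean, which is why an
# `if` statement on it cannot handle rows individually.
print(to_transform.any())  # True

# Series.mask swaps in the transformed value only where the mask is True.
transformed = dfSS["ID"].str.replace("-", "", regex=False).str[:9]
dfSS["ID"] = dfSS["ID"].mask(to_transform, transformed)
print(dfSS["ID"].tolist())  # ['004330002', '021-521-410-000_128', '000-000-000_a']
```

No explicit loop is needed: the mask already carries one decision per row.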

To make it clearer, my input DataFrame looks like this:

            ID               Street #   Street Name 
0   004-330-002-000         2272        Narnia  
1   021-521-410-000_128     2311        Narnia  
2   001-243-313-000         2235        Narnia  
3   002-730-032-000         2149        Narnia
4   000-000-000_a           1234        Narnia

I want this as the output:

            ID               Street #   Street Name 
0   004330002               2272        Narnia  
1   021-521-410-000_128     2311        Narnia  
2   001243313               2235        Narnia  
3   002730032               2149        Narnia
4   000-000-000_a           1234        Narnia

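Notes:

dfSS is my DataFrame variable name, i.e. the Excel sheet I'm working with. "ID" is my column header; I make it the index afterwards.
The DataFrame in this job is small: its shape (rows, columns) is (2500, 125).
I don't get an error message, so I'm guessing I might need some kind of loop. I've started testing this with loops too. No luck... yet.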

This is where I researched this:

How to filter Pandas dataframe using 'in' and 'not in' like in SQL
if statement with ~isin() in pandas
recordlinkage module: I didn't think this was going to work
Regular expression operations: having a hard time fully understanding this at the moment

Answer 1

There are several ways to do this. The first way here doesn't involve writing a function.

# Create a placeholder column with all transformed IDs
dfSS["ID_trans"] = dfSS["ID"].str.replace("-", "").str[:9]
dfSS.loc[~dfSS["ID"].isin(lst), "ID"] = dfSS.loc[~dfSS["ID"].isin(lst), "ID_trans"] # conditional indexing
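The same conditional indexing also works without the helper column, as long as the mask is computed once and reused on both sides of the assignment so the rows line up. A sketch under the same assumptions (a `dfSS` with an `ID` column and a skip list `lst`); the sample rows come from the question, and `transform_mask` is just an illustrative name.

```python
import pandas as pd

dfSS = pd.DataFrame({
    "ID": ["004-330-002-000", "021-521-410-000_128", "001-243-313-000", "000-000-000_a"],
    "Street #": [2272, 2311, 2235, 1234],
})
lst = ["021-521-410-000_128", "000-000-000_a"]  # stand-in for the real skip list

# Compute the row mask once: True for IDs that should be transformed.
transform_mask = ~dfSS["ID"].isin(lst)

# .loc[rows, column]: select only the masked rows of "ID" and overwrite them.
dfSS.loc[transform_mask, "ID"] = (
    dfSS.loc[transform_mask, "ID"].str.replace("-", "", regex=False).str[:9]
)
print(dfSS)
```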

The second way is to write a function that transforms the ID conditionally. It is slower than the first way.

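def transform_ID(ID_val):
    # Only transform IDs that are not in the skip list
    if ID_val not in lst:
        return ID_val.replace("-", "")[:9]
    # Return unique IDs unchanged (without this, the function returns None for them)
    return ID_val

dfSS['ID_trans'] = dfSS['ID'].apply(transform_ID)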
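If the function returns `ID_val` unchanged for the skipped IDs (otherwise Python returns `None` for them), the `apply` version should produce the same column as the conditional-indexing version. A quick sketch that runs both on sample IDs and compares the values; the data and skip list are illustrative, and `transform_ID` mirrors the function in this answer.

```python
import pandas as pd

lst = ["021-521-410-000_128", "000-000-000_a"]  # stand-in for the real skip list

def transform_ID(ID_val):
    # Only transform IDs that are not in the skip list.
    if ID_val not in lst:
        return ID_val.replace("-", "")[:9]
    # Unique IDs are returned unchanged instead of becoming None.
    return ID_val

dfSS = pd.DataFrame({"ID": ["004-330-002-000", "021-521-410-000_128", "000-000-000_a"]})

# Method 2: row by row with apply (one Python function call per value).
via_apply = dfSS["ID"].apply(transform_ID)

# Method 1: vectorized string methods plus a boolean mask.
mask = ~dfSS["ID"].isin(lst)
via_mask = dfSS["ID"].copy()
via_mask[mask] = dfSS.loc[mask, "ID"].str.replace("-", "", regex=False).str[:9]

print(via_apply.tolist() == via_mask.tolist())  # True
```

Comparing with `.tolist()` checks only the values, so differences in inferred dtype between pandas versions do not affect the result.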

Answer 2

This is based on @xyzxyzjayne's answer, but I have two issues I can't solve.

First issue

I get this warning (see the edit):

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Documentation for this warning

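You'll see in the code below that I tried to put in .loc, but I can't figure out how to get rid of this warning by using .loc correctly. Still learning. And no, I won't ignore it even if the code works; as I said, this is a learning opportunity.

Second issue

I don't understand this part of the code. I know the left side should be the rows and the right side the columns. So why does this work? When this code runs, ID is a column, not a row. The line I mean: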

df.loc[~df["ID "].isin(uniqueID ), "ID "] = df.loc[~df["ID "].isin(uniqueID ), "Place Holder"]

What I don't understand yet is the part to the left of the comma (,):

df.loc[~df["ID "].isin(uniqueID), "ID "]
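One way to see what the left-hand side is: `~df["ID "].isin(uniqueID)` is not a row number but a boolean Series with one True/False per row, aligned on the index. `.loc[row_selector, column_selector]` uses it to pick the rows, while `"ID "` after the comma picks the column. A small sketch, with a two-item stand-in for the real `uniqueID` list, that prints the mask and the resulting selection:

```python
import pandas as pd

df = pd.DataFrame({
    "ID ": ["004-330-002-000", "021-521-410-000_128", "000-000-000_a"],
    "Street #": [2272, 2311, 1234],
})
uniqueID = ["021-521-410-000_128", "000-000-000_a"]  # stand-in for the real list

# Left of the comma: one boolean per row (True = not a unique ID).
row_mask = ~df["ID "].isin(uniqueID)
print(row_mask.tolist())  # [True, False, False]

# .loc[rows, column]: rows where the mask is True, only the "ID " column.
selected = df.loc[row_mask, "ID "]
print(selected.tolist())  # ['004-330-002-000']
```

So the assignment in the line above overwrites only the rows selected by the mask, and only in the "ID " column.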

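Here is the final result. Basically, as I said, XZY's help got me here, but I'm adding more .loc and working through the documentation until I can get rid of the warning.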

import time

# The whole list of IDs I had to enter manually (1000+ entries).
# These IDs get skipped by the filter below. Example entry:
uniqueID = [
    "032-234-987_#4256",
    # ... the remaining unique IDs ...
]

# df is the DataFrame loaded from the Excel sheet earlier (not shown).
# Keep only the columns I need, to make the DataFrame smaller.
# .copy() makes df an independent DataFrame, which avoids the SettingWithCopyWarning.
df = df[['ID ', 'Street #', 'Street Name', 'Debris Finish', 'Number of Vehicles',
         'Number of Vehicles Removed', 'County']].copy()

# "Place Holder" is a new column holding every ID with the dashes removed, cut to 9 digits
df.loc[:, "Place Holder"] = df.loc[:, "ID "].str.replace("-", "").str[:9]

# The next line is the filter: IDs in uniqueID are skipped, all others get the transformed value.
# Still a work in progress to fully understand.
df.loc[~df["ID "].isin(uniqueID), "ID "] = df.loc[~df["ID "].isin(uniqueID), "Place Holder"]

# Make the ID column our index
df = df.set_index("ID ")

# Add today's date to the file name (requires the time module to be imported)
todaysDate = time.strftime("%m-%d-%y")

# Write the result to an Excel file
df.to_excel("ID TEXT " + todaysDate + ".xlsx")

Once I get rid of the warning and figure out the left side, I will edit this post so I can explain it to everyone who needs/sees it.

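Edit: SettingWithCopyWarning:

With XYZ's help, I solved this chained-indexing problem by making a copy of the original DataFrame before filtering and using .loc for everything. Call DataFrame.copy() before you start filtering, where DataFrame is the name of your own data frame.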
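A minimal sketch of the fix described in this edit: taking the column subset with `.copy()` makes `df` an independent DataFrame rather than something pandas may treat as a slice of the original, so later `.loc` assignments modify it directly without the warning. The columns and rows are cut down from the question, and the warning check is only for illustration; with Copy-on-Write (the default from pandas 3.0) this warning should no longer be raised at all.

```python
import warnings

import pandas as pd

raw = pd.DataFrame({
    "ID ": ["004-330-002-000", "000-000-000_a"],
    "Street #": [2272, 1234],
    "Unused": ["x", "y"],
})
uniqueID = ["000-000-000_a"]  # stand-in for the real list

# Select the needed columns and take an explicit copy before filtering.
df = raw[["ID ", "Street #"]].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df.loc[:, "Place Holder"] = df.loc[:, "ID "].str.replace("-", "", regex=False).str[:9]
    mask = ~df["ID "].isin(uniqueID)
    df.loc[mask, "ID "] = df.loc[mask, "Place Holder"]

# Names of any warnings raised during the assignments above.
warning_names = [w.category.__name__ for w in caught]
print("SettingWithCopyWarning" in warning_names)  # False
print(df["ID "].tolist())  # ['004330002', '000-000-000_a']
```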
