Using a loop to populate rows of a new column in Python based on content found in other column rows

Question

I'm looking for help populating a new column in a DataFrame. I need New_column to be filled in based on what is found in another column.

import pandas as pd

df = pd.read_csv('sample.txt')
# the data is imported as one column
df.columns = ['Test']
# split into columns
dfnew = df.Test.str.split(expand=True).rename(columns={0:'Datetime', 1:'P1', 2:'P2'})

# create a new column
dfnew["New_column"] = ""
print(dfnew)

                Datetime     P1           P2        New_column
8             'Name-1'      None         None          
9    2017-01-01T00:00:00    2800         1600          
10   2017-02-01T00:00:00  -99999         2375            
..                   ...     ...          ...       ...
72            'Name-2'      None         None         
73   2018-10-11T00:00:00     0           2000          
74   2018-10-18T00:00:00     0           2000                  
..                   ...     ...          ...       ...
[724 rows x 4 columns]

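In the .txt file, P1 and P2 are blank on the rows where the Datetime column holds a Name-# value, but when the DataFrame is printed those blanks show up as 'None'. Every x rows, the Name-# in the Datetime column changes (the numbers attached to the names do not increase in any particular order). I would like New_column to carry the Name-# found in the Datetime column on every row, until the next Name-# value replaces it: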

                Datetime     P1           P2        New_column
8             'Name-1'       None         None          
9    2017-01-01T00:00:00     2800         1600          Name-1
10   2017-02-01T00:00:00   -99999         2375          Name-1
..                   ...     ...          ...       ...
72            'Name-2'       None         None         
73   2020-10-11T00:00:00      0           2000          Name-2
74   2020-10-18T00:00:00      0           2000          Name-2       
..                   ...     ...          ...       ...
623           'Name-14'      None         None         
624  2020-04-21T00:00:00   -99999         730           Name-14
625  2020-04-27T00:00:00      0           260           Name-14
..                   ...     ...          ...       ...
[724 rows x 4 columns]

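I would also like to delete the rows that have a Name-# in the Datetime column (i.e. rows 8, 72, 623, etc.). I need this process to be automated so that I can import .txt files of the same style that do not necessarily have the same size or the same Name-# values. I have tried building a list with a for loop containing several if statements and then assigning that list to New_column, but I can't seem to get it to work.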

I'm a beginner with Python, and any help would be much appreciated.

Answer 1

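Try the code below. First we create a new column from the Datetime column: if the value contains 'Name', 'new_col' takes the value of the Datetime column; otherwise it is np.nan (the equivalent of NULL).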

Then we use ffill() to forward-fill new_col wherever its value is np.nan.

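import numpy as np
# keep the Datetime value on rows that contain 'Name', NaN everywhere else
dfnew['new_col'] = [x if 'Name' in str(x) else np.nan for x in dfnew.Datetime.values]
# forward-fill so each row carries the most recent Name-# above it
dfnew['new_col'] = dfnew['new_col'].ffill()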
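The question also asks to drop the Name-# rows and shows the labels without their quotes, which the snippet above does not do. Here is a minimal sketch of the whole pipeline, assuming the labels always contain the text 'Name' and are wrapped in single quotes as in the printed output. The small DataFrame is made-up sample data that stands in for the real file, and `is_name`, `labels` and `result` are names chosen only for this illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the layout shown in the question.
dfnew = pd.DataFrame({
    "Datetime": ["'Name-1'", "2017-01-01T00:00:00", "2017-02-01T00:00:00",
                 "'Name-2'", "2018-10-11T00:00:00", "2018-10-18T00:00:00"],
    "P1": [None, "2800", "-99999", None, "0", "0"],
    "P2": [None, "1600", "2375", None, "2000", "2000"],
})

# True on the header rows whose Datetime cell holds a Name-# label.
# na=False treats any missing Datetime value as "not a label".
is_name = dfnew["Datetime"].str.contains("Name", regex=False, na=False)

# Keep the label (quotes stripped) on header rows and NaN elsewhere,
# then forward-fill so each data row inherits the label above it.
labels = dfnew["Datetime"].str.strip("'").where(is_name)
dfnew["New_column"] = labels.ffill()

# Drop the header rows now that their label has been propagated.
result = dfnew.loc[~is_name].reset_index(drop=True)
print(result)
```

Because the mask is computed from the data itself, this should work for files with a different number of rows or different Name-# values, as long as the labels can still be recognised by the same test.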

[ffill()][1]

[1]: https://www.geeksforgeeks.org/python-pandas-dataframe-ffill/#:~:text=ffill()%20function%20is%20used,propagate%20last%20valid%20observation%20forward.&text=inplace%20%3A%20If%20True%2C%20fill%20in,a%20column%20in%20a%20DataFrame).
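Since the question mentions an attempt with a for loop and if statements, here is a rough sketch of how that version could look, for comparison with ffill(). The helper name `label_rows` and the sample values are invented for this example, and it assumes the same 'Name' test as above.

```python
import pandas as pd

def label_rows(datetimes):
    """Return one label per row: the most recent Name-# seen so far."""
    current = None
    labels = []
    for value in datetimes:
        # A Name-# row updates the running label; other rows reuse it.
        if "Name" in str(value):
            current = str(value).strip("'")
        labels.append(current)
    return labels

# Hypothetical sample data in the same shape as the Datetime column.
dfnew = pd.DataFrame({
    "Datetime": ["'Name-14'", "2020-04-21T00:00:00", "2020-04-27T00:00:00",
                 "'Name-2'", "2020-10-11T00:00:00"],
})
dfnew["New_column"] = label_rows(dfnew["Datetime"])
print(dfnew)
```

The loop only needs one variable holding the current label, so a single if is enough. Note that the Name-# rows themselves also receive their own label here, which does not matter if those rows are dropped afterwards.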

python

validation

loops

multiple-columns

dataframe