Iterate over one DataFrame and add values to another DataFrame without "append" or "concat"?
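I have a DataFrame "df_edges" that I want to iterate over.
Inside the iteration there is an if/else and a string split. I need to add the values from the if/else to a new DataFrame (each iteration = one new row in the other DataFrame).
Sample data of "df_edges":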
                  channelId  ...  featuredChannelsUrlsCount
0  UC-ry8ngUIJHTMBWeoARZGmA  ...                          1
1  UC-zK3cJdazy01AKTu8g_amg  ...                          6
2  UC05_iIGvXue0sR01JNpRHzw  ...                         10
3  UC141nSav5cjmTXN7B70ts0g  ...                          0
4  UC1cQzKmbx9x0KipvoCt4NJg  ...                          0
# new empty dataframe where I want to add the values
df_edges_to_db = pd.DataFrame(columns=["Source", "Target"])
# iteration over the dataframe
for row in df_edges.itertuples():
    if row.featuredChannelsUrlsCount != 0:
        featured_channels = row[2].split(',')
        for fc in featured_channels:
            writer.writerow([row[1], fc])
            df_edges_to_db = df_edges_to_db.append({"Source": row[1], "Target": fc}, ignore_index=True)
    else:
        writer.writerow([row[1], row[1]])
        df_edges_to_db = df_edges_to_db.append({"Source": row[1], "Target": row[1]}, ignore_index=True)
This seems to work. But the documentation says (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html):
The following, while not recommended methods for generating DataFrames
So, apart from append/concat, is there a more "best practice" way to add rows with values?
Here you can build a list of dictionaries with Python's list append, instead of DataFrame.append as in your solution, and call the DataFrame constructor only once:
L = []
# iteration over the dataframe
for row in df_edges.itertuples():
    if row.featuredChannelsUrlsCount != 0:
        featured_channels = row[2].split(',')
        for fc in featured_channels:
            writer.writerow([row[1], fc])
            L.append({"Source": row[1], "Target": fc})
    else:
        writer.writerow([row[1], row[1]])
        L.append({"Source": row[1], "Target": row[1]})
df_edges_to_db = pd.DataFrame(L)
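For anyone who wants to try the list-of-dictionaries approach in isolation, here is a minimal, self-contained sketch. The sample values and the column name "featuredChannelsUrls" are assumptions (the question hides the middle columns behind "..."), and the CSV `writer` calls are left out because the writer is not defined in the question.

```python
import pandas as pd

# Hypothetical sample data. The column name "featuredChannelsUrls" is an
# assumption; the question only shows the first and last columns.
df_edges = pd.DataFrame({
    "channelId": ["A", "B", "C"],
    "featuredChannelsUrls": ["X", "Y,Z", ""],
    "featuredChannelsUrlsCount": [1, 2, 0],
})


def build_edges(df):
    """Collect one dict per edge, then build the DataFrame in a single call."""
    rows = []
    for row in df.itertuples(index=False):
        if row.featuredChannelsUrlsCount != 0:
            # One edge per featured channel in the comma-separated string.
            for fc in row.featuredChannelsUrls.split(","):
                rows.append({"Source": row.channelId, "Target": fc})
        else:
            # Channels without featured channels point to themselves.
            rows.append({"Source": row.channelId, "Target": row.channelId})
    # Passing columns keeps the expected headers even if rows is empty.
    return pd.DataFrame(rows, columns=["Source", "Target"])


df_edges_to_db = build_edges(df_edges)
```

Appending to a plain Python list is cheap, whereas each DataFrame.append call copies the whole frame, so the cost of the original loop grows with every added row.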
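Actually, it is not clear to me what your df_edges DataFrame looks like. Looking at your code, I would suggest replacing the body of your outer for loop with something like this:
new_list = [someOperationOn(x) if x == 0 else otherOperationOn(x) for x in mylist]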
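Applied to the data in the question, that comprehension idea might look like the sketch below. It is only an illustration: the sample values and the column name "featuredChannelsUrls" are assumed, and the CSV writing from the question is omitted.

```python
import pandas as pd

# Hypothetical sample data. The column name "featuredChannelsUrls" is an
# assumption; the question only shows the first and last columns.
df_edges = pd.DataFrame({
    "channelId": ["A", "B", "C"],
    "featuredChannelsUrls": ["X", "Y,Z", ""],
    "featuredChannelsUrlsCount": [1, 2, 0],
})

# The conditional expression picks the list of targets for each row:
# the split string when the count is non-zero, otherwise the channel itself.
edges = [
    (row.channelId, target)
    for row in df_edges.itertuples(index=False)
    for target in (
        row.featuredChannelsUrls.split(",")
        if row.featuredChannelsUrlsCount != 0
        else [row.channelId]
    )
]

# Build the result once from the collected (Source, Target) pairs.
df_edges_to_db = pd.DataFrame(edges, columns=["Source", "Target"])
```

The nested comprehension is needed because one input row can yield several output rows, which a single-level `a if cond else b` comprehension cannot express on its own.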