For loop prints all elements, but when the result is saved in a pandas DataFrame it returns NaN
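I have the following scraping script, and I need to collect the elements from the "items2" for loop.
The script prints all the elements, but later the DataFrame shows "name" and "tPlan" as NaN. Any idea why?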
import requests
import json
import csv
import sys
import pandas as pd
from bs4 import BeautifulSoup

base_url = "xxxx"
username = "xxxx"
password = "xxxx"
toget = data  # data is defined elsewhere (not shown in the question)
allowed_results = 50
max_results = "maxResults=" + str(allowed_results)
tc = "/testcycles?"
result_count = -1
start_index = 0
df = pd.DataFrame(
    columns=['id', 'name', 'gId', 'dKey', 'tPlan'])
for eachId in toget['TPlan_ID']:
    while result_count != 0:
        start_at = "startAt=" + str(start_index)
        url = f'{base_url}{eachId}{tc}&{start_at}&{max_results}'
        response = requests.get(url, auth=(username, password))
        json_response = json.loads(response.text)
        print(json_response)
        page_info = json_response["meta"]["pageInfo"]
        start_index = page_info["startIndex"] + allowed_results
        result_count = page_info["resultCount"]
        items2 = json_response["data"]
        print(items2)
        for item in items2:
            print(item["id"])
            print(item["fields"]["name"])
            print(item["fields"]["gId"])
            print(item["fields"]["dKey"])
            print(item["fields"]["tPlan"])
            temporary_df = pd.DataFrame([item], columns=['id', 'name', 'gId', 'dKey', 'tPlan'])
            df = df.append(temporary_df, ignore_index=True)
TL;DR
Use this for loop.
for item in items2:
    df = df.append({'id': item['id'], **item['fields']}, ignore_index=True)
Explanation
I assume items2 looks like this.
items2 = [
{ 'id': 0, 'fields': {'name': 'prop1', 'gId': 100, 'dKey': 'key1', 'tPlan': 'plan1'}},
{ 'id': 1, 'fields': {'name': 'prop2', 'gId': 200, 'dKey': 'key2', 'tPlan': 'plan2'}},
{ 'id': 2, 'fields': {'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}},
]
You can't create the DataFrame you want because each item is structured like this.
{'id': 2, 'fields': {'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}}
This causes temporary_df to be filled with NaN.
id name gId dKey tPlan fields
0 0 NaN NaN NaN NaN key1
1 0 NaN NaN NaN NaN 100
2 0 NaN NaN NaN NaN prop1
3 0 NaN NaN NaN NaN plan1
4 1 NaN NaN NaN NaN key2
5 1 NaN NaN NaN NaN 200
6 1 NaN NaN NaN NaN prop2
7 1 NaN NaN NaN NaN plan2
8 2 NaN NaN NaN NaN key3
9 2 NaN NaN NaN NaN 300
10 2 NaN NaN NaN NaN prop3
11 2 NaN NaN NaN NaN plan3
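To see where the NaN values come from, here is a minimal reproduction of the call from the question, using one of the assumed sample items. When columns= lists names that are not top-level keys of the dictionary, pandas creates those columns and fills them with NaN. The exact layout of the result depends on how the DataFrame is constructed, so the printed frame may not match the table above row for row.

```python
import pandas as pd

# Assumed sample item, same shape as in the explanation above
item = {'id': 2, 'fields': {'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}}
cols = ['id', 'name', 'gId', 'dKey', 'tPlan']

# Same call as in the question: only 'id' is a top-level key of item,
# so the other requested columns have no data to draw from.
temporary_df = pd.DataFrame([item], columns=cols)

# Collect the columns that ended up entirely NaN
nan_columns = [c for c in cols if temporary_df[c].isna().all()]

print(temporary_df)
print(nan_columns)
```

The values printed inside the loop come from item["fields"], which the DataFrame constructor never looks into; that is why the prints look fine while the frame does not.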
What you need to pass to pd.DataFrame is a dictionary structured like this:
{'id': 2, 'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}
Note that the fields dictionary is gone here: all key-value pairs from fields have been added directly to item. Using this modified dictionary gives a temporary_df like
id name gId dKey tPlan
0 0 prop1 100 key1 plan1
1 1 prop2 200 key2 plan2
2 2 prop3 300 key3 plan3
To make this change to the item's structure, you could do this
new_item = {'id': item['id']}
for key, value in item['fields'].items():
    new_item[key] = value
But you can use the unpacking operator ** instead
new_item = {'id': item['id'], **item['fields']}
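As a quick check that the two forms are equivalent, the sketch below builds the flattened dictionary both ways and compares them. The sample item is the assumed data from above; new_item_loop and new_item_unpacked are names introduced here for illustration.

```python
# Assumed sample item, same shape as in the explanation above
item = {'id': 2, 'fields': {'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}}

# Explicit loop: copy every key-value pair from 'fields' into a new dict
new_item_loop = {'id': item['id']}
for key, value in item['fields'].items():
    new_item_loop[key] = value

# Dictionary unpacking: ** expands 'fields' into the dict literal
new_item_unpacked = {'id': item['id'], **item['fields']}

print(new_item_loop == new_item_unpacked)
print(new_item_unpacked)
```

One thing to keep in mind: in a dict literal, later keys win, so if fields ever contained its own 'id' key it would overwrite the top-level one in both versions.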
Now we can pass new_item as the argument to pd.DataFrame.
temp_df = pd.DataFrame({ 'id': item['id'], **item['fields']}, index=(i,)) # i here is the row index of the DataFrame
After making these changes, your for loop should look like this
for i, item in enumerate(items2):
    new_item = {'id': item['id'], **item['fields']}
    temp_df = pd.DataFrame(new_item, index=(i,))
    df = df.append(temp_df, ignore_index=True)
We can make this more concise by passing new_item directly to pd.DataFrame.append.
So in the end, this code should work.
for item in items2:
    new_item = {'id': item['id'], **item['fields']}
    df = df.append(new_item, ignore_index=True)
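One caveat: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loops above only run on older versions. Below is a sketch of the same idea that does not rely on append: collect the flattened dictionaries in a plain list and build the DataFrame once at the end. The sample items2 is the assumed data from above, and rows, columns and flat are names introduced here for illustration. The second half shows pd.json_normalize as another possible route; it names nested columns like 'fields.name', so the prefix is stripped afterwards.

```python
import pandas as pd

# Assumed sample data, same shape as in the explanation above
items2 = [
    {'id': 0, 'fields': {'name': 'prop1', 'gId': 100, 'dKey': 'key1', 'tPlan': 'plan1'}},
    {'id': 1, 'fields': {'name': 'prop2', 'gId': 200, 'dKey': 'key2', 'tPlan': 'plan2'}},
    {'id': 2, 'fields': {'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}},
]
columns = ['id', 'name', 'gId', 'dKey', 'tPlan']

# Option 1: accumulate flattened dicts, then construct the frame once
rows = []
for item in items2:
    rows.append({'id': item['id'], **item['fields']})
df = pd.DataFrame(rows, columns=columns)
print(df)

# Option 2: let pandas flatten the nesting, then drop the 'fields.' prefix
flat = pd.json_normalize(items2)
flat.columns = [c.split('.', 1)[-1] for c in flat.columns]
print(flat[columns])
```

In the paginated script from the question, rows would be created once before the outer loop and the DataFrame built after all pages have been fetched, which also avoids copying the frame on every iteration.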