Converting byte data to dataframe
I have the following data:
{"links":[{"rel":"self","href":"https://api.pjm.com"},
{"rel":"next","href":"https://api.pjm.com"},{"rel":"metadata","href":"https://api.pjm.com/api/v1/ftr_cong_lmp/metadata"}],
"items":[{"effective_day":"2020-12-01T00:00:00","terminate_day":"2020-12-31T00:00:00","pnode_name":"02AMSTED138 KV TR2","offpeak_clmp":-0.290000,"onpeak_clmp":-0.240000,"24hour_clmp":-0.270000,"lt_sim_offpeak_clmp":-0.240000,"lt_sim_onpeak_clmp":-0.220000,"lt_sim_clmp":-0.240000},{"effective_day":"2020-12-01T00:00:00","terminate_day":"2020-12-31T00:00:00","pnode_name":"02AMSTED138 KV TR6","offpeak_clmp":-0.290000,"onpeak_clmp":-0.240000,"24hour_clmp":-0.270000,"lt_sim_offpeak_clmp":-0.240000,"lt_sim_onpeak_clmp":-0.220000,"lt_sim_clmp":-0.240000},{"effective_day":"2020-12-01T00:00:00","terminate_day":"2020-12-31T00:00:00","pnode_name":"02CPP_NH138 KV TR2","offpeak_clmp":0.010000,"onpeak_clmp":1.530000,"24hour_clmp":0.660000,"lt_sim_offpeak_clmp":0.010000,"lt_sim_onpeak_clmp":1.520000,"lt_sim_clmp":0.660000}],"searchSpecification":{"rowCount":25,"sort":"terminate_day","order":"Desc","startRow":1,"isActiveMetadata":true,"fields":["24hour_clmp","effective_day","lt_sim_clmp","lt_sim_offpeak_clmp","lt_sim_onpeak_clmp","offpeak_clmp","onpeak_clmp","pnode_name","terminate_day"],"filters":[{"effective_day":"2020-01-01T00:00:00.0000000 to 2020-12-31T23:59:59.0000000"}]},"totalRows":163378}'
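I am trying to put the above data into a dataframe, so I tried the following:
import pandas as pd
from io import StringIO
s = str(bytes_data, 'utf-8')
data = StringIO(s)
df = pd.read_csv(data)
But it gives me an empty dataframe with all of the data in the columns.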
Edit:
The information I need is contained here:
{"effective_day":"2020-12-01T00:00:00","terminate_day":"2020-12-31T00:00:00","pnode_name":"02AMSTED138 KV TR2","offpeak_clmp":-0.290000,"onpeak_clmp":-0.240000,"24hour_clmp":-0.270000,"lt_sim_offpeak_clmp":-0.240000,"lt_sim_onpeak_clmp":-0.220000,"lt_sim_clmp":-0.240000}
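That is, I am trying to put the above into a dataframe whose columns are the keys of that dictionary. How do I extract only these items from my original data and put them into a dataframe?
You can evaluate the string data as a dictionary and use it to create the dataframe: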
pd.DataFrame(eval(s)['items'])
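Before doing so, you need to define the value of true that appears in the expression, for example with true = True, because Python does not recognize the lowercase JSON literal.
Result: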
effective_day terminate_day ... lt_sim_onpeak_clmp lt_sim_clmp
0 2020-12-01T00:00:00 2020-12-31T00:00:00 ... -0.22 -0.24
1 2020-12-01T00:00:00 2020-12-31T00:00:00 ... -0.22 -0.24
2 2020-12-01T00:00:00 2020-12-31T00:00:00 ... 1.52 0.66
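However, for security reasons it is recommended to use ast.literal_eval instead of eval. In that case the variable definition for true does not work, so you need to replace it manually in the string: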
import ast
pd.DataFrame(ast.literal_eval(s.replace('true','True'))['items'])
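Since the payload looks like JSON, another option that avoids both eval and the string replacement is the standard-library json module, which already maps the JSON literal true to Python's True. The following is only a minimal sketch: the bytes_data value here is a shortened, hypothetical stand-in for the payload in the question, and the name payload is introduced purely for illustration.

```python
import json

import pandas as pd

# Hypothetical stand-in for bytes_data: a shortened copy of the payload
# from the question, keeping two items and the "true" literal.
bytes_data = (
    b'{"items":['
    b'{"pnode_name":"02AMSTED138 KV TR2","offpeak_clmp":-0.290000,"onpeak_clmp":-0.240000},'
    b'{"pnode_name":"02CPP_NH138 KV TR2","offpeak_clmp":0.010000,"onpeak_clmp":1.530000}'
    b'],"searchSpecification":{"isActiveMetadata":true},"totalRows":163378}'
)

# Decode the bytes and parse them as JSON; "true" becomes True automatically.
payload = json.loads(bytes_data.decode("utf-8"))

# Each dictionary under "items" becomes one row; its keys become the columns.
df = pd.DataFrame(payload["items"])
print(df)
```

Unlike s.replace('true','True'), this does not touch occurrences of the text "true" inside string values, so it may be the safer choice if the data can contain that word.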