JSON to CSV issues
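I'm using pandas to normalize some JSON data, and I run into this problem when more than one field is an object or an array.
If I use record_path on Car, it breaks on the second one.
Any pointers on how to get something like this, so that the CSV has one row for each car and each location?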
[
  {
    "Name": "John Doe",
    "Car": [
      "Car1",
      "Car2"
    ],
    "Location": "Texas"
  },
  {
    "Name": "Jane Roe",
    "Car": "Car1",
    "Location": [
      "Illinois",
      "Kansas"
    ]
  }
]
This is the output:
Name,Car,Location
John Doe,"['Car1', 'Car2']",Texas
Jane Roe,Car1,"['Illinois', 'Kansas']"
Here is the code:
import json
import pandas as pd
with open('file.json') as data_file:
    data = json.load(data_file)
# pd.json_normalize replaces the deprecated pd.io.json.json_normalize
df = pd.json_normalize(data, errors='ignore')
I'd like the end result to look like this:
Name,Car,Location
John Doe,Car1,Texas
John Doe,Car2,Texas
Jane Roe,Car1,Illinois
Jane Roe,Car1,Kansas
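For reference, the same reshaping can be done without pandas. This is a minimal standard-library sketch (not the pandas approach from the question), assuming every field is either a scalar or a flat list: each scalar is treated as a one-element list, and `itertools.product` yields one row per combination of values. The helpers `as_list`, `expand_rows` and `to_csv` are hypothetical names.

```python
import csv
import io
import itertools

# Sample input copied from the question.
DATA = [
    {"Name": "John Doe", "Car": ["Car1", "Car2"], "Location": "Texas"},
    {"Name": "Jane Roe", "Car": "Car1", "Location": ["Illinois", "Kansas"]},
]

FIELDS = ["Name", "Car", "Location"]


def as_list(value):
    """Wrap a scalar in a list so every field can be iterated uniformly."""
    return value if isinstance(value, list) else [value]


def expand_rows(records, fields):
    """Yield one row per combination of the list-valued fields of each record."""
    for record in records:
        # itertools.product builds the cartesian product of the per-field values;
        # a missing key becomes [None], which csv.writer writes as an empty field.
        yield from itertools.product(*(as_list(record.get(f)) for f in fields))


def to_csv(records, fields):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(expand_rows(records, fields))
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(DATA, FIELDS), end="")
```

If you would rather stay in pandas, calling `DataFrame.explode` once per list column on the `json_normalize` result should give the same rows, since `explode` leaves scalar cells unchanged.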
The answer worked well until I ran into JSON files that contain extra data. Here is what a file with the extra values looks like:
{
  "Customers": [
    {
      "Name": "John Doe",
      "Car": [
        "Car1",
        "Car2"
      ],
      "Location": "Texas",
      "Repairs": {
        "RepairLocations": {
          "RepairsCompleted": [
            "Fix1",
            "Fix2"
          ]
        }
      }
    },
    {
      "Name": "Jane Roe",
      "Car": "Car1",
      "Location": [
        "Illinois",
        "Kansas"
      ]
    }
  ]
}
This is what I'm after. I think this format is the most readable, but at a minimum all of the keys should be present:
Name,Car,Location,Repairs:RepairLocation
John Doe,Car1,Texas,RepairsCompleted:Fix1
John Doe,Car1,Texas,RepairsCompleted:Fix2
John Doe,Car2,Texas,RepairsCompleted:Fix1
John Doe,Car2,Texas,RepairsCompleted:Fix2
Jane Roe,Car1,Illinois,
Jane Roe,Car1,Kansas,
Any suggestions for getting the second part?
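One possible approach for the nested `Repairs` data is to flatten nested objects into dotted column names first (the same `.` separator that `json_normalize` uses by default) and then take the cartesian product of the list-valued columns. This is a standard-library sketch that assumes the sample's JSON typos are fixed (`Customers` and `Repairs` quoted) and that a column named `Repairs.RepairLocations.RepairsCompleted` is acceptable in place of the `Repairs:RepairLocation` / `RepairsCompleted:Fix1` layout shown above; `flatten` and `to_csv` are hypothetical names.

```python
import csv
import io
import itertools
import json

# The second sample from the question, with "Customers" and "Repairs" quoted.
DATA = json.loads("""
{"Customers": [
  {"Name": "John Doe", "Car": ["Car1", "Car2"], "Location": "Texas",
   "Repairs": {"RepairLocations": {"RepairsCompleted": ["Fix1", "Fix2"]}}},
  {"Name": "Jane Roe", "Car": "Car1", "Location": ["Illinois", "Kansas"]}
]}
""")


def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts into {"a.b.c": value}; lists are kept as values."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def as_list(value):
    """Treat a scalar as a one-element list."""
    return value if isinstance(value, list) else [value]


def to_csv(records):
    flat = [flatten(r) for r in records]
    # Column order: first appearance of each key across all records.
    columns = list(dict.fromkeys(k for r in flat for k in r))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for r in flat:
        # Missing keys become None, which csv.writer writes as an empty field.
        writer.writerows(itertools.product(*(as_list(r.get(c)) for c in columns)))
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(DATA["Customers"]), end="")
```

Because the right-most column varies fastest in `itertools.product`, the John Doe rows come out in the same order as the desired output above.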
You're looking for something like this (jq):
def expand($keys):
  . as $in
  | reduce $keys[] as $k ( [{}];
      map(. + {
        ($k): ($in[$k] | if type == "array" then .[] else . end)
      })
    ) | .[];
(.[0] | keys_unsorted) as $h
| $h, (.[] | expand($h) | [.[$h[]]]) | @csv
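To see what `expand` does, here is a rough Python transliteration (an illustration, not part of the answer): `reduce` starts from a list holding one empty row, and for each key every partial row is copied once per value of that key, so arrays fan out into extra rows while scalars are copied unchanged. The header comes from the first object's keys, as with `.[0] | keys_unsorted`, and `csv.QUOTE_NONNUMERIC` is used on the assumption that matching jq's `@csv` quoting of strings is wanted.

```python
import csv
import io
from functools import reduce

# Sample input copied from the first JSON in the question.
DATA = [
    {"Name": "John Doe", "Car": ["Car1", "Car2"], "Location": "Texas"},
    {"Name": "Jane Roe", "Car": "Car1", "Location": ["Illinois", "Kansas"]},
]


def expand(record, keys):
    """Mirror of the jq `expand($keys)` filter: one dict per combination."""
    def step(rows, key):
        value = record.get(key)  # like `$in[$k]` (None if missing)
        # Arrays are spread (`.[]`); anything else is used as-is (`.`).
        values = value if isinstance(value, list) else [value]
        return [{**row, key: v} for row in rows for v in values]

    # Like `reduce $keys[] as $k ([{}]; ...)`: start with one empty row.
    return reduce(step, keys, [{}])


def to_csv(records):
    header = list(records[0])  # like `.[0] | keys_unsorted`
    buf = io.StringIO()
    # QUOTE_NONNUMERIC quotes every string field, as jq's @csv does for strings.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(header)
    for record in records:
        for row in expand(record, header):
            writer.writerow([row[k] for k in header])  # like `[.[$h[]]]`
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(DATA), end="")
```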
A simple jq solution, which is also more general than needed here:
["Name", "Car", "Location"],
(.[]
  | [.Name] + (.Car|..|scalars|[.]) + (.Location|..|scalars|[.]))
| @csv
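The extra generality comes from `..|scalars`, which collects every scalar at any depth, so nested arrays or objects under `Car` or `Location` are flattened as well. A hypothetical Python equivalent of that traversal is sketched below; the function names are illustrative, and the row order is not claimed to match jq's when both `Car` and `Location` are arrays.

```python
from typing import Any, Iterator


def scalars(value: Any) -> Iterator[Any]:
    """Yield every non-array, non-object value reachable from `value`,
    in document order, roughly like jq's `.. | scalars`."""
    if isinstance(value, list):
        for item in value:
            yield from scalars(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from scalars(item)
    else:
        yield value


def rows(record):
    """One [Name, car, location] row per (car, location) pair."""
    return [[record["Name"], car, loc]
            for car in scalars(record["Car"])
            for loc in scalars(record["Location"])]


if __name__ == "__main__":
    # A nested Car value is still flattened to its scalars.
    print(rows({"Name": "X", "Car": [["A"], {"model": "B"}], "Location": "Y"}))
```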