R: Web scraping JSON, extracting information from nest

Question

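I am trying to use tidyJSON to extract information from JSON, but I am open to any R package that gets the job done. I went through the documentation and the vignettes and found the complex example helpful. However, the information I want is not stored under a fixed key: each record is nested under a top-level key that is itself the app ID, and I'm not sure how to access it. I am interested in getting appid, name, developer, etc., but this information sits inside 570 and 730: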

{"570":{"appid":570,"name":"Dota 2","developer":"Valve","publisher":"Valve","score_rank":71,"owners":102151578,"owners_variance":259003,"players_forever":102151578,"players_forever_variance":259003,"players_2weeks":9436299,"players_2weeks_variance":89979,"average_forever":11727,"average_2weeks":1229,"median_forever":277,"median_2weeks":662,"ccu":811259,"price":"0","tags":{"Free to Play":22678,"MOBA":7808,"Strategy":7415,"Multiplayer":6757,"Team-Based":4848,"Action":4602,"e-sports":4089,"Online Co-Op":3669,"Competitive":3553,"PvP":2655,"RTS":2267,"Difficult":2129,"RPG":2114,"Fantasy":2044,"Tower Defense":2024,"Co-op":1898,"Character Customization":1514,"Replay Value":1487,"Action RPG":1397,"Simulation":1024}},

"730":{"appid":730,"name":"Counter-Strike: Global Offensive","developer":"Valve","publisher":"Valve","score_rank":78,"owners":29225079,"owners_variance":154335,"players_forever":28552354,"players_forever_variance":152685,"players_2weeks":9102348,"players_2weeks_variance":88410,"average_forever":17648,"average_2weeks":791,"median_forever":5030,"median_2weeks":358,"ccu":543626,"price":"1499","tags":{"FPS":17082,"Multiplayer":13744,"Shooter":12833,"Action":10881,"Team-Based":10369,"Competitive":9664,"Tactical":8529,"First-Person":7329,"e-sports":6716,"PvP":6383,"Online Co-Op":5714,"Military":4621,"Co-op":4435,"Strategy":4424,"War":4361,"Realistic":3196,"Trading":3191,"Difficult":3158,"Fast-Paced":3100,"Moddable":2496}}

There are thousands of entries like these. Is there a way to skip the "top level" and look inside the nest?
The JSON comes from http://steamspy.com/api.php?request=top100in2weeks

Answer 1

This may be what you need:

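library(jsonlite)

# fromJSON returns a named list with one element per game, keyed by app ID
data <- fromJSON("http://steamspy.com/api.php?request=top100in2weeks")

# Pull the same field out of every element, skipping the top-level keys
appid <- lapply(data, function(x) x$appid)
name <- lapply(data, function(x) x$name)

# unlist keeps the app IDs as names, so they become the row names
df <- data.frame(appid = unlist(appid),
                 name = unlist(name),
                 stringsAsFactors = FALSE)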

Result:

> head(df)
        appid                             name
570       570                           Dota 2
730       730 Counter-Strike: Global Offensive
578080 578080    PLAYERUNKNOWN'S BATTLEGROUNDS
440       440                  Team Fortress 2
271590 271590               Grand Theft Auto V
433850 433850           H1Z1: King of the Kill

I'll let you add the rest of the fields.
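If you would rather pull several scalar fields at once instead of writing one lapply per field, something like the following might work. This is only a sketch: the inline json string is a shortened, hypothetical stand-in for the API response (so it runs without a network call), the fields vector is my own choice, and it assumes every entry actually contains all the listed fields.

```r
library(jsonlite)

# Shortened stand-in for the API response (assumption: same shape as the real data)
json <- '{
  "570": {"appid": 570, "name": "Dota 2", "developer": "Valve",
          "tags": {"Free to Play": 22678, "MOBA": 7808}},
  "730": {"appid": 730, "name": "Counter-Strike: Global Offensive", "developer": "Valve",
          "tags": {"FPS": 17082, "Multiplayer": 13744}}
}'
data <- fromJSON(json)

# Scalar fields to keep; nested fields such as tags are deliberately left out
fields <- c("appid", "name", "developer")

# Build a one-row data frame per game, then stack them
rows <- lapply(data, function(x) as.data.frame(x[fields], stringsAsFactors = FALSE))
df <- do.call(rbind, rows)

print(df)
```

Adding another scalar field (publisher, owners, ...) is then just a matter of extending fields. If some entries lack a field, the as.data.frame step will fail, so you would need to fill in NA first.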

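Edit: adding arrays to the data frame

It is possible to add the tags information for each game to the data frame. For each game, you have to store the array of tag names in one column and the array of tag counts in another column.

Add the following lines after the definition of df: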

# One list element per row: tag names in tags, tag counts in tagsQ
for (k in seq_len(nrow(df))) {
    df$tags[k] <- list(names(data[[k]]$tags))
    df$tagsQ[k] <- list(unlist(data[[k]]$tags))
}

This gives you:

> df["570",]
    appid   name
570   570 Dota 2

tags
570 Free to Play, MOBA, Strategy, Multiplayer, Team-Based, Action, e-sports, Online Co-Op, Competitive, PvP, RTS, Difficult, RPG, Fantasy, Tower Defense, Co-op, Character Customization, Replay Value, Action RPG, Simulation

tagsQ
570 22686, 7810, 7420, 6759, 4850, 4603, 4092, 3672, 3555, 2657, 2267, 2130, 2116, 2045, 2024, 1898, 1514, 1487, 1397, 1023

In this case, the tags and tagsQ columns contain lists. To get the second tag and its count for appid 570, do the following:

> df["570","tags"][[1]][2]
[1] "MOBA"

> df["570","tagsQ"][[1]][2]
MOBA 
7810
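If the list columns turn out to be awkward to filter on, an alternative you could try is a separate "long" table with one row per game/tag pair. This is a sketch rather than part of the approach above: the inline json string is a shortened, hypothetical stand-in for the API response, and tags_long and its column names are names I made up.

```r
library(jsonlite)

# Shortened stand-in for the API response (assumption: same shape as the real data)
json <- '{
  "570": {"appid": 570, "name": "Dota 2",
          "tags": {"Free to Play": 22678, "MOBA": 7808}},
  "730": {"appid": 730, "name": "Counter-Strike: Global Offensive",
          "tags": {"FPS": 17082, "Multiplayer": 13744}}
}'
data <- fromJSON(json)

# For each app ID, turn its tags object into rows of (appid, tag, count)
tag_rows <- lapply(names(data), function(id) {
  tags <- unlist(data[[id]]$tags)
  if (length(tags) == 0) return(NULL)  # skip games without tags
  data.frame(appid = rep(id, length(tags)),
             tag = names(tags),
             count = unname(tags),
             stringsAsFactors = FALSE)
})

# rbind ignores the NULL entries and stacks the rest
tags_long <- do.call(rbind, tag_rows)

print(tags_long)
```

With this layout, a lookup such as tags_long[tags_long$tag == "MOBA", ] returns every game carrying that tag, and the table can be joined back to df on appid.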
