Web scraping data: which Pokemon can learn which attacks?
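I am trying to create a table (150 rows, 165 columns) in which:
- Each row is the name of a Pokemon (the original 150 Pokemon)
- Each column is the name of an "attack" that these Pokemon can learn (Generation 1)
- Each element is either "1" or "0", indicating whether that Pokemon can learn the attack (1 = yes, 0 = no)
I was able to create this table manually in R.
Here are all the names: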
names
[1] "Bulbasaur" "Ivysaur" "Venusaur" "Charmander" "Charmeleon" "Charizard" "Squirtle" "Wartortle" "Blastoise" "Caterpie" "Metapod" "Butterfree" "Weedle" "Kakuna" "Beedrill" "Pidgey" "Pidgeotto"
[18] "Pidgeot" "Rattata" "Raticate" "Spearow" "Fearow" "Ekans" "Arbok" "Pikachu" "Raichu" "Sandshrew" "Sandslash" "Nidoran" "Nidorina" "Nidoqueen" "Nidorino" "Nidoking" "Clefairy"
[35] "Clefable" "Vulpix" "Ninetales" "Jigglypuff" "Wigglytuff" "Zubat" "Golbat" "Oddish" "Gloom" "Vileplume" "Paras" "Parasect" "Venonat" "Venomoth" "Diglett" "Dugtrio" "Meowth"
[52] "Persian" "Psyduck" "Golduck" "Mankey" "Primeape" "Growlithe" "Arcanine" "Poliwag" "Poliwhirl" "Poliwrath" "Abra" "Kadabra" "Alakazam" "Machop" "Machoke" "Machamp" "Bellsprout"
[69] "Weepinbell" "Victreebel" "Tentacool" "Tentacruel" "Geodude" "Graveler" "Golem" "Ponyta" "Rapidash" "Slowpoke" "Slowbro" "Magnemite" "Magneton" "Farfetch’d" "Doduo" "Dodrio" "Seel"
[86] "Dewgong" "Grimer" "Muk" "Shellder" "Cloyster" "Gastly" "Haunter" "Gengar" "Onix" "Drowzee" "Hypno" "Krabby" "Kingler" "Voltorb" "Electrode" "Exeggcute" "Exeggutor"
[103] "Cubone" "Marowak" "Hitmonlee" "Hitmonchan" "Lickitung" "Koffing" "Weezing" "Rhyhorn" "Rhydon" "Chansey" "Tangela" "Kangaskhan" "Horsea" "Seadra" "Goldeen" "Seaking" "Staryu"
[120] "Starmie" "Mr.Mime" "Scyther" "Jynx" "Electabuzz" "Magmar" "Pinsir" "Tauros" "Magikarp" "Gyarados" "Lapras" "Ditto" "Eevee" "Vaporeon" "Jolteon" "Flareon" "Porygon"
[137] "Omanyte" "Omastar" "Kabuto" "Kabutops" "Aerodactyl" "Snorlax" "Articuno" "Zapdos" "Moltres" "Dratini" "Dragonair" "Dragonite" "Mewtwo" "Mew"
Here are all the attacks:
[1] "Absorb" "Acid " "Acid Armor " "Agility " "Amnesia " "Aurora Beam " "Barrage " "Barrier " "Bide " "Bind " "Bite " "Blizzard "
[13] "Body Slam " "Bone Club " "Bonemerang " "Bubble " "Bubble Beam " "Clamp " "Comet Punch " "Confuse Ray " "Confusion " "Constrict " "Conversion " "Counter "
[25] "Crabhammer " "Cut " "Defense Curl " "Dig " "Disable " "Dizzy Punch " "Double Kick " "Double Slap " "Double Team " "Double-Edge " "Dragon Rage " "Dream Eater "
[37] "Drill Peck " "Earthquake " "Egg Bomb " "Ember " "Explosion " "Fire Blast " "Fire Punch " "Fire Spin " "Fissure " "Flamethrower " "Flash " "Fly "
[49] "Focus Energy " "Fury Attack " "Fury Swipes " "Glare " "Growl " "Growth " "Guillotine " "Gust " "Harden " "Haze " "Headbutt " "High Jump Kick "
[61] "Horn Attack " "Horn Drill " "Hydro Pump " "Hyper Beam " "Hyper Fang " "Hypnosis " "Ice Beam " "Ice Punch " "Jump Kick " "Karate Chop " "Kinesis " "Leech Life "
[73] "Leech Seed " "Leer " "Lick " "Light Screen " "Lovely Kiss " "Low Kick " "Meditate " "Mega Drain " "Mega Kick " "Mega Punch " "Metronome " "Mimic "
[85] "Minimize " "Mirror Move " "Mist " "Night Shade " "Pay Day " "Peck " "Petal Dance " "Pin Missile " "Poison Gas " "Poison Powder " "Poison Sting " "Pound "
[97] "Psybeam " "Psychic " "Psywave " "Quick Attack " "Rage " "Razor Leaf " "Razor Wind " "Recover " "Reflect " "Rest " "Roar " "Rock Slide "
[109] "Rock Throw " "Rolling Kick " "Sand Attack " "Scratch " "Screech " "Seismic Toss " "Self-Destruct " "Sharpen " "Sing " "Skull Bash " "Sky Attack " "Slam "
[121] "Slash " "Sleep Powder " "Sludge " "Smog " "Smokescreen " "Soft-Boiled " "Solar Beam " "Sonic Boom " "Spike Cannon " "Splash " "Spore " "Stomp "
[133] "Strength " "String Shot " "Struggle " "Stun Spore " "Submission " "Substitute " "Super Fang " "Supersonic " "Surf " "Swift " "Swords Dance " "Tackle "
[145] "Tail Whip " "Take Down " "Teleport " "Thrash " "Thunder " "Thunder Punch " "Thunder Shock " "Thunder Wave " "Thunderbolt " "Toxic " "Transform " "Tri Attack "
[157] "Twineedle " "Vine Whip " "Vise Grip " "Water Gun " "Waterfall " "Whirlwind " "Wing Attack " "Withdraw " "Wrap "
Then I put them together into a table:
m <- data.frame(matrix(0, ncol = 165, nrow = 150))
rownames(m) <- names
colnames(m) <- moves
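Note that the move names printed above carry trailing spaces (for example "Acid "), which can make later lookups by exact name fail. A minimal sketch of building the same zero table with trimmed column names follows. It uses a three-name, three-move subset as a stand-in for the full vectors, and `pokemon_names` and `moves_clean` are assumed names chosen here (the former avoids masking base R's `names` function).

```r
# Small stand-ins for the full name and move vectors (assumed subset).
pokemon_names <- c("Bulbasaur", "Ivysaur", "Venusaur")
moves <- c("Absorb", "Acid ", "Acid Armor ")

# Strip surrounding whitespace so columns can be matched by exact name.
moves_clean <- trimws(moves)

# A numeric matrix of zeros with named rows and columns.
m <- matrix(0,
            nrow = length(pokemon_names),
            ncol = length(moves_clean),
            dimnames = list(pokemon_names, moves_clean))
```

A plain matrix is used here instead of a data frame because every cell has the same type; `as.data.frame(m)` would convert it if a data frame is preferred.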
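Based on a previous question, I was able to figure out how to build the 150 URLs of the pages that contain information about which Pokemon can learn which attacks: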
template_1 = rep("https://pokemondb.net/pokedex/",150)
template_2 = rep("/moves/1",150)
pokemon_websites = data.frame(template_1, names, template_2)
pokemon_websites$full_website = paste(pokemon_websites$template_1, pokemon_websites$names, pokemon_websites$template_2)
library(stringr)
pokemon_websites$full_website = str_remove_all( pokemon_websites$full_website," ")
For example, here are the URLs for the first 6 Pokemon:
head(pokemon_websites$full_website)
[1] "https://pokemondb.net/pokedex/Bulbasaur/moves/1" "https://pokemondb.net/pokedex/Ivysaur/moves/1" "https://pokemondb.net/pokedex/Venusaur/moves/1" "https://pokemondb.net/pokedex/Charmander/moves/1"
[5] "https://pokemondb.net/pokedex/Charmeleon/moves/1" "https://pokemondb.net/pokedex/Charizard/moves/1"
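The same URLs can probably be built in one step with `paste0`, which inserts no separator and therefore needs no space removal afterwards. A short sketch on three names (`pokemon_names` is an assumed variable name); some entries in the full list, such as "Mr.Mime" or "Farfetch’d", may not match the spelling the site uses in its URLs, so those are worth checking by hand:

```r
# Stand-in for the full vector of 150 names.
pokemon_names <- c("Bulbasaur", "Ivysaur", "Venusaur")

# paste0() concatenates without a separator and recycles the fixed parts.
full_website <- paste0("https://pokemondb.net/pokedex/", pokemon_names, "/moves/1")
```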
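For example, the moves that the first Pokemon, Bulbasaur, can learn are listed at https://pokemondb.net/pokedex/Bulbasaur/moves/1.
This means that the following columns in the first row of "m" should be replaced with "1":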
Growl = 1
Tackle = 1
`Leech Seed ` = 1
`Vine Whip ` = 1
`Poison Powder ` = 1
`Razor Leaf ` = 1
`Growth ` = 1
`Sleep Powder ` = 1
`Solar Beam ` = 1
Cut = 1
`Swords Dance ` = 1
Toxic = 1
`Body Slam ` = 1
`Take Down ` = 1
`Double-Edge ` = 1
Rage = 1
`Mega Drain ` = 1
`Solar Beam ` = 1
Mimic = 1
`Double Team ` = 1
Reflect = 1
Bide = 1
Rest = 1
Substitute = 1
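Is it possible to:
- Web scrape the list of 150 URLs to find out which Pokemon can learn which attacks?
- Replace the corresponding element with 1 when a Pokemon can learn that attack?
Thanks!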
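Here is a solution that takes a list of URLs for the pages of interest, collects the moves from each table, and creates a one-row data frame containing "1" for every move found.
The individual data frames are then combined into the final answer.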
library(rvest)
library(dplyr)

urls <- c("https://pokemondb.net/pokedex/Bulbasaur/moves/1",
          "https://pokemondb.net/pokedex/Ivysaur/moves/1")

movedfs <- lapply(urls, function(url) {
  # read the page
  page <- read_html(url)
  # get all the tables on the page
  tables <- page %>% html_elements("table") %>% html_table()
  # take the Move column from the first 3 tables
  moves <- lapply(tables[1:3], function(table) {
    table$Move
  })
  foundmoves <- unique(trimws(unlist(moves)))
  # make a one-row data frame with one column per move
  tempdf <- data.frame(moves = t(foundmoves))
  # set the column names and fill the row with 1
  names(tempdf) <- foundmoves
  tempdf[1, ] <- 1
  tempdf  # return value
})

# make the final table (moves missing for a Pokemon become NA)
finaltable <- bind_rows(movedfs)
# replace NA with 0 and everything else with 1
finaltable <- apply(finaltable, 2, function(x) {
  ifelse(is.na(x), 0, 1)
})
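The result above has one row per URL but no row names, and it only has columns for moves that were actually found. One possible way to line it up with the 150 x 165 layout from the question is sketched below. It assumes the scraped moves have already been collected into a named list (`learned`, a hypothetical name) holding one character vector per Pokemon; the helper `build_move_matrix` is likewise illustrative and not part of rvest or dplyr.

```r
# Build a 0/1 matrix with a fixed set of rows and columns from a named
# list of move vectors. `build_move_matrix` is a hypothetical helper.
build_move_matrix <- function(learned, all_names, all_moves) {
  all_moves <- trimws(all_moves)
  m <- matrix(0,
              nrow = length(all_names),
              ncol = length(all_moves),
              dimnames = list(all_names, all_moves))
  for (p in intersect(names(learned), all_names)) {
    # Ignore scraped moves that are not in the reference list.
    known <- intersect(trimws(learned[[p]]), all_moves)
    if (length(known) > 0) {
      m[p, known] <- 1
    }
  }
  m
}

# Toy input standing in for the scraped data (illustrative values only).
learned <- list(
  Bulbasaur = c("Growl", "Tackle", "Vine Whip"),
  Ivysaur   = c("Growl", "Tackle", "Razor Leaf")
)
all_names <- c("Bulbasaur", "Ivysaur", "Venusaur")
all_moves <- c("Growl ", "Razor Leaf ", "Tackle ", "Vine Whip ", "Wrap ")

result <- build_move_matrix(learned, all_names, all_moves)
```

With real data, `learned` could be filled by returning `foundmoves` from the `lapply` call and naming the list elements by Pokemon. Pausing between requests, for example with `Sys.sleep`, is a common courtesy when fetching 150 pages.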