Web Scraping Data: Which Pokemon Can Learn Which Attacks?

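I am trying to create a table (150 rows, 165 columns) in which each row is a Pokemon, each column is an attack, and a cell is 1 if that Pokemon can learn that attack and 0 otherwise.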

I was able to create this table manually in R.

Here are all the names:

 names
  [1] "Bulbasaur"  "Ivysaur"    "Venusaur"   "Charmander" "Charmeleon" "Charizard"  "Squirtle"   "Wartortle"  "Blastoise"  "Caterpie"   "Metapod"    "Butterfree" "Weedle"     "Kakuna"     "Beedrill"   "Pidgey"     "Pidgeotto" 
 [18] "Pidgeot"    "Rattata"    "Raticate"   "Spearow"    "Fearow"     "Ekans"      "Arbok"      "Pikachu"    "Raichu"     "Sandshrew"  "Sandslash"  "Nidoran"    "Nidorina"   "Nidoqueen"  "Nidorino"   "Nidoking"   "Clefairy"  
 [35] "Clefable"   "Vulpix"     "Ninetales"  "Jigglypuff" "Wigglytuff" "Zubat"      "Golbat"     "Oddish"     "Gloom"      "Vileplume"  "Paras"      "Parasect"   "Venonat"    "Venomoth"   "Diglett"    "Dugtrio"    "Meowth"    
 [52] "Persian"    "Psyduck"    "Golduck"    "Mankey"     "Primeape"   "Growlithe"  "Arcanine"   "Poliwag"    "Poliwhirl"  "Poliwrath"  "Abra"       "Kadabra"    "Alakazam"   "Machop"     "Machoke"    "Machamp"    "Bellsprout"
 [69] "Weepinbell" "Victreebel" "Tentacool"  "Tentacruel" "Geodude"    "Graveler"   "Golem"      "Ponyta"     "Rapidash"   "Slowpoke"   "Slowbro"    "Magnemite"  "Magneton"   "Farfetch’d" "Doduo"      "Dodrio"     "Seel"      
 [86] "Dewgong"    "Grimer"     "Muk"        "Shellder"   "Cloyster"   "Gastly"     "Haunter"    "Gengar"     "Onix"       "Drowzee"    "Hypno"      "Krabby"     "Kingler"    "Voltorb"    "Electrode"  "Exeggcute"  "Exeggutor" 
[103] "Cubone"     "Marowak"    "Hitmonlee"  "Hitmonchan" "Lickitung"  "Koffing"    "Weezing"    "Rhyhorn"    "Rhydon"     "Chansey"    "Tangela"    "Kangaskhan" "Horsea"     "Seadra"     "Goldeen"    "Seaking"    "Staryu"    
[120] "Starmie"    "Mr.Mime"    "Scyther"    "Jynx"       "Electabuzz" "Magmar"     "Pinsir"     "Tauros"     "Magikarp"   "Gyarados"   "Lapras"     "Ditto"      "Eevee"      "Vaporeon"   "Jolteon"    "Flareon"    "Porygon"   
[137] "Omanyte"    "Omastar"    "Kabuto"     "Kabutops"   "Aerodactyl" "Snorlax"    "Articuno"   "Zapdos"     "Moltres"    "Dratini"    "Dragonair"  "Dragonite"  "Mewtwo"     "Mew"    

Here are all the attacks (moves):

 [1] "Absorb"          "Acid "           "Acid Armor "     "Agility "        "Amnesia "        "Aurora Beam "    "Barrage "        "Barrier "        "Bide "           "Bind "           "Bite "           "Blizzard "      
 [13] "Body Slam "      "Bone Club "      "Bonemerang "     "Bubble "         "Bubble Beam "    "Clamp "          "Comet Punch "    "Confuse Ray "    "Confusion "      "Constrict "      "Conversion "     "Counter "       
 [25] "Crabhammer "     "Cut "            "Defense Curl "   "Dig "            "Disable "        "Dizzy Punch "    "Double Kick "    "Double Slap "    "Double Team "    "Double-Edge "    "Dragon Rage "    "Dream Eater "   
 [37] "Drill Peck "     "Earthquake "     "Egg Bomb "       "Ember "          "Explosion "      "Fire Blast "     "Fire Punch "     "Fire Spin "      "Fissure "        "Flamethrower "   "Flash "          "Fly "           
 [49] "Focus Energy "   "Fury Attack "    "Fury Swipes "    "Glare "          "Growl "          "Growth "         "Guillotine "     "Gust "           "Harden "         "Haze "           "Headbutt "       "High Jump Kick "
 [61] "Horn Attack "    "Horn Drill "     "Hydro Pump "     "Hyper Beam "     "Hyper Fang "     "Hypnosis "       "Ice Beam "       "Ice Punch "      "Jump Kick "      "Karate Chop "    "Kinesis "        "Leech Life "    
 [73] "Leech Seed "     "Leer "           "Lick "           "Light Screen "   "Lovely Kiss "    "Low Kick "       "Meditate "       "Mega Drain "     "Mega Kick "      "Mega Punch "     "Metronome "      "Mimic "         
 [85] "Minimize "       "Mirror Move "    "Mist "           "Night Shade "    "Pay Day "        "Peck "           "Petal Dance "    "Pin Missile "    "Poison Gas "     "Poison Powder "  "Poison Sting "   "Pound "         
 [97] "Psybeam "        "Psychic "        "Psywave "        "Quick Attack "   "Rage "           "Razor Leaf "     "Razor Wind "     "Recover "        "Reflect "        "Rest "           "Roar "           "Rock Slide "    
[109] "Rock Throw "     "Rolling Kick "   "Sand Attack "    "Scratch "        "Screech "        "Seismic Toss "   "Self-Destruct "  "Sharpen "        "Sing "           "Skull Bash "     "Sky Attack "     "Slam "          
[121] "Slash "          "Sleep Powder "   "Sludge "         "Smog "           "Smokescreen "    "Soft-Boiled "    "Solar Beam "     "Sonic Boom "     "Spike Cannon "   "Splash "         "Spore "          "Stomp "         
[133] "Strength "       "String Shot "    "Struggle "       "Stun Spore "     "Submission "     "Substitute "     "Super Fang "     "Supersonic "     "Surf "           "Swift "          "Swords Dance "   "Tackle "        
[145] "Tail Whip "      "Take Down "      "Teleport "       "Thrash "         "Thunder "        "Thunder Punch "  "Thunder Shock "  "Thunder Wave "   "Thunderbolt "    "Toxic "          "Transform "      "Tri Attack "    
[157] "Twineedle "      "Vine Whip "      "Vise Grip "      "Water Gun "      "Waterfall "      "Whirlwind "      "Wing Attack "    "Withdraw "       "Wrap "         

Then I combined them into a table:

m <- data.frame(matrix(0, ncol = 165, nrow = 150))
rownames(m) <- names
colnames(m) <- moves
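Because m has to line up with names and moves, one option is to size it from the vectors instead of hard-coding 150 and 165, and to strip the trailing spaces that most move names carry, so that columns can be indexed as "Growl" rather than "Growl ". A minimal sketch on short hypothetical subsets of the two vectors (names_small, moves_small and m_small are illustrative names, not from the question):

```r
# Hypothetical subsets of the question's names and moves vectors
names_small <- c("Bulbasaur", "Ivysaur")
moves_small <- c("Absorb", "Acid ", "Growl ", "Tackle ")

# Size the table from the vectors rather than hard-coding the dimensions
m_small <- data.frame(matrix(0, ncol = length(moves_small), nrow = length(names_small)))
rownames(m_small) <- names_small

# trimws() removes the trailing spaces, so "Growl" matches the column name
colnames(m_small) <- trimws(moves_small)

# Mark two moves for Bulbasaur
m_small["Bulbasaur", c("Growl", "Tackle")] <- 1
print(m_small)
```

With the full vectors, the same code gives a 150 x 165 table, and it keeps working if either vector changes length.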

Based on a previous question, I was able to figure out how to build the URLs of all 150 web pages that list which Pokemon can learn which attacks:

library(stringr)

# one URL per Pokemon: https://pokemondb.net/pokedex/<name>/moves/1
template_1 = rep("https://pokemondb.net/pokedex/", 150)
template_2 = rep("/moves/1", 150)

pokemon_websites = data.frame(template_1, names, template_2)

# paste() separates its arguments with a space, so the spaces are removed afterwards
pokemon_websites$full_website = paste(pokemon_websites$template_1, pokemon_websites$names, pokemon_websites$template_2)
pokemon_websites$full_website = str_remove_all(pokemon_websites$full_website, " ")
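The template / paste() / str_remove_all() steps can be collapsed into a single paste0() call. paste0() is vectorised over names and joins its arguments with no separator, so there are no spaces to remove and neither stringr nor the intermediate data frame is needed. A minimal sketch on a three-name subset (names_small is a hypothetical stand-in for the question's names vector):

```r
# Hypothetical subset of the question's names vector
names_small <- c("Bulbasaur", "Ivysaur", "Venusaur")

# paste0() joins its arguments with no separator, one URL per name
urls <- paste0("https://pokemondb.net/pokedex/", names_small, "/moves/1")
print(urls)
```

Applied to the full names vector, this produces the same 150 URLs as the code above. Whether each of those URLs actually exists on pokemondb.net is a separate question. Names containing punctuation, such as "Mr.Mime" and "Farfetch’d", may be worth checking by hand before scraping.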

For example, here are the web pages for the first 6 Pokemon:

head(pokemon_websites$full_website)

[1] "https://pokemondb.net/pokedex/Bulbasaur/moves/1"  "https://pokemondb.net/pokedex/Ivysaur/moves/1"    "https://pokemondb.net/pokedex/Venusaur/moves/1"   "https://pokemondb.net/pokedex/Charmander/moves/1"
[5] "https://pokemondb.net/pokedex/Charmeleon/moves/1" "https://pokemondb.net/pokedex/Charizard/moves/1"

For example, the first Pokemon, "Bulbasaur", can learn the moves listed at https://pokemondb.net/pokedex/Bulbasaur/moves/1.

This means that the following columns of "m" should be set to 1 in the first row:

# column names keep the trailing space they have in the moves vector;
# Solar Beam is learnt both by level-up and by TM but only needs setting once
m["Bulbasaur", c("Growl ", "Tackle ", "Leech Seed ", "Vine Whip ",
                 "Poison Powder ", "Razor Leaf ", "Growth ", "Sleep Powder ",
                 "Solar Beam ", "Cut ", "Swords Dance ", "Toxic ",
                 "Body Slam ", "Take Down ", "Double-Edge ", "Rage ",
                 "Mega Drain ", "Mimic ", "Double Team ", "Reflect ",
                 "Bide ", "Rest ", "Substitute ")] <- 1
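Most column names of m end with a space (e.g. "Growl "). A misspelling such as "Poison Power" for "Poison Powder" will also fail to match any column. For both reasons, it can help to compare the intended moves with the column names before assigning. A minimal sketch on a one-row toy table (m_small, moves_small and learned are hypothetical names for illustration):

```r
# Hypothetical one-row version of m, using a few entries of the moves vector
moves_small <- c("Growl ", "Tackle ", "Poison Powder ", "Sleep Powder ")
m_small <- data.frame(matrix(0, ncol = length(moves_small), nrow = 1))
rownames(m_small) <- "Bulbasaur"
colnames(m_small) <- moves_small

# Intended moves, including a deliberate typo ("Poison Power")
learned <- c("Growl", "Tackle", "Poison Power")

# Compare after trimming whitespace; anything left over has no matching column
unmatched <- setdiff(trimws(learned), trimws(colnames(m_small)))
print(unmatched)
# [1] "Poison Power"

# Set only the columns that do match
m_small["Bulbasaur", trimws(colnames(m_small)) %in% trimws(learned)] <- 1
```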

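Is it possible to automate this, i.e. scrape each of these 150 pages and fill in the matching row of "m"?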

Thanks!

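Here is a solution that takes a list of URLs of the pages of interest, collects the moves from the tables on each page, and builds a one-row data frame containing a 1 for every move found.
The individual one-row tables are then combined into the final answer.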

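library(rvest)
library(dplyr)

# two example pages; pokemon_websites$full_website from the question would cover all 150
urls <- c("https://pokemondb.net/pokedex/Bulbasaur/moves/1", "https://pokemondb.net/pokedex/Ivysaur/moves/1")

movedfs <- lapply(urls, function(url){

   # read the page
   page <- read_html(url)

   # parse every <table> on the page into a data frame
   tables <- page %>% html_elements("table") %>% html_table()

   # collect the "Move" column of the first 3 tables (assumed to be the move tables on these pages)
   moves <- lapply(tables[1:3], function(table){
      table$Move
   })

   # a move can appear in more than one table, so de-duplicate after trimming whitespace
   foundmoves <- unique(trimws(unlist(moves)))

   # one-row data frame with one column per move, all set to 1
   tempdf <- as.data.frame(matrix(1, nrow = 1, ncol = length(foundmoves)))
   names(tempdf) <- foundmoves
   tempdf # return value
})

# combine into the final table; moves a Pokemon cannot learn are filled with NA
finaltable <- bind_rows(movedfs)

# replace NA with 0 and everything else with 1 (apply() returns a matrix)
finaltable <- apply(finaltable, 2, function(x){
   ifelse(is.na(x), 0, 1)
})

# label each row with the Pokemon name taken from its URL
rownames(finaltable) <- basename(dirname(dirname(urls)))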
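finaltable only has columns for moves that were actually found, in the order they were first encountered. It therefore does not line up with the 150 x 165 table m from the question. One way to get that exact layout is to match each Pokemon's scraped move names against the question's moves vector. A minimal sketch, assuming the scraped move names are kept in a named list (the scraped list and the short moves_small vector below are hypothetical stand-ins for the real data):

```r
# Hypothetical scraped result: one vector of move names per Pokemon
scraped <- list(
  Bulbasaur = c("Growl", "Tackle", "Vine Whip"),
  Ivysaur   = c("Growl", "Tackle", "Razor Leaf")
)

# Hypothetical subset of the question's moves vector (note the trailing spaces)
moves_small <- c("Growl ", "Tackle ", "Vine Whip ", "Razor Leaf ", "Absorb")

# For each Pokemon: 1 where a move appears among its scraped moves, else 0;
# trimws() on both sides makes "Growl " and "Growl" compare equal
rows <- lapply(scraped, function(found) as.numeric(trimws(moves_small) %in% trimws(found)))

# Stack into one row per Pokemon (row names come from the list names),
# keeping every column of the moves vector, including never-learnt moves
m <- as.data.frame(do.call(rbind, rows))
colnames(m) <- trimws(moves_small)
print(m)
```

Rows follow the order of scraped and columns follow moves. If all 150 Pokemon are scraped in the order of names, the result therefore has the same shape as m, with a column of zeros for any move no Pokemon was found to learn.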