Scraping Amazon Customer Reviews
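I am scraping Amazon customer reviews with R and have run into a problem that I hope someone can shed some light on.
I noticed that R fails to scrape the specified node (found with SelectorGadget) from all of the reviews. Each time I run the script I retrieve a different number of reviews, but never all of them. This is very frustrating, because the goal is to scrape the reviews and compile them into a CSV file that can later be manipulated in R. Essentially, if a product has 200 reviews, when I run the script I sometimes get 150 reviews, sometimes 75, and so on, but never all 200. The problem seems to appear after I have scraped repeatedly.
I have also been getting some timeout errors, specifically "Error in open.connection(x, "rb") : Timeout was reached".
How can I work around this so I can keep scraping? I am a beginner, but I would really appreciate any help or insight!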
library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

url <- "https://www.amazon.com/Match-Mens-Wild-Cargo-Pants/product-reviews/B009HLOZ9U/ref=cm_cr_arp_d_show_all?ie=UTF8&reviewerType=all_reviews&pageNumber="
N_pages <- 204
A <- NULL
for (j in 1:N_pages) {
  # Fetch review page j and extract the text of every ".review-text" node
  pant <- read_html(paste0(url, j))
  B <- cbind(pant %>% html_nodes(".review-text") %>% html_text())
  A <- rbind(A, B)
}
tail(A)
print(j)
Doesn't this work for you?
N_pages <- 204
A <- NULL
for (j in 1:N_pages) {
  pant <- read_html(paste0(url, j))
  B <- cbind(pant %>% html_nodes(".review-text") %>% html_text())
  A <- rbind(A, B)
}
tail(A)
[,1]
[1938,] "This is really a good item to get. Trendy, probably you can choose a different color, it fits good but I wouldn't say perfect."
[1939,] "I don't write reviews for most products, but I felt the need to do so for these pants for a couple reasons. First, they are great pants! Solid material, well-made, and they fit great. Second, I want to echo those who say you need to go up in size when you order. I wear anywhere from 32-34, depending on the brand. I ordered these in a 36 and they fit like a 33 or 34. I really love the look and feel of these, and will be ordering more!"
[1940,] "I bought the green one before, it is good quality and looks nice, than I purchased the similar one, but the khaki color, but received absolutely different product, different material. really disappointed."
[1941,] "These pants are great! I have been looking to update my wardrobe with a more edgy style; these cargo pants deliver on that. Paired with some casual sneakers or a decent nubuck leather boot completes the look from the waist down. The lazy-casual look is great when traveling, as are the many pockets. I wore these pants on a recent day trip to NYC and traveled comfortably with essential items contained in the 8 pockets. I placed a second order shortly after my first pair arrived because I like them so much. Shipping and delivery is also fairly fast, considering these pants ship from China!"
[1942,] "Pants are awesome, just like the picture. The size runs small, so if you order them I would order them bigger than normal. I usually wear a 34inch waist because i dont like my pants snug, these pants fit more like a 32 inch waist.Other than that i love them!"
[1943,] "the good:Pants are made from the durable cotton that has a nice feel; have a lot of useful features and roomy well placed pockets; durable stitching.the bad:Pants will shrink and drier/hot water is not recommended. Would have been better if the cotton was pretreated to prevent shrinking. I would gladly gave up the belt if I wouldn't have to wary about how to wash these pants.the ugly:faux pocket with a zipper. useless feature. on my pair came with a bright gold zipper, unlike a silver in a picture."
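If the timeouts and short counts keep coming back, one thing worth trying is to retry each failed request after a pause and to keep track of which pages came back empty. I can't confirm that this is the cause in your case, so treat the following as a minimal sketch under that assumption. The helper names fetch_with_retry and scrape_reviews, and the parameters max_tries, pause and delay, are hypothetical and not part of rvest.

```r
# Hypothetical helper (not part of rvest): call fetch(page_url) up to
# max_tries times, sleeping `pause` seconds after each failed attempt.
# Returns NULL if every attempt raised an error.
fetch_with_retry <- function(page_url, fetch = rvest::read_html,
                             max_tries = 3, pause = 2) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fetch(page_url), error = function(e) NULL)
    if (!is.null(result)) return(result)
    if (attempt < max_tries) Sys.sleep(pause)
  }
  NULL
}

# Hypothetical helper: scrape pages 1..n_pages, waiting `delay` seconds
# between pages, and record the pages that failed or had no review nodes.
scrape_reviews <- function(base_url, n_pages, delay = 1, ...) {
  reviews <- vector("list", n_pages)
  failed <- integer(0)
  for (j in seq_len(n_pages)) {
    page <- fetch_with_retry(paste0(base_url, j), ...)
    texts <- if (is.null(page)) {
      character(0)
    } else {
      rvest::html_text(rvest::html_nodes(page, ".review-text"))
    }
    if (length(texts) == 0) failed <- c(failed, j)
    reviews[[j]] <- texts
    Sys.sleep(delay)
  }
  list(reviews = unlist(reviews), failed_pages = failed)
}
```

With the url from the question you would call out <- scrape_reviews(url, 204), then look at out$failed_pages to see which pages to fetch again, and write the result with write.csv(data.frame(review = out$reviews), "reviews.csv", row.names = FALSE). Collecting the pages in a list and calling unlist() once also avoids growing A with rbind() on every iteration.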