Crawling output into multiple CSV files

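I would like to know how to export my crawl results into multiple CSV files, one for each city I have crawled. Somehow I keep running into walls and can't find a proper way to solve it.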

Here is my code:

import requests
from bs4 import BeautifulSoup
import csv

user_agent = {'User-agent': 'Chrome/43.0.2357.124'}
output_file= open("TA.csv", "w", newline='')
RegionIDArray = [187147,187323,186338]
dict = {187147: 'Paris', 187323: 'Berlin', 186338: 'London'}
already_printed = set()

for reg in RegionIDArray:
    for page in range(1,700,30):
        r = requests.get("https://www.tripadvisor.de/Attractions-c47-g" + str(reg) + "-oa" + str(page) + ".html")
        soup = BeautifulSoup(r.content)

        g_data = soup.find_all("div", {"class": "element_wrap"})

        for item in g_data:
            header = item.find_all("div", {"class": "property_title"})
            item = (header[0].text.strip())
            if item not in already_printed:
                already_printed.add(item)

                print("POI: " + str(item) + " | " + "Location: " + str(dict[reg]))

                writer = csv.writer(output_file)
                csv_fields = ['POI', 'Locaton']
                if g_data:
                    writer.writerow([str(item), str(dict[reg])])

My goal is to get three separate CSV files for Paris, Berlin and London, instead of having all the results in one big CSV file.

Could you help me out? Thanks for your feedback :)

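I made some small modifications to your code. To create one file per city, I moved the opening of output_file into the loop, so each region ID gets its own file name.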

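Note that I'm short on time right now, so the try/except at the end is a hack to ignore unicode errors: it simply skips any row that contains non-ASCII characters. That isn't good; maybe someone can fix that part?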

import requests
from bs4 import BeautifulSoup
import csv

user_agent = {'User-agent': 'Chrome/43.0.2357.124'}
RegionIDArray = {187147: 'Paris', 187323: 'Berlin', 186338: 'London'}
already_printed = set()

for reg in RegionIDArray:
    # Open one file per region inside the loop, e.g. TA187147.csv for Paris.
    # newline='' keeps the csv module from emitting blank rows on Windows.
    with open("TA" + str(reg) + ".csv", "w", newline='') as output_file:
        writer = csv.writer(output_file)
        for page in range(1,700,30):
            r = requests.get("https://www.tripadvisor.de/Attractions-c47-g" + str(reg) + "-oa" + str(page) + ".html")
            soup = BeautifulSoup(r.content)

            g_data = soup.find_all("div", {"class": "element_wrap"})

            for item in g_data:
                header = item.find_all("div", {"class": "property_title"})
                item = (header[0].text.strip())
                if item not in already_printed:
                    already_printed.add(item)

                    # print("POI: " + str(item) + " | " + "Location: " + str(RegionIDArray[reg]))

                    try:
                        writer.writerow([item, RegionIDArray[reg]])
                    except UnicodeEncodeError:
                        # Hack: skip rows the default file encoding cannot represent.
                        pass
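
Regarding the unicode hack: on Python 3 it can probably be avoided by opening each file with an explicit encoding, so that names with accents or umlauts are written instead of skipped. Below is a minimal sketch of just the file-writing part, with no scraping. The helper name write_city_csvs, the TA_<city>.csv naming, the header row and the sample data are all my own assumptions for illustration, not part of the code above.

```python
import csv
import os
import tempfile


def write_city_csvs(rows_by_city, out_dir):
    """Write one UTF-8 CSV file per city and return a dict of city -> path.

    Hypothetical helper: rows_by_city maps a city name to a list of POI names.
    """
    paths = {}
    for city, pois in rows_by_city.items():
        path = os.path.join(out_dir, "TA_" + city + ".csv")
        # encoding="utf-8" lets non-ASCII names be written without errors;
        # newline="" is what the csv module expects for files it writes to.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["POI", "Location"])  # header row (an assumption)
            for poi in pois:
                writer.writerow([poi, city])
        paths[city] = path
    return paths


# Illustrative sample data standing in for the scraped results.
sample = {
    "Paris": ["Musée d'Orsay", "Tour Eiffel"],
    "Berlin": ["Museumsinsel", "Schloss Charlottenburg"],
}
out_dir = tempfile.mkdtemp()
paths = write_city_csvs(sample, out_dir)
```

In the scraper itself, the same idea would presumably come down to adding encoding="utf-8" to the open() call inside the region loop, after which the try/except should no longer be needed. I have not tested that against the live site.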