How do I create another CSV file from yelp_academic_dataset_business.json that includes only businesses that are hotels, restaurants, or both?

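The original Yelp business dataset also contains rows for dentists, hair salons, and so on. I only want to select businesses that are hotels, restaurants, or both.

I have just started learning Python and am working through a machine learning lab tutorial. The code below gives me an error. I have googled and read a lot but still don't understand it. Any help would be appreciated.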

[Image: what yelp_academic_dataset_business.csv looks like]

[Image: the code and the error message]

data2 = []
with open('yelp_academic_dataset_business.json') as f:
    for line in f:
        data2.append(json.loads(line))
len(data2)

business_id = []
city = []
state = []
stars = []
review_count = []
categories = []
postal_code = []
latitude = []
longitude = []
pricerange = []
is_open = []
name = []


for entry in range(0, len(data2)): 
    if "Restaurants" in data2[entry]["categories"]:
        business_id.append(data2[entry]['business_id'])
        name.append(data2[entry]['name'])
        city.append(data2[entry]['city'])
        state.append(data2[entry]['state'])
        stars.append(data2[entry]['stars'])
        postal_code.append(data2[entry]['postal_code'])
        review_count.append(data2[entry]['review_count'])
        categories.append(data2[entry]['categories'])
        latitude.append(data2[entry]['latitude'])
        longitude.append(data2[entry]['longitude'])
        is_open.append(data2[entry]['is_open'])
        if 'RestaurantsPriceRange2'in data2[entry]['attributes']:
            pricerange.append(data2[entry]['attributes']['RestaurantsPriceRange2'])
        else:
            pricerange.append(0)

data2 = {'business_id':business_id,'name':name,'city':city,'state':state,'stars':stars,'review_count':review_count,
    'categories':categories,'latitude':latitude,'longitude':longitude,'is_open':is_open,'pricerange':pricerange,'postal_code':postal_code}

business_data = pd.DataFrame(data2)

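This error means you are trying to iterate over a None object.

Basically, data2[entry]['attributes'] is not a dict here, but None.

So when you do

if 'RestaurantsPriceRange2' in data2[entry]['attributes']:

you cannot check whether the key is in it, because there is nothing to search: the value is None rather than a container.

So you first have to check that it is not None. The corrected line is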

if (data2[entry]['attributes'] and
        'RestaurantsPriceRange2' in data2[entry]['attributes']):
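
As a rough illustration of why the guard works, here is a minimal sketch. The record and the helper name price_range are made up for the example; only the 'attributes' and 'RestaurantsPriceRange2' keys come from the question.

```python
# Hypothetical record: 'attributes' is None, as in the failing business.
record = {"name": "Example Diner", "attributes": None}

try:
    "RestaurantsPriceRange2" in record["attributes"]
    raised = False
except TypeError:
    raised = True  # a membership test against None raises TypeError


def price_range(business, default=0):
    """Return the price range, or `default` when it is missing."""
    attributes = business.get("attributes")
    # `and` short-circuits, so the `in` test never runs against None.
    if attributes and "RestaurantsPriceRange2" in attributes:
        return attributes["RestaurantsPriceRange2"]
    return default


print(raised)               # True
print(price_range(record))  # 0
```

The default of 0 mirrors the placeholder used in the question's else branch; any other sentinel would work just as well.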

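I finally found the problem. Your code breaks at the business with index 21.

if 'RestaurantsPriceRange2' in data2[entry]['attributes']:

This statement checks whether 'RestaurantsPriceRange2' is a key in data2[entry]['attributes'], but for business 21 the value of ['attributes'] is None, presumably because the business has no attributes.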

print data2[21]

{u'city': u'Cleveland', u'neighborhood': u'Central', u'name': u"Rally's Hamburgers", u'business_id': u'gJ5xSt6147gkcZ9Es0WxlA', u'longitude': -81.6663746, u'hours': None, u'state': u'OH', u'postal_code': u'44115', u'categories': u'Fast Food, Burgers, Restaurants', u'stars': 3.0, u'address': u'3040 Carnegie Ave', u'latitude': 41.4999894, u'review_count': 5, u'attributes': None, u'is_open': 1}

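So you can handle the error with an if that checks whether data2[entry]['attributes'] is None.

if data2[entry]['attributes'] is not None:

While testing, I also found that if "Restaurants" in data2[entry]["categories"]: raises the same error for some businesses, so the whole code would look like this: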

import json

data2 = []
with open('yelp_academic_dataset_business.json') as f:
    for line in f:
        data2.append(json.loads(line))
len(data2)

business_id = []
city = []
state = []
stars = []
review_count = []
categories = []
postal_code = []
latitude = []
longitude = []
pricerange = []
is_open = []
name = []

for entry in range(0, len(data2)):
    if data2[entry]["categories"] is not None:
        if "Restaurants" in data2[entry]["categories"]:
            business_id.append(data2[entry]['business_id'])
            name.append(data2[entry]['name'])
            city.append(data2[entry]['city'])
            state.append(data2[entry]['state'])
            stars.append(data2[entry]['stars'])
            postal_code.append(data2[entry]['postal_code'])
            review_count.append(data2[entry]['review_count'])
            categories.append(data2[entry]['categories'])
            latitude.append(data2[entry]['latitude'])
            longitude.append(data2[entry]['longitude'])
            is_open.append(data2[entry]['is_open'])
            attributes = data2[entry]['attributes']
            # Always append exactly one value so every column keeps the same length
            if attributes is not None and 'RestaurantsPriceRange2' in attributes:
                pricerange.append(attributes['RestaurantsPriceRange2'])
            else:
                pricerange.append(0)

data2 = {'business_id':business_id,'name':name,'city':city,'state':state,'stars':stars,'review_count':review_count,'categories':categories,'latitude':latitude,'longitude':longitude,'is_open':is_open,'pricerange':pricerange,'postal_code':postal_code}
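
The code above only keeps restaurants. To also keep hotels, one option is to split the category string and compare whole names. This is a sketch rather than a tested solution for the full dataset: the helper names are hypothetical, and "Hotels" as the exact category label is an assumption you should check against your copy of the data. Note that "Restaurants" in some_string is a substring test, so splitting first avoids accidental partial matches.

```python
WANTED = {"Hotels", "Restaurants"}  # assumed category labels


def parse_categories(raw):
    """Split the comma-separated category string into a set of names."""
    if not raw:  # covers None and the empty string
        return set()
    return {part.strip() for part in raw.split(",")}


def is_hotel_or_restaurant(business):
    # A non-empty intersection means Hotels, Restaurants, or both.
    return bool(parse_categories(business.get("categories")) & WANTED)


# Made-up sample rows in the same shape as the dataset.
sample = [
    {"name": "A", "categories": "Fast Food, Burgers, Restaurants"},
    {"name": "B", "categories": "Hotels & Travel, Hotels"},
    {"name": "C", "categories": "Dentists, Health & Medical"},
    {"name": "D", "categories": None},
]
kept = [b["name"] for b in sample if is_hotel_or_restaurant(b)]
print(kept)  # ['A', 'B']
```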

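Keep in mind that when you read JSON you need to watch out for null values and empty arrays, so always check that a value exists before using it; otherwise your program will crash.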
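
If the end goal is just a CSV file, a streaming version with the standard csv module may be enough and avoids holding twelve parallel lists in memory. This is only a sketch under a few assumptions: the function names, the reduced column list, the "Hotels" label, and the output filename in the usage comment are all illustrative and not part of the original code.

```python
import csv
import json

FIELDS = ["business_id", "name", "city", "state", "stars",
          "review_count", "categories", "pricerange"]
WANTED = ("Hotels", "Restaurants")  # assumed category labels


def to_row(business):
    """Return a flat dict for one business, or None if it should be skipped."""
    categories = business.get("categories") or ""   # None becomes ""
    names = {c.strip() for c in categories.split(",")}
    if not names.intersection(WANTED):
        return None
    attributes = business.get("attributes") or {}   # None becomes {}
    row = {field: business.get(field) for field in FIELDS}
    row["pricerange"] = attributes.get("RestaurantsPriceRange2", 0)
    return row


def json_lines_to_csv(lines, out_file):
    """Write matching businesses from JSON-lines input; return the row count."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = to_row(json.loads(line))
        if row is not None:
            writer.writerow(row)
            count += 1
    return count


# Usage (hypothetical output filename):
# with open('yelp_academic_dataset_business.json') as src, \
#         open('hotels_restaurants.csv', 'w', newline='') as dst:
#     json_lines_to_csv(src, dst)
```

The `or ""` and `or {}` fallbacks do the same job as the explicit None checks above, just in one place per field.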