How do I create another CSV file from yelp_academic_dataset_business.json that includes only businesses in the Hotels category, the Restaurants category, or both?
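The original Yelp business dataset also contains rows for businesses that are only dentists, hair salons, and so on. I only want to select businesses that are hotels, restaurants, or both.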
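I have just started learning Python and am working through a machine learning lab tutorial. The code below gives me an error. I have googled and read a lot, but I still don't understand it.
Any help would be greatly appreciated.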
[Image: what yelp_academic_dataset_business.csv looks like]
[Image: the code and the error message]
import json
import pandas as pd

data2 = []
with open('yelp_academic_dataset_business.json') as f:
    for line in f:
        data2.append(json.loads(line))
len(data2)
business_id = []
city = []
state = []
stars = []
review_count = []
categories = []
postal_code = []
latitude = []
longitude = []
pricerange = []
is_open = []
name = []
for entry in range(0, len(data2)):
    if "Restaurants" in data2[entry]["categories"]:
        business_id.append(data2[entry]['business_id'])
        name.append(data2[entry]['name'])
        city.append(data2[entry]['city'])
        state.append(data2[entry]['state'])
        stars.append(data2[entry]['stars'])
        postal_code.append(data2[entry]['postal_code'])
        review_count.append(data2[entry]['review_count'])
        categories.append(data2[entry]['categories'])
        latitude.append(data2[entry]['latitude'])
        longitude.append(data2[entry]['longitude'])
        is_open.append(data2[entry]['is_open'])
        if 'RestaurantsPriceRange2' in data2[entry]['attributes']:
            pricerange.append(data2[entry]['attributes']['RestaurantsPriceRange2'])
        else:
            pricerange.append(0)
data2 = {'business_id': business_id, 'name': name, 'city': city, 'state': state, 'stars': stars, 'review_count': review_count,
         'categories': categories, 'latitude': latitude, 'longitude': longitude, 'is_open': is_open, 'pricerange': pricerange, 'postal_code': postal_code}
business_data = pd.DataFrame(data2)
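This error means you are trying to iterate over a None object.
Basically, data2[entry]['attributes'] is not a dict but None.
So when you run
if 'RestaurantsPriceRange2' in data2[entry]['attributes']:
Python cannot check whether the key is in it, because there is no dict to look in.
So you have to check that it is not None first. The correct line is
if data2[entry]['attributes'] and 'RestaurantsPriceRange2' in data2[entry]['attributes']:
Because and short-circuits, the in test only runs when attributes holds a non-empty value.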
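A minimal reproduction with made-up records shows both the failure and why the guarded line works: and stops evaluating as soon as its left side is falsy, so the in test never sees None. The records and function names below are hypothetical.

```python
# Hypothetical records shaped like the business JSON (not real Yelp data).
record_ok = {"attributes": {"RestaurantsPriceRange2": "2"}}
record_none = {"attributes": None}


def unguarded(record):
    # The original test: raises TypeError when attributes is None.
    return "RestaurantsPriceRange2" in record["attributes"]


def has_price_range(record):
    """Return True only if attributes is a non-empty dict containing the key."""
    attributes = record["attributes"]
    # `and` short-circuits: if attributes is None (falsy), `in` is never evaluated.
    return bool(attributes) and "RestaurantsPriceRange2" in attributes


try:
    unguarded(record_none)
except TypeError as exc:
    print("TypeError:", exc)

print(has_price_range(record_ok))    # True
print(has_price_range(record_none))  # False
```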
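Finally found the problem. Your code breaks on the business at index 21.
if 'RestaurantsPriceRange2' in data2[entry]['attributes']:
This statement checks whether 'RestaurantsPriceRange2' is a key of the dict data2[entry]['attributes'], but for business 21 the value of ['attributes'] is None, I think because that business has no attributes.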
print(data2[21])
{u'city': u'Cleveland', u'neighborhood': u'Central', u'name': u"Rally's Hamburgers", u'business_id': u'gJ5xSt6147gkcZ9Es0WxlA', u'longitude': -81.6663746, u'hours': None, u'state': u'OH', u'postal_code': u'44115', u'categories': u'Fast Food, Burgers, Restaurants', u'stars': 3.0, u'address': u'3040 Carnegie Ave', u'latitude': 41.4999894, u'review_count': 5, u'attributes': None, u'is_open': 1}
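Instead of locating the bad record by hand, you can list every index whose field is null. A minimal sketch, assuming each record is the dict produced by json.loads (JSON null becomes None); the three sample lines are made up, and on the real data you would call find_missing(data2, 'attributes') instead. find_missing is a hypothetical helper name.

```python
import json


def find_missing(records, field):
    """Return the indices of records whose `field` is absent or JSON null (None)."""
    return [i for i, record in enumerate(records) if record.get(field) is None]


# Hypothetical sample in the same one-object-per-line layout as the business file.
sample_lines = [
    '{"name": "A", "categories": "Restaurants", "attributes": {"RestaurantsPriceRange2": "1"}}',
    '{"name": "B", "categories": "Fast Food, Burgers, Restaurants", "attributes": null}',
    '{"name": "C", "categories": null, "attributes": null}',
]
records = [json.loads(line) for line in sample_lines]

print(find_missing(records, "attributes"))  # [1, 2]
print(find_missing(records, "categories"))  # [2]
```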
So you can avoid the error with an if that first checks whether data2[entry]['attributes'] is None:
if data2[entry]['attributes'] is not None:
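A more compact alternative to the nested if, assuming the same record dicts: dict.get returns None for a missing key, and "or {}" replaces None with an empty dict, so the lookup never raises and always yields a value (0 by default, matching the original else branch) for the pricerange column. price_range is a hypothetical helper name.

```python
def price_range(record, default=0):
    """Return attributes['RestaurantsPriceRange2'], or `default` when it is unavailable.

    record.get('attributes') is None when the key is missing or the JSON value
    was null; `or {}` swaps that for an empty dict so the second .get() is safe.
    """
    attributes = record.get("attributes") or {}
    return attributes.get("RestaurantsPriceRange2", default)


print(price_range({"attributes": {"RestaurantsPriceRange2": "2"}}))  # 2
print(price_range({"attributes": None}))                             # 0
print(price_range({}))                                               # 0
```

Because every call returns something, each kept business contributes exactly one entry to pricerange.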
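While testing, I also found that if "Restaurants" in data2[entry]["categories"]: gives the same error for some businesses (their categories value is None), so the whole code looks like this: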
import json
import pandas as pd

data2 = []
with open('yelp_academic_dataset_business.json') as f:
    for line in f:
        data2.append(json.loads(line))
len(data2)
business_id = []
city = []
state = []
stars = []
review_count = []
categories = []
postal_code = []
latitude = []
longitude = []
pricerange = []
is_open = []
name = []
for entry in range(0, len(data2)):
    # Skip businesses whose categories field is null (None)
    if data2[entry]["categories"] is not None:
        if "Restaurants" in data2[entry]["categories"]:
            business_id.append(data2[entry]['business_id'])
            name.append(data2[entry]['name'])
            city.append(data2[entry]['city'])
            state.append(data2[entry]['state'])
            stars.append(data2[entry]['stars'])
            postal_code.append(data2[entry]['postal_code'])
            review_count.append(data2[entry]['review_count'])
            categories.append(data2[entry]['categories'])
            latitude.append(data2[entry]['latitude'])
            longitude.append(data2[entry]['longitude'])
            is_open.append(data2[entry]['is_open'])
            # attributes can be None; append 0 in that case too, so that
            # pricerange stays the same length as the other columns
            attributes = data2[entry]['attributes']
            if attributes is not None and 'RestaurantsPriceRange2' in attributes:
                pricerange.append(attributes['RestaurantsPriceRange2'])
            else:
                pricerange.append(0)
data2 = {'business_id': business_id, 'name': name, 'city': city, 'state': state, 'stars': stars, 'review_count': review_count,
         'categories': categories, 'latitude': latitude, 'longitude': longitude, 'is_open': is_open, 'pricerange': pricerange, 'postal_code': postal_code}
business_data = pd.DataFrame(data2)
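pandas.DataFrame raises a ValueError when the lists in the dict passed to it have different lengths, which is what happens if a branch appends to the other columns but skips pricerange (for example when attributes is None and nothing is appended). A small hypothetical helper, not part of pandas, can report the lengths before the DataFrame is built:

```python
def check_column_lengths(columns):
    """Return {column: length}, or raise ValueError if the lengths differ.

    Hypothetical helper: pandas.DataFrame(dict_of_lists) also rejects unequal
    lengths, but this shows which column is off.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return lengths


ok = {"name": ["A", "B"], "pricerange": [1, 0]}
bad = {"name": ["A", "B"], "pricerange": [1]}  # a row skipped its pricerange
print(check_column_lengths(ok))  # {'name': 2, 'pricerange': 2}
```

Running check_column_lengths(data2) just before pd.DataFrame(data2) would point straight at the short column.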
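Keep in mind that when you read JSON, fields can be null or empty arrays, so always check that a value exists before using it; otherwise your program will crash.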
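The answer code fixes the TypeError but still keeps only restaurants and stops before writing a file, while the question asks for a CSV of hotels, restaurants, or both. The following is a minimal sketch using only the standard library (json and csv instead of pandas), assuming one JSON object per line as in the question and that the category names are exactly "Hotels" and "Restaurants". It swaps the substring test for exact category matching, so a business listed only under "Hotels & Travel" is not picked up by the word "Hotels"; go back to substring matching if you want those too. The output filename hotels_and_restaurants.csv and all helper names are hypothetical.

```python
import csv
import io
import json

# Assumption: these are the exact Yelp category names to keep.
WANTED = {"Hotels", "Restaurants"}
FIELDS = ["business_id", "name", "city", "state", "postal_code", "stars",
          "review_count", "categories", "latitude", "longitude", "is_open",
          "pricerange"]


def category_set(categories):
    """Turn 'Fast Food, Burgers, Restaurants' into a set of names; None -> empty set."""
    if not categories:
        return set()
    if isinstance(categories, list):  # some dataset versions may store a list instead
        return {c.strip() for c in categories}
    return {c.strip() for c in categories.split(",")}


def select_rows(lines, wanted=WANTED):
    """Yield one row dict per business that is in at least one `wanted` category."""
    for line in lines:
        business = json.loads(line)
        if not category_set(business.get("categories")) & wanted:
            continue  # neither a hotel nor a restaurant (or categories is null)
        row = {field: business.get(field) for field in FIELDS if field != "pricerange"}
        # attributes may be null; every kept row still gets a pricerange value
        attributes = business.get("attributes") or {}
        row["pricerange"] = attributes.get("RestaurantsPriceRange2", 0)
        yield row


def write_csv(rows, out):
    """Write row dicts to the file-like object `out`, header line first."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def csv_text(rows):
    """Return the CSV as a string; handy for checking a few rows in a notebook."""
    buffer = io.StringIO()
    write_csv(rows, buffer)
    return buffer.getvalue()


def main(in_path="yelp_academic_dataset_business.json",
         out_path="hotels_and_restaurants.csv"):  # hypothetical output name
    with open(in_path, encoding="utf-8") as src:
        with open(out_path, "w", newline="", encoding="utf-8") as dst:
            write_csv(select_rows(src), dst)
```

Calling main() writes the file in one streaming pass, so the whole dataset never has to sit in a list like data2. A row tagged with both categories is written once, because the test is a set intersection rather than two separate checks.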