Find HTML table data with a for loop that works its way through multiple lists in pandas (Python 3)
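Hello, I have an HTML page and I'm trying to collect apartment rent data from it by looping through multiple lists.
Here is what I'm trying to do: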
import pandas as pd

locations = ['Dallas-TX', 'Denver-CO', 'Tampa-FL']
types = ['Apartments', 'Townhomes']
bedrooms = ['studios', '1-bedroom', '2-bedroom', '3-bedroom', '4-bedrooms']
dfs = []
# x(...) is a placeholder: I don't know what to put here to visit every combination
for typ, location, bedroom in x(types, locations, bedrooms):
    df = pd.read_html('https://www.apartments.com/{typ}/{location}/{bedroom}/'.format(typ=typ, location=location, bedroom=bedroom))[-1]
    df['location'] = ' '.join([location, typ, bedroom])  # label each row with its combination
    dfs.append(df)
df = pd.concat(dfs).reset_index(drop=True)
df
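Two details matter when building each request: the bedroom value has to reach the URL through a `{bedroom}` placeholder (a literal `studios` in the template would fetch the same page for every bedroom value), and `&` cannot join strings in Python (it raises TypeError). A minimal sketch of both points for a single combination; `build_url` and `build_label` are hypothetical helper names, and the URL pattern is simply copied from the question, not checked against the live site:

```python
# Hypothetical helpers (names are illustrative, not from the question or answer).
# The URL pattern is copied from the question and has not been verified against the live site.
URL_TEMPLATE = "https://www.apartments.com/{typ}/{location}/{bedroom}/"


def build_url(typ: str, location: str, bedroom: str) -> str:
    # Every field, including the bedroom count, goes through a placeholder,
    # so each bedroom value produces a different URL.
    return URL_TEMPLATE.format(typ=typ, location=location, bedroom=bedroom)


def build_label(location: str, typ: str, bedroom: str) -> str:
    # Join the parts with spaces; `&` is a bitwise/set operator and is not defined for str.
    return " ".join([location, typ, bedroom])


try:
    "Dallas-TX" & "Apartments"
except TypeError as exc:
    print("TypeError:", exc)

url = build_url("Apartments", "Dallas-TX", "1-bedroom")
label = build_label("Dallas-TX", "Apartments", "1-bedroom")
print(url)    # https://www.apartments.com/Apartments/Dallas-TX/1-bedroom/
print(label)  # Dallas-TX Apartments 1-bedroom
```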
What I want is for it to loop through the lists so that it gives me output like this:
Dallas-TX Apartment Studio price
Dallas-TX Apartment 1-bedroom price
Dallas-TX Apartment 2-bedroom price
Dallas-TX Apartment 3-bedroom price
Dallas-TX Apartment 4-bedroom price
Dallas-TX Townhome Studio price
Dallas-TX Townhome 1-bedroom price
Dallas-TX Townhome 2-bedroom price
Dallas-TX Townhome 3-bedroom price
Dallas-TX Townhome 4-bedroom price
Denver-CO Apartment Studio price
Denver-CO Apartment 1-bedroom price
Denver-CO Apartment 2-bedroom price
and so on.
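The order in the desired output (city, then type, then bedroom count) is what a Cartesian product yields when the lists are passed in that order. A minimal sketch using the standard library's `itertools.product` (a swapped-in technique, not something from the question); it only builds the labels and fetches nothing:

```python
from itertools import product

locations = ['Dallas-TX', 'Denver-CO', 'Tampa-FL']
types = ['Apartments', 'Townhomes']
bedrooms = ['studios', '1-bedroom', '2-bedroom', '3-bedroom', '4-bedrooms']

# product() varies the LAST iterable fastest, so passing locations first and
# bedrooms last groups the output by city, then type, then bedroom count.
labels = [f"{loc} {typ} {bed}" for loc, typ, bed in product(locations, types, bedrooms)]

for label in labels[:6]:
    print(label)
# Dallas-TX Apartments studios
# Dallas-TX Apartments 1-bedroom
# Dallas-TX Apartments 2-bedroom
# Dallas-TX Apartments 3-bedroom
# Dallas-TX Apartments 4-bedrooms
# Dallas-TX Townhomes studios
```

Because the last argument varies fastest, passing `types` first (as in `x(types, locations, bedrooms)`) would instead list the apartments in all three cities before any townhomes.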
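I can't think of a way to do this. I feel pandas is the way to go, though I've also considered BeautifulSoup for collecting the data; I'm stuck because I've gone down too many paths on this problem.
Does anyone have any helpful insight? I think I may be overthinking the code here.
Thanks in advance, and have a great day!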
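- You can build the Cartesian product of the lists.
- The URL you provided doesn't return anything, so I've commented out the code that fetches the HTML.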
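The Cartesian product can also be built without a temporary key column. A minimal sketch, assuming pandas 1.2 or later, where `merge` accepts `how="cross"`; the final comparison against `itertools.product` is only there to show the two produce the same combinations:

```python
from itertools import product

import pandas as pd

locations = ['Dallas-TX', 'Denver-CO', 'Tampa-FL']
types = ['Apartments', 'Townhomes']
bedrooms = ['studios', '1-bedroom', '2-bedroom', '3-bedroom', '4-bedrooms']

# how="cross" (pandas >= 1.2) returns the Cartesian product of two frames,
# so no temporary "foo" key column is needed.
combos = (
    pd.DataFrame({"locations": locations})
    .merge(pd.DataFrame({"types": types}), how="cross")
    .merge(pd.DataFrame({"bedrooms": bedrooms}), how="cross")
)
print(combos.shape)  # (30, 3)

# Each row unpacks into (location, type, bedroom) for one request.
rows = list(combos.itertuples(index=False, name=None))
same = set(rows) == set(product(locations, types, bedrooms))
print(same)  # True
```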
import pandas as pd

locations = ['Dallas-TX', 'Denver-CO', 'Tampa-FL']
types = ['Apartments', 'Townhomes']
bedrooms = ['studios', '1-bedroom', '2-bedroom', '3-bedroom', '4-bedrooms']
dfs = []
# Cross join via a constant dummy key "foo": one row per location x type x bedroom
for c in pd.merge(pd.merge(pd.DataFrame(locations, columns=["locations"]).assign(foo=1),
                           pd.DataFrame(types, columns=["types"]).assign(foo=1), on="foo"),
                  pd.DataFrame(bedrooms, columns=["bedrooms"]).assign(foo=1), on="foo").drop(columns="foo").values:
    print(c)  # c = [location, type, bedroom]
    # df = pd.read_html('https://www.apartments.com/{typ}/{location}/{bedroom}/'.format(typ=c[1], location=c[0], bedroom=c[2]))[-1]
    # df['location'] = "-".join(c)
    # dfs.append(df)
# dfs stays empty while the fetch lines are commented out, and pd.concat([]) raises ValueError
if dfs:
    df = pd.concat(dfs).reset_index(drop=True)
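Once fetching works, each table can carry its combination as separate columns rather than one joined string, which makes later filtering and grouping easier. A minimal sketch of the assembly step; `fetch_table` is a hypothetical stand-in for `pd.read_html(url)[-1]` that returns placeholder rows (no real listings or prices), so the loop runs offline:

```python
from itertools import product

import pandas as pd

locations = ['Dallas-TX', 'Denver-CO', 'Tampa-FL']
types = ['Apartments', 'Townhomes']
bedrooms = ['studios', '1-bedroom', '2-bedroom', '3-bedroom', '4-bedrooms']


def fetch_table(url: str) -> pd.DataFrame:
    """Hypothetical stand-in for pd.read_html(url)[-1]; ignores url and returns placeholder rows."""
    return pd.DataFrame({"listing": ["placeholder-1", "placeholder-2"]})


frames = []
for location, typ, bedroom in product(locations, types, bedrooms):
    url = f"https://www.apartments.com/{typ}/{location}/{bedroom}/"  # pattern from the question, unverified
    table = fetch_table(url)
    # Tag every row with its combination as separate columns
    frames.append(table.assign(location=location, type=typ, bedrooms=bedroom))

# Guard against an empty list: pd.concat raises ValueError when given no objects
result = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
print(result.shape)  # (60, 4)
```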