Finding HTML table data with a for loop that works its way through multiple lists in pandas (Python 3)

Hi, I have some HTML from which I'm trying to collect apartment rental data by looping through multiple lists.

Here's what I'm trying to do:

locations = ['Dallas-TX', 'Denver-CO', 'Tampa-FL']
types = ['Apartments', 'Townhomes']
bedrooms = ['studios', '1-bedroom', '2-bedroom','3-bedroom','4-bedrooms']

dfs = []
# x() is a placeholder: I can't figure out what should iterate over every combination of the three lists
for typ, location, bedroom in x(types, locations, bedrooms):
    df = pd.read_html('https://www.apartments.com/{typ}/{location}/{bedroom}/'.format(typ=typ, location=location, bedroom=bedroom))[-1]
    df['location'] = '-'.join([location, typ, bedroom])  # label each row with its combination
    dfs.append(df)

df = pd.concat(dfs).reset_index(drop=True)

df

What I want is for it to loop through the lists so that it gives me output like:

Dallas-TX Apartment Studio    price
Dallas-TX Apartment 1-bedroom price
Dallas-TX Apartment 2-bedroom price
Dallas-TX Apartment 3-bedroom price
Dallas-TX Apartment 4-bedroom price
Dallas-TX Townhome  Studio    price
Dallas-TX Townhome  1-bedroom price
Dallas-TX Townhome  2-bedroom price
Dallas-TX Townhome  3-bedroom price
Dallas-TX Townhome  4-bedroom price
Denver-CO Apartment Studio    price
Denver-CO Apartment 1-bedroom price
Denver-CO Apartment 2-bedroom price

and so on.

I can't figure out a way to do this. I feel like pandas is the way to go, though I've also considered BeautifulSoup for collecting the data. I'm stuck because I've already gone down so many dead ends with this problem.

Does anyone have any helpful insights? I think I may be overthinking the code here.

Thanks in advance, and have a great day!

  • You can build the Cartesian product of the lists (a simpler standard-library alternative is sketched after the code below)
  • The URL you provided doesn't return anything, so the code that fetches the HTML is commented out
locations = ['Dallas-TX', 'Denver-CO', 'Tampa-FL']
types = ['Apartments', 'Townhomes']
bedrooms = ['studios', '1-bedroom', '2-bedroom', '3-bedroom', '4-bedrooms']

# Cartesian product of the three lists, built by joining each pair on a dummy key
combos = pd.merge(pd.merge(pd.DataFrame(locations, columns=["locations"]).assign(foo=1),
                           pd.DataFrame(types, columns=["types"]).assign(foo=1), on="foo"),
                  pd.DataFrame(bedrooms, columns=["bedrooms"]).assign(foo=1), on="foo").drop(columns="foo")

dfs = []
for c in combos.values:
    print(c)
#     df = pd.read_html('https://www.apartments.com/{typ}/{location}/{bedroom}/'.format(typ = c[1], location = c[0], bedroom = c[2]))[-1]
#     df['location'] = "-".join(c)
#     dfs.append(df)

# df = pd.concat(dfs).reset_index(drop=True)  # left commented out: dfs stays empty while the fetch above is disabled
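
As a side note, the same Cartesian product can be built without the dummy-key merges by using itertools.product from the standard library (on pandas 1.2+, pd.merge with how="cross" also does a cross join directly). Below is a minimal sketch assuming the same apartments.com URL pattern as above; since, as noted, that URL may not return any tables, the fetch is guarded, and HTTP errors such as a 403 would still need separate handling:

import itertools

import pandas as pd

locations = ['Dallas-TX', 'Denver-CO', 'Tampa-FL']
types = ['Apartments', 'Townhomes']
bedrooms = ['studios', '1-bedroom', '2-bedroom', '3-bedroom', '4-bedrooms']

dfs = []
# product() iterates the rightmost list fastest, matching the desired output order
for location, typ, bedroom in itertools.product(locations, types, bedrooms):
    url = 'https://www.apartments.com/{}/{}/{}/'.format(typ, location, bedroom)
    try:
        df = pd.read_html(url)[-1]  # read_html returns a list of tables; keep the last one
    except ValueError:
        continue  # read_html raises ValueError when the page contains no tables
    df['location'] = '-'.join([location, typ, bedroom])
    dfs.append(df)

if dfs:
    result = pd.concat(dfs).reset_index(drop=True)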