How to parse a specific HTML table from a website in Python

Question

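I am a beginner at web scraping with Python. I am trying to parse the table of places of worship on Langkawi Island. This is the website I am referring to: http://www.jaik.gov.my/?page_id=658

I entered the following in Python: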

import requests
import lxml.html as lh
import pandas as pd

langkawi_url = 'http://www.jaik.gov.my/?page_id=658'
page = requests.get(langkawi_url)
doc = lh.fromstring(page.content)

tr_elements = doc.xpath('//td')
[len(T) for T in tr_elements[:12]]

tr_elements = doc.xpath('//tr')

col = []
i = 0

for t in tr_elements[0]:
    i+=1
    name=t.text_content()
    print("%d:%s" % (i,name))
    col.append((name,[]))

The output I get is:

1:Sun
2:Mon
3:Tue
4:Wed
5:Thu
6:Fri
7:Sat

I expected to get this:

1:BIL
2:KARIAH MASJID
3:ALAMAT
4:MUKIM

Any advice and guidance would be much appreciated.

Thanks!

Answer 1

Try changing your code to:

# The header cells wrap their text in <strong>, so select those directly
tr_elements = doc.xpath('//td/strong')
col = []
for t in tr_elements:
    col.append(t.text)
print(col)

Output:

['BIL', 'KARIAH MASJID', 'ALAMAT', 'MUKIM']
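
The Sun to Sat output in the question suggests that the first `<tr>` on the page belongs to a different table, possibly a calendar widget, so `tr_elements[0]` is not the header row the asker wanted. One way to make the selection more robust is to pick the table by its header text and then read its rows. The sketch below assumes that layout. `SAMPLE_HTML` and the helper `parse_table_by_header` are hypothetical stand-ins for illustration, and the real page's markup may differ.

```python
import lxml.html as lh

# Hypothetical stand-in for the real page: a calendar table followed by
# the table of interest. The actual markup on the site may differ.
SAMPLE_HTML = """
<html><body>
<table id="calendar">
  <tr><td>Sun</td><td>Mon</td><td>Tue</td></tr>
  <tr><td>1</td><td>2</td><td>3</td></tr>
</table>
<table>
  <tr><td><strong>BIL</strong></td><td><strong>KARIAH MASJID</strong></td><td><strong>ALAMAT</strong></td><td><strong>MUKIM</strong></td></tr>
  <tr><td>1</td><td>Masjid A</td><td>Jalan 1</td><td>Kuah</td></tr>
  <tr><td>2</td><td>Masjid B</td><td>Jalan 2</td><td>Padang Matsirat</td></tr>
</table>
</body></html>
"""


def parse_table_by_header(html, header_text):
    """Return (header, rows) for the first table with a <strong> header cell equal to header_text."""
    doc = lh.fromstring(html)
    # Keep only tables that contain a <td><strong> cell with the given text.
    tables = doc.xpath('//table[.//td/strong[normalize-space()=$h]]', h=header_text)
    if not tables:
        return [], []
    table_rows = tables[0].xpath('.//tr')
    # First row is treated as the header, the rest as data rows.
    header = [cell.text_content().strip() for cell in table_rows[0].xpath('./td')]
    rows = [
        [cell.text_content().strip() for cell in row.xpath('./td')]
        for row in table_rows[1:]
    ]
    return header, rows


header, rows = parse_table_by_header(SAMPLE_HTML, "KARIAH MASJID")
print(header)
print(rows)
```

With the real page, the same helper could be called on `page.content` from `requests.get`, and the result could be passed to `pd.DataFrame(rows, columns=header)` if a DataFrame is wanted.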

python

html-parsing

web-scraping