I need to web scrape a particular value from a page which is contained in a table
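I want to scrape the net sales figure for Dec 2021, which is contained in a table, from a web page. I am using the simple BeautifulSoup module. I have included the Python code I used to extract some other values. I want to extract the value 9644.8. The HTML of the page is as follows: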
<table class="table table-sm table-hover screenertable table-responsive-sm">
<thead>
<tr>
<th scope="col">PARTICULARS</th>
<th scope="col">Dec 2020</th>
<th scope="col">Mar 2021</th>
<th scope="col">Jun 2021</th>
<th scope="col">Sep 2021</th>
<th scope="col">Dec 2021</th>
</tr>
</thead>
<tbody>
<tr class="">
<th scope="row">Net Sales <span class="infolink" data-tooltip="tooltip" title=""
data-original-title="It is companys core revenue net of discounts and returns."><svg
class="svg-inline--fa fa-info-circle fa-w-16" aria-hidden="true" focusable="false" data-prefix="fas"
data-icon="info-circle" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"
data-fa-i2svg=""><path fill="currentColor"
d="M256 8C119.043 8 8 119.083 8 256c0 136.997 111.043 248 248 248s248-111.003 248-248C504 119.083 392.957 8 256 8zm0 110c23.196 0 42 18.804 42 42s-18.804 42-42 42-42-18.804-42-42 18.804-42 42-42zm56 254c0 6.627-5.373 12-12 12h-88c-6.627 0-12-5.373-12-12v-24c0-6.627 5.373-12 12-12h12v-64h-12c-6.627 0-12-5.373-12-12v-24c0-6.627 5.373-12 12-12h64c6.627 0 12 5.373 12 12v100h12c6.627 0 12 5.373 12 12v24z"></path></svg>
<!-- <i class="fas fa-info-circle"></i> --></span></th>
<td>
<span class="Number" value="10824.4">10,824.40</span>
</td>
<td>
<span class="Number" value="9530.9">9,530.90</span>
</td>
<td>
<span class="Number" value="9088.2">9,088.20</span>
</td>
<td>
<span class="Number" value="9321.5">9,321.50</span>
</td>
<td>
<span class="Number" value="9644.8">9,644.80</span>
</td>
</tr>
</tbody>
</table>
import pandas as pd
from bs4 import BeautifulSoup
import requests
a=input("Enter symbol of the company\n")
url="https://ticker.finology.in/company/"+a
print(url)
response=requests.get(url)
soup=BeautifulSoup(response.text,'html.parser')
CP = soup.find("div", {"id":"mainContent_clsprice"}).find("span", {"class": "Number"}).getText()
The company symbol I used is IDEA.
You can pass the URL directly to pandas.read_html(), which returns a list of DataFrames, one for each table it finds.
>>> dfs = pd.read_html('https://ticker.finology.in/company/idea')
>>> dfs[1]
          PARTICULARS  Dec 2020  Mar 2021  Jun 2021  Sep 2021  Dec 2021
0           Net Sales  10824.40   9530.90   9088.20    9321.5    9644.8
1   Total Expenditure   6717.40   5296.20   5529.30    5672.9    5995.2
2    Operating Profit   4107.00   4234.70   3558.90    3648.6    3649.6
3        Other Income     36.30     31.10     29.20      22.8      25.2
4            Interest   4782.60   4711.00   5223.20    5112.8    5324.7
5        Depreciation   5638.90   5629.50   5831.90    5743.8    5550.5
6   Exceptional Items   -439.50   -972.60     51.30      13.5      11.6
7   Profit Before Tax  -6717.70  -7047.30  -7415.70   -7171.7   -7188.8
8                 Tax      0.00    -20.80      0.00       0.0       0.0
9    Profit After Tax  -6717.70  -7026.50  -7415.70   -7171.7   -7188.8
10  Adjusted EPS (Rs)     -2.34     -2.45     -2.58      -2.5      -2.5
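To pull the single figure out of that frame, one option is to filter on the PARTICULARS column and read the Dec 2021 column. This is a minimal sketch, not a definitive implementation: it uses a small hand-built frame with two rows copied from the output above in place of the live dfs[1], the helper name get_value is hypothetical, and the table index and column labels may differ if the page changes.

```python
import pandas as pd

# Stand-in for dfs[1]; two rows copied from the output above (assumption:
# the live frame has the same column labels).
df = pd.DataFrame({
    "PARTICULARS": ["Net Sales", "Total Expenditure"],
    "Sep 2021": [9321.5, 5672.9],
    "Dec 2021": [9644.8, 5995.2],
})

def get_value(frame, row_label, column):
    """Return the cell in `column` for the row whose PARTICULARS equals `row_label`."""
    match = frame.loc[frame["PARTICULARS"] == row_label, column]
    if match.empty:
        raise KeyError(f"no row labelled {row_label!r}")
    return float(match.iloc[0])

net_sales = get_value(df, "Net Sales", "Dec 2021")
print(net_sales)
```

Looking the row up by its label rather than by position (for example dfs[1].iloc[0, -1]) should keep working if the site reorders rows or appends a new quarter.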
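Provided you get the right table, you can use select_one with the CSS selector tbody td:last-child > .Number. tbody restricts the match to the table body rows; select_one returns only the first match, so combined with td:last-child it gets the last column of the first row; > .Number then gets the direct child span. You can of course combine and shorten the selector list rather than grabbing the table separately, but I am not sure whether you will be doing other things with the table.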
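from bs4 import BeautifulSoup as bs
import requests

# Send a browser-like User-Agent header with the request
r = requests.get('https://ticker.finology.in/company/idea', headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
# Table inside the .card element that has #mainContent_hfSizeForQuarter as a direct child
table = soup.select_one('.card:has(> #mainContent_hfSizeForQuarter) table')
# Last cell of the first body row, i.e. Net Sales for Dec 2021
print(table.select_one('tbody td:last-child > .Number').text.strip())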
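The selector above relies on Net Sales being the first row and Dec 2021 being the last column. As a hedged alternative, the sketch below looks the cell up by its row and column labels and reads the span's value attribute, which avoids the thousands separator in the text. It runs against a trimmed copy of the table from the question (tooltip markup omitted, class list shortened) rather than the live page; the helper name cell_value is hypothetical.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (tooltip markup omitted).
html = """
<table class="screenertable">
<thead><tr>
<th scope="col">PARTICULARS</th><th scope="col">Sep 2021</th><th scope="col">Dec 2021</th>
</tr></thead>
<tbody><tr>
<th scope="row">Net Sales <span class="infolink"></span></th>
<td><span class="Number" value="9321.5">9,321.50</span></td>
<td><span class="Number" value="9644.8">9,644.80</span></td>
</tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.select_one("table.screenertable")

def cell_value(table, row_label, column_label):
    """Return the numeric `value` attribute of the cell at (row_label, column_label)."""
    headers = [th.get_text(strip=True) for th in table.select("thead th")]
    col = headers.index(column_label)  # raises ValueError if the column is missing
    for row in table.select("tbody tr"):
        cells = row.find_all(["th", "td"], recursive=False)
        if cells[0].get_text(strip=True) == row_label:
            return float(cells[col].select_one(".Number")["value"])
    raise KeyError(row_label)

# Positional selector from the answer, for comparison (returns the display text).
last_text = table.select_one("tbody td:last-child > .Number").text.strip()

print(cell_value(table, "Net Sales", "Dec 2021"), last_text)
```

On the real page the same function could be given the table found with the .card:has(...) selector, assuming the header row and the value attribute are present as in the question's HTML.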