Extract data points from an interactive chart using Python?
Is it possible to extract the data points from the chart at this link?
https://ycharts.com/companies/AAPL/market_cap
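I mean the chart located at the XPath //*[@id="dataChartCanvass1"], not the table below the chart.
I tried looking at the page source, but I can only see the data points from the table.
Is this possible with Python and requests? Where should I start?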
You can emulate their Ajax call to get the chart data points, for example:
import json
import requests
import pandas as pd
api_url = "https://ycharts.com/charts/fund_data.json"
params = {
    "securities": "id:AAPL,include:true,,",  # <-- ticker here
    "calcs": "id:market_cap,include:true,,",
    "correlations": "",
    "format": "real",
    "recessions": "false",
    "zoom": "5",
    "startDate": "",
    "endDate": "",
    "chartView": "",
    "splitType": "single",
    "scaleType": "linear",
    "note": "",
    "title": "",
    "source": "false",
    "units": "false",
    "quoteLegend": "true",
    "partner": "",
    "quotes": "",
    "legendOnChart": "true",
    "securitylistSecurityId": "",
    "displayTicker": "false",
    "ychartsLogo": "",
    "useEstimates": "false",
    "maxPoints": "918",
}
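response = requests.get(api_url, params=params, timeout=10)
response.raise_for_status()  # stop on HTTP errors instead of failing later in .json()
data = response.json()
# uncomment to see all data:
# print(json.dumps(data, indent=4))
df = pd.DataFrame(
    data["chart_data"][0][0]["raw_data"], columns=["date", "value"]
)
# the timestamps are epoch milliseconds, so divide by 1000 to get seconds
df["date"] = pd.to_datetime(df["date"] / 1000, unit="s")
df["value"] = df["value"].astype(int)
print(df)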
Prints:
          date   value
0   2016-08-29  575593
1   2016-09-06  580335
2   2016-09-09  555710
3   2016-09-16  619239
4   2016-09-23  607331
5   2016-09-30  603253
6   2016-10-07  608643
7   2016-10-14  627239
8   2016-10-21  621747
9   2016-10-28  606390
10  2016-11-04  580368
11  2016-11-11  578182
...and so on.
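If you need this for more than one chart, the same request can be wrapped in small reusable helpers. This is a minimal sketch, not a documented API: it assumes the undocumented `fund_data.json` endpoint keeps the parameter format shown above, that other tickers and metrics accept the same `id:<X>,include:true,,` syntax, and that `raw_data` holds `[epoch_milliseconds, value]` pairs (which is what the `/ 1000` conversion above implies). The helper names `with_security`, `raw_to_frame` and `fetch_chart` are hypothetical, and so is the browser-like `User-Agent` header.

```python
import pandas as pd
import requests

# Endpoint taken from the answer above; it is undocumented and may change.
API_URL = "https://ycharts.com/charts/fund_data.json"


def with_security(params: dict, ticker: str, calc: str = "market_cap") -> dict:
    """Return a copy of `params` pointing at another ticker/metric.

    Assumption: every ticker and metric uses the same "id:<X>,include:true,,"
    format as the AAPL/market_cap example.
    """
    new_params = dict(params)  # shallow copy is enough: all values are strings
    new_params["securities"] = f"id:{ticker},include:true,,"
    new_params["calcs"] = f"id:{calc},include:true,,"
    return new_params


def raw_to_frame(raw_data: list) -> pd.DataFrame:
    """Turn [[epoch_ms, value], ...] pairs into a DataFrame.

    unit="ms" gives the same result as dividing by 1000 and using unit="s".
    Values are kept as returned instead of being truncated with astype(int).
    """
    df = pd.DataFrame(raw_data, columns=["date", "value"])
    df["date"] = pd.to_datetime(df["date"], unit="ms")
    return df


def fetch_chart(params: dict, timeout: float = 10.0) -> pd.DataFrame:
    """Call the chart endpoint and parse the first series.

    The path data["chart_data"][0][0]["raw_data"] mirrors the answer above;
    other chart configurations may nest the series differently.
    """
    # Hypothetical header: some sites reject clients without a browser-like User-Agent.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(API_URL, params=params, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    data = response.json()
    return raw_to_frame(data["chart_data"][0][0]["raw_data"])
```

Usage, reusing the `params` dict from the answer: `df = fetch_chart(with_security(params, "MSFT"))`. Whether tickers other than AAPL work without further parameter changes has not been verified here, so check the response with `print(json.dumps(data, indent=4))` first if the result looks wrong.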