How do you convert multiple Dimensions and Metrics into a Pandas DataFrame using the Google Analytics Reporting API?
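I'm having trouble getting data from the Google Analytics Reporting API into a Pandas DataFrame. The request and the data collection work fine, but whenever I pull multiple dimensions and metrics, I can't get the result into a DataFrame.
The output is one very long list covering all our products. It starts with the dimensions (product name and SKU), followed by the metrics (revenue and quantity). Example: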
['PRODUCT1', '1234', 'PRODUCT2', '5678'..... 13.0, 324.0, 3.0, 322.0]
The errors I run into when converting to a DataFrame:
ValueError: Length of values does not match length of index
and
"None of [Index(['Product', 'SKU', 'Revenue', 'Quantity'], dtype='object')] are in the [columns]"
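Any ideas on how to get this into a proper DataFrame? I used this article as a starting point, but it only explains how to export one dimension and one metric: https://www.jcchouinard.com/google-analytics-api-using-python/
My code: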
from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd

SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = 'client_secrets.json'
VIEW_ID = '123456789'  # Random number

credentials = ServiceAccountCredentials.from_json_keyfile_name(KEY_FILE_LOCATION, SCOPES)
analytics = build('analyticsreporting', 'v4', credentials=credentials)

response = analytics.reports().batchGet(
    body={
        'reportRequests': [
            {
                'viewId': VIEW_ID,  # Add View ID from GA
                'dateRanges': [
                    {'startDate': '30daysAgo', 'endDate': 'today'},
                ],
                'metrics': [
                    {'expression': 'ga:itemRevenue'},
                    {'expression': 'ga:itemQuantity'}
                ],
                'dimensions': [
                    {"name": "ga:productName"},
                    {"name": "ga:productSku"}
                ],
                # "filtersExpression": "ga:pagePath=~products;ga:pagePath!@/translate",  # Filter by condition "containing products"
                'orderBys': [{"fieldName": "ga:itemRevenue", "sortOrder": "DESCENDING"}],
                'pageSize': 1000
            }]
    }
).execute()

dim = []
val = []

# Extract Data
for report in response.get('reports', []):
    columnHeader = report.get('columnHeader', {})
    dimensionHeaders = columnHeader.get('dimensions', [])
    metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
    rows = report.get('data', {}).get('rows', [])

    for row in rows:
        dimensions = row.get('dimensions', [])
        dateRangeValues = row.get('metrics', [])

        for header, dimension in zip(dimensionHeaders, dimensions):
            dim.append(dimension)

        for i, values in enumerate(dateRangeValues):
            for metricHeader, value in zip(metricHeaders, values.get('values')):
                val.append(float(value))

df = pd.DataFrame()
df["Revenue", "Quantity"] = val
df["Product", "SKU"] = dim
df = df[["Product", "SKU", "Revenue", "Quantity"]]
print(df)
I solved this by using the gapandas package.
The Google Analytics API does not work very well for these operations. Use this instead:
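
Separately from gapandas, the parsing in the question can probably be fixed by keeping each row's dimensions and metrics together instead of appending everything to two flat lists. As far as I can tell, `df["Revenue", "Quantity"] = val` creates a single column labelled with the tuple `("Revenue", "Quantity")`, which would explain why selecting the four names afterwards fails. The following is only a minimal sketch under a few assumptions: the helper name `report_to_dataframe` is hypothetical, the sample response is made up to mirror the shape of a `batchGet` result, and only the first date range of each row is read.

```python
import pandas as pd


def report_to_dataframe(report):
    """Turn one report of a batchGet response into a DataFrame, one row per API row."""
    header = report.get('columnHeader', {})
    dim_names = header.get('dimensions', [])
    metric_names = [
        entry['name']
        for entry in header.get('metricHeader', {}).get('metricHeaderEntries', [])
    ]

    records = []
    for row in report.get('data', {}).get('rows', []):
        # Keep each row's dimensions and metrics together in a single record.
        record = dict(zip(dim_names, row.get('dimensions', [])))
        metrics = row.get('metrics', [])
        # Assumption: one date range per request, so only metrics[0] is read.
        values = metrics[0].get('values', []) if metrics else []
        record.update(zip(metric_names, (float(v) for v in values)))
        records.append(record)

    return pd.DataFrame(records, columns=dim_names + metric_names)


# Made-up response in the shape returned by reports().batchGet(...).execute()
sample_response = {
    'reports': [{
        'columnHeader': {
            'dimensions': ['ga:productName', 'ga:productSku'],
            'metricHeader': {'metricHeaderEntries': [
                {'name': 'ga:itemRevenue', 'type': 'CURRENCY'},
                {'name': 'ga:itemQuantity', 'type': 'INTEGER'},
            ]},
        },
        'data': {'rows': [
            {'dimensions': ['PRODUCT1', '1234'], 'metrics': [{'values': ['13.0', '324']}]},
            {'dimensions': ['PRODUCT2', '5678'], 'metrics': [{'values': ['3.0', '322']}]},
        ]},
    }]
}

df = report_to_dataframe(sample_response['reports'][0])
# Rename the ga: fields to the column names used in the question.
df = df.rename(columns={
    'ga:productName': 'Product',
    'ga:productSku': 'SKU',
    'ga:itemRevenue': 'Revenue',
    'ga:itemQuantity': 'Quantity',
})
print(df)
```

With the real API you would pass `response['reports'][0]` from the question's request instead of the sample dictionary. If a request has several date ranges, each row carries one `metrics` entry per range, so this sketch would need extending.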