Core Reporting API v3 - Data sampled from specific date, but not before that date
I have a Google Analytics account with a view that was created on 2015-07-29.
I make a request to the Core Reporting API with 2015-07-29 as the start-date:
https://www.googleapis.com/analytics/v3/data/ga?ids=<my-ga-id>&dimensions=ga:medium,ga:year,ga:month,ga:channelGrouping&metrics=ga:transactions&start-date=2015-07-29&end-date=2017-03-30&max-results=10000
I get the following response:
{
...
"containsSampledData": true,
"sampleSize": "498617",
"sampleSpace": "1022430",
...
}
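For anyone reproducing this, here is a minimal Python sketch of reading the sampling fields out of a response like the one above. The helper name `sampling_summary` is made up for illustration. It assumes the response has already been parsed from JSON into a dict, and that `sampleSize` and `sampleSpace` arrive as strings, as they do in the snippet above.

```python
def sampling_summary(response: dict) -> dict:
    """Summarise the sampling fields of a parsed Core Reporting API response."""
    sampled = bool(response.get("containsSampledData", False))
    if not sampled:
        # sampleSize / sampleSpace are absent in the unsampled response shown here.
        return {"sampled": False, "fraction": None}
    size = int(response["sampleSize"])    # sessions actually used
    space = int(response["sampleSpace"])  # sessions available for the query
    return {"sampled": True, "fraction": size / space}


# Values copied from the sampled response above.
sampled_response = {
    "containsSampledData": True,
    "sampleSize": "498617",
    "sampleSpace": "1022430",
}
summary = sampling_summary(sampled_response)
print(f"sampled={summary['sampled']}, fraction={summary['fraction']:.4f}")
```

With the numbers from this response, the fraction comes out at just under half of the available sessions.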
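This makes perfect sense: it is sampling the data because of the number of sessions.
However, if I change my request to the Core Reporting API so that 2015-07-28 is now the start-date: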
https://www.googleapis.com/analytics/v3/data/ga?ids=<my-ga-id>&dimensions=ga:medium,ga:year,ga:month,ga:channelGrouping&metrics=ga:transactions&start-date=2015-07-28&end-date=2017-03-30&max-results=10000
I get the following response:
{
...
"containsSampledData": false
...
}
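The data is no longer sampled and produces the correct values (compared with the Google Analytics web UI).
If I then add the metric ga:sessions to the request, still using start-date=2015-07-28, I get sampled data.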
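To make the three variants easy to compare side by side, here is a rough Python sketch that builds the request URLs. The function name `build_request_url` and the view ID `ga:12345678` are placeholders invented for the example (the real ID is elided in the URLs above), and authentication is left out entirely.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/analytics/v3/data/ga"


def build_request_url(view_id: str, start_date: str, metrics: list) -> str:
    """Build a Core Reporting API v3 query URL with the parameters used above."""
    params = {
        "ids": view_id,
        "dimensions": "ga:medium,ga:year,ga:month,ga:channelGrouping",
        "metrics": ",".join(metrics),
        "start-date": start_date,
        "end-date": "2017-03-30",
        "max-results": 10000,
    }
    # Keep ':' and ',' unescaped so the URL reads like the ones in the question.
    return BASE_URL + "?" + urlencode(params, safe=":,")


VIEW_ID = "ga:12345678"  # hypothetical placeholder, not a real view

variants = {
    "on creation date": build_request_url(VIEW_ID, "2015-07-29", ["ga:transactions"]),
    "one day earlier": build_request_url(VIEW_ID, "2015-07-28", ["ga:transactions"]),
    "earlier + sessions": build_request_url(
        VIEW_ID, "2015-07-28", ["ga:transactions", "ga:sessions"]
    ),
}
for label, url in variants.items():
    print(label, url)
```

Only start-date and metrics change between the three requests; everything else is held fixed.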
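My question is:
Why is the data sampled if start-date is on or after the date the Google Analytics view was created?
- If start-date is before that date, why is the data no longer sampled?
- And why is it sampled again as soon as I add the metric ga:sessions?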
In data analysis, sampling is the practice of analyzing a subset of
all the data in order to uncover the meaningful information in the
larger data set. For example, during an election cycle, you hear lots
of news about what percent of voters prefer one candidate over
another, or are for or against a certain issue. Because there can be
tens to hundreds of millions of voters in an election, and because the
companies conducting the surveys want to get their information out to
the public as soon as possible, trying to question every voter for
every new survey would be extraordinarily expensive and take too much
time. To solve those problems, surveyors use what they conclude is a
representative sample of the overall voter population, often just 1000
voters from the millions who are eligible.
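Basically, data is sampled when the amount of data being returned is large. How Google calculates or decides when a request should be sampled is something only Google can answer. I believe this question is primarily opinion-based, so here is my opinion.
My guess is that Google estimates the number of rows your request would return and divides it by the number of days in the request, giving Y. If Y is greater than some threshold X, they sample. By adding a date from before you actually started recording any data, you trick the system into reducing the size of Y, so it does not sample.
Again, this is a wild guess on my part. I may test it out; it sounds like a fun way to trick the system.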
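To make the guess concrete, here is a small Python sketch of it. Everything in it is hypothetical: `would_sample`, the row estimate and the threshold X are invented numbers chosen only to show the mechanism, and none of this is documented Google behaviour.

```python
from datetime import date


def days_in_range(start: date, end: date) -> int:
    """Number of days in the request, counting both endpoints."""
    return (end - start).days + 1


def would_sample(estimated_rows: int, start: date, end: date, threshold_x: float) -> bool:
    """The guessed rule: sample when rows-per-day (Y) exceeds the threshold X."""
    y = estimated_rows / days_in_range(start, end)
    return y > threshold_x


END = date(2017, 3, 30)
ESTIMATED_ROWS = 1_000_000  # hypothetical estimate, same for both requests
THRESHOLD_X = 1635.0        # hypothetical threshold, picked to sit between the two Y values

on_creation = date(2015, 7, 29)  # view creation date
day_before = date(2015, 7, 28)   # one empty day added in front

y_on_creation = ESTIMATED_ROWS / days_in_range(on_creation, END)
y_day_before = ESTIMATED_ROWS / days_in_range(day_before, END)

print(days_in_range(on_creation, END), y_on_creation)
print(days_in_range(day_before, END), y_day_before)
print(would_sample(ESTIMATED_ROWS, on_creation, END, THRESHOLD_X))
print(would_sample(ESTIMATED_ROWS, day_before, END, THRESHOLD_X))
```

Note that one extra day changes Y by well under one percent here, so if the guess is right, the original request must have been sitting very close to the threshold.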