Missing data in Github Archive on Big Query?
I'm using BigQuery's tables from the Github Archive and running a query on pull requests for the typelevel/cats repo. There are no entries before January 1, 2016, even though the actual repo shows activity starting from January 28, 2015.
Link to github repo showing earlier pull requests
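The query is below. I want to check whether this is a mistake or misunderstanding on my part, or whether some repos are only partially available in the BQ tables.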
SELECT
  DATE(created_at) AS date, repo.name, COUNT(*) AS num_PR
FROM
  (TABLE_DATE_RANGE([githubarchive:day.],
    TIMESTAMP('2014-09-26'),
    TIMESTAMP('2016-09-26')
  ))
WHERE
  type = 'PullRequestEvent'
  AND JSON_EXTRACT(payload, '$.action') = '"opened"'
  AND repo.name IN ('typelevel/cats')
GROUP BY date, repo.name
ORDER BY date DESC
This repository changed its name, but its ID stayed the same:
SELECT repo.name, MIN(created_at) since, MAX(created_at) until
FROM (TABLE_DATE_RANGE([githubarchive:day.],
TIMESTAMP('2015-01-01'),
TIMESTAMP('2016-10-01')
))
WHERE repo.id = 29986727
GROUP BY 1
ORDER BY 1
repo_name       since                until
non/cats        2015-01-28 20:26:49  2016-01-30 20:30:41
typelevel/cats  2016-01-30 20:32:30  2016-09-30 16:47:03
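Since the ID is stable across the rename, one way to get the full pull-request history is to filter on repo.id instead of repo.name. The following is a sketch only, reusing the legacy SQL syntax and the ID 29986727 from the queries above. It assumes the action value is stored as the quoted JSON string "opened", as in the original query, and it has not been run against the current tables:

```sql
-- Sketch: count opened pull requests per day for the repository,
-- matching on the stable repo.id so that events logged under the
-- old name (non/cats) and the new name (typelevel/cats) are both included.
SELECT
  DATE(created_at) AS date, repo.id, COUNT(*) AS num_PR
FROM
  (TABLE_DATE_RANGE([githubarchive:day.],
    TIMESTAMP('2015-01-01'),
    TIMESTAMP('2016-09-26')
  ))
WHERE
  type = 'PullRequestEvent'
  AND JSON_EXTRACT(payload, '$.action') = '"opened"'
  AND repo.id = 29986727
GROUP BY date, repo.id
ORDER BY date DESC
```

Grouping by repo.id rather than repo.name keeps each day as a single row even on the day of the rename, when events appear under both names.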