Load Google Datastore Backups from Cloud Storage into Google BigQuery
Our requirement is to back up the Google Datastore programmatically and load those backups into Google BigQuery for further analysis. We have successfully automated the backups using the following approach:
Queue queue = QueueFactory.getQueue("datastoreBackupQueue");

/*
 * Create a task which is equivalent to the backup URL mentioned in
 * the cron.xml above, using a queue that has Datastore admin enabled
 */
TaskOptions taskOptions = TaskOptions.Builder.withUrl("/_ah/datastore_admin/backup.create")
        .method(TaskOptions.Method.GET)
        .param("name", "")
        .param("filesystem", "gs")
        .param("gs_bucket_name",
                "db-backup" + "/" + TimeUtils.parseDateToString(new Date(), "yyyy/MMM/dd"))
        .param("queue", queue.getQueueName());

/*
 * Get the list of dynamic entity kind names present in the
 * datastore at the start of the backup
 */
List<String> entityNames = getEntityNamesForBackup();
for (String entityName : entityNames) {
    taskOptions.param("kind", entityName);
}

/* Add this task to the queue above */
queue.add(taskOptions);
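`TimeUtils.parseDateToString` is a project-specific helper that is not shown above; a minimal stand-in built on `SimpleDateFormat` (the class name and signature here are assumptions based on how it is called) would look like:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class TimeUtils {
    /** Formats a date with the given pattern, e.g. "yyyy/MMM/dd" -> "2016/Jan/15". */
    public static String parseDateToString(Date date, String pattern) {
        return new SimpleDateFormat(pattern).format(date);
    }

    public static void main(String[] args) {
        // Produces the dated folder used as the backup bucket path above.
        System.out.println("db-backup/" + parseDateToString(new Date(), "yyyy/MMM/dd"));
    }
}
```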
I was then able to import this backup into Google BigQuery manually, but how can we automate the process?
I have also gone through most of the documentation, but nothing helped:
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage#loading_data_from_google_cloud_storage
The loading data from Google Cloud Storage article mentioned in your question describes some programmatic examples of importing from GCS using the command line, Node.js, or Python.
You can also automate importing data located on Cloud Storage into BigQuery by running the following command from a script:
$ gcloud alpha bigquery import SOURCE DESTINATION_TABLE
More information about this command is available in this article.
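The command above can be wrapped in a small script and run from cron. A minimal sketch (the bucket, dataset, and kind names are placeholders, and the date layout mirrors the `yyyy/MMM/dd` path used by the backup task in the question):

```shell
#!/bin/sh
# Placeholder values -- replace with your own bucket, dataset, and kind.
BUCKET="db-backup"
DATASET="mydataset"
KIND="MyEntity"

# Backup path mirrors the yyyy/MMM/dd layout used by the backup task.
DATE_PATH=$(date +%Y/%b/%d)
SOURCE="gs://${BUCKET}/${DATE_PATH}/${KIND}.backup_info"

# Print the command instead of running it, so the script is safe to dry-run;
# drop the echo to execute for real.
echo gcloud alpha bigquery import "${SOURCE}" "${DATASET}.${KIND}"
```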
I solved this myself; here is the solution in Java.
The following code fetches the backup files from Google Cloud Storage and loads them into Google BigQuery:
AppIdentityCredential bqCredential = new AppIdentityCredential(
        Collections.singleton(BigqueryScopes.BIGQUERY));
AppIdentityCredential dsCredential = new AppIdentityCredential(
        Collections.singleton(StorageScopes.CLOUD_PLATFORM));

Storage storage = new Storage(HTTP_TRANSPORT, JSON_FACTORY, dsCredential);
Objects list = storage.objects().list(bucket).setPrefix(prefix).setFields("items/name").execute();

if (list == null) {
    Log.severe(BackupDBController.class, "BackupToBigQueryController",
            "List from Google Cloud Storage was null", null);
} else if (list.getItems() == null || list.getItems().isEmpty()) {
    Log.severe(BackupDBController.class, "BackupToBigQueryController",
            "List from Google Cloud Storage was empty", null);
} else {
    Bigquery bigquery = new Bigquery.Builder(HTTP_TRANSPORT, JSON_FACTORY, bqCredential)
            .setApplicationName("BigQuery-Service-Accounts/0.1")
            .setHttpRequestInitializer(bqCredential)
            .build();
    for (String kind : getEntityNamesForBackup()) {
        /* Find the backup_info file for this kind */
        String url = "";
        for (StorageObject obj : list.getItems()) {
            String currentUrl = obj.getName();
            if (currentUrl.contains(kind + ".backup_info")) {
                url = currentUrl;
                break;
            }
        }
        if (StringUtils.isStringEmpty(url)) {
            continue;
        }
        url = "gs://" + bucket + "/" + url;

        /* Configure a load job that reads the backup into a table named after the kind */
        JobConfigurationLoad loadConfig = new JobConfigurationLoad();
        loadConfig.setSourceUris(Collections.singletonList(url));
        loadConfig.setSourceFormat("DATASTORE_BACKUP");
        TableReference table = new TableReference();
        table.setProjectId(projectId);
        table.setDatasetId(datasetId);
        table.setTableId(kind);
        loadConfig.setDestinationTable(table);

        Job job = new Job();
        job.setConfiguration(new JobConfiguration().setLoad(loadConfig));

        /* Submit the load job; note that the job itself runs asynchronously */
        Insert insert = bigquery.jobs().insert(projectId, job);
        JobReference jr = insert.execute().getJobReference();
        Log.info(BackupDBController.class, "BackupToBigQueryController",
                "Load job " + jr.getJobId() + " submitted to BigQuery", null);
    }
}
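One caveat: the `contains(kind + ".backup_info")` match above can pick the wrong file when one kind name is a suffix of another (e.g. `User` vs `PowerUser`). A stricter, self-contained matcher over the object names (the file-name layout assumed here is inferred from the code above, so verify it against your bucket):

```java
import java.util.Arrays;
import java.util.List;

public class BackupInfoMatcher {
    /**
     * Returns the first object name that is the backup_info file for exactly
     * this kind: the name must end with ".<kind>.backup_info" or with
     * "/<kind>.backup_info", so "User" no longer matches "PowerUser".
     * Returns "" when no matching file is found.
     */
    public static String findBackupInfo(List<String> objectNames, String kind) {
        String dottedSuffix = "." + kind + ".backup_info";
        String slashedSuffix = "/" + kind + ".backup_info";
        for (String name : objectNames) {
            if (name.endsWith(dottedSuffix) || name.endsWith(slashedSuffix)) {
                return name;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "2016/Jan/15/abc.PowerUser.backup_info",
                "2016/Jan/15/abc.User.backup_info");
        // Matches only the User file, not the PowerUser one.
        System.out.println(findBackupInfo(names, "User"));
    }
}
```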
If anyone has a better approach, please let me know.
As of last week there is a proper way to automate this. The most important part is gcloud beta datastore export.
I created a short script around it: https://github.com/chees/datastore2bigquery
You can adapt it to your own situation.
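The export-and-load flow behind that script looks roughly like this (bucket, dataset, and kind names are placeholders, and the `.export_metadata` path layout is an assumption, so check it against what the export actually writes to your bucket):

```shell
#!/bin/sh
# Placeholders -- adjust for your project.
BUCKET="db-backup"
DATASET="mydataset"
KIND="MyEntity"
PREFIX="exports/$(date +%Y-%m-%d)"

# 1. Export a single kind from Datastore to Cloud Storage.
echo gcloud beta datastore export --kinds="${KIND}" "gs://${BUCKET}/${PREFIX}"

# 2. Load the resulting export metadata file into BigQuery.
#    The default_namespace/kind_* layout below is an assumption; confirm it
#    in your own bucket before relying on it.
META="gs://${BUCKET}/${PREFIX}/default_namespace/kind_${KIND}/default_namespace_kind_${KIND}.export_metadata"
echo bq load --source_format=DATASTORE_BACKUP "${DATASET}.${KIND}" "${META}"
```

Both commands are echoed rather than executed so the sketch is safe to dry-run; remove the echo to run them for real.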