Cannot read from BigQuery
I am trying to read a simple BigQuery table.
The job hangs at:
WARNING:root:Dataset thijs-dev:temp_dataset_b234824381e04e1324234237724b485f95c does not exist so we will create it as temporary with location=EU
I run it with the following command:
python main.py \
--runner DirectRunner \
--project thijs-dev \
--temp_location gs://thijs/tmp/ \
--job_name thijs-dev-load \
--save_main_session
And the complete Python script:
import argparse
import logging

import apache_beam as beam


def run(argv=None):
    parser = argparse.ArgumentParser()
    # Arguments not recognized here (--runner, --project, ...) are passed on to Beam.
    known_args, pipeline_args = parser.parse_known_args(argv)

    with beam.Pipeline(argv=pipeline_args) as p:
        # Read all data from source_table
        source_data = (
            p
            | beam.io.Read(beam.io.BigQuerySource(
                query="select * from `thijs-dev.metathijs.thijs_locations`",
                use_standard_sql=True))
        )


if __name__ == '__main__':
    print("Start")
    logging.getLogger().setLevel(logging.INFO)
    run()
It turns out that Dataflow is just very slow. Processing 26 MB of data takes half an hour, but it does work.
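One way to tell a slow read from a real hang is to run the same pipeline on a small sample first and log each row as it arrives. The following is only a sketch under assumptions: `limited_query` and `run_sample` are hypothetical helpers that are not part of the original script, the default of 100 rows is arbitrary, and the read uses the same `beam.io.BigQuerySource` call as the question, which may be deprecated or unavailable in newer Beam releases.

```python
import logging


def limited_query(table, limit):
    """Build a standard-SQL query that reads at most `limit` rows from `table`."""
    # Hypothetical helper: reject anything that is not a positive integer.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return "SELECT * FROM `{}` LIMIT {}".format(table, limit)


def run_sample(pipeline_args, table, limit=100):
    """Read a small sample and log each row, to confirm the pipeline advances."""
    # Imported here so the query helper can be used without Beam installed.
    import apache_beam as beam

    with beam.Pipeline(argv=pipeline_args) as p:
        (p
         | "ReadSample" >> beam.io.Read(beam.io.BigQuerySource(
             query=limited_query(table, limit), use_standard_sql=True))
         | "LogRow" >> beam.Map(lambda row: logging.info("row: %s", row)))
```

Calling `run_sample(pipeline_args, "thijs-dev.metathijs.thijs_locations")` from `run()` with the logging level set to INFO should print rows once the temporary dataset has been created. If rows show up for the sample, the full read is most likely slow rather than stuck.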