module google.cloud has no attribute storage
I'm trying to run a Beam script in Python on GCP, following this tutorial:
[https://levelup.gitconnected.com/scaling-scikit-learn-with-apache-beam-251eb6fcf75b][1]
but I keep getting the following error:
AttributeError: module 'google.cloud' has no attribute 'storage'
I have google-cloud-storage in my requirements.txt, so I'm really not sure what I'm missing here.
My full script:
import apache_beam as beam
import json

query = """
SELECT
    year,
    plurality,
    apgar_5min,
    mother_age,
    father_age,
    gestation_weeks,
    ever_born,
    case when mother_married = true then 1 else 0 end as mother_married,
    weight_pounds as weight,
    current_timestamp as time,
    GENERATE_UUID() as guid
FROM `bigquery-public-data.samples.natality`
order by rand()
limit 100
"""


class ApplyDoFn(beam.DoFn):

    def __init__(self):
        self._model = None
        from google.cloud import storage
        import pandas as pd
        import pickle as pkl
        self._storage = storage
        self._pkl = pkl
        self._pd = pd

    def process(self, element):
        if self._model is None:
            bucket = self._storage.Client().get_bucket('bqr_dump')
            blob = bucket.get_blob('natality/sklearn-linear')
            self._model = self._pkl.loads(blob.download_as_string())

        new_x = self._pd.DataFrame.from_dict(element,
                                             orient='index').transpose().fillna(0)
        pred_weight = self._model.predict(new_x.iloc[:, 1:8])[0]
        return [{'guid': element['guid'],
                 'predicted_weight': pred_weight,
                 'time': str(element['time'])}]


# set up pipeline options
options = {'project': 'my-project-name',
           'runner': 'DataflowRunner',
           'temp_location': 'gs://bqr_dump/tmp',
           'staging_location': 'gs://bqr_dump/tmp'}

pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)

with beam.Pipeline(options=pipeline_options) as pipeline:
    (
        pipeline
        | 'ReadTable' >> beam.io.Read(beam.io.BigQuerySource(
            query=query,
            use_standard_sql=True))
        | 'Apply Model' >> beam.ParDo(ApplyDoFn())
        | 'Save to BigQuery' >> beam.io.WriteToBigQuery(
            'pzn-pi-sto:beam_test.weight_preds',
            schema='guid:STRING,weight:FLOAT64,time:STRING',
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))
and my requirements.txt:
google-cloud==0.34.0
google-cloud-storage==1.30.0
apache-beam[GCP]==2.20.0
This problem is usually related to one of two main causes: either the module was not installed properly (something went wrong during installation), or the module is not being imported correctly.
To address the first cause, if the module installation is broken, reinstall or verify it inside a virtual environment. As mentioned before, in a case like yours this should resolve the problem.
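A minimal sketch of that reinstall, assuming a Unix-like shell (the path /tmp/beam-env is illustrative, and the final install step needs network access):

```shell
# Create a fresh virtual environment so a broken install cannot linger.
python3 -m venv --clear /tmp/beam-env

# Confirm the environment's own interpreter and pip are the ones in use.
/tmp/beam-env/bin/python --version
/tmp/beam-env/bin/pip --version

# Reinstall the pinned dependencies inside it (requires network access):
# /tmp/beam-env/bin/pip install -r requirements.txt
```

Running the script with this environment's interpreter guarantees that the packages pip installed are the ones the script actually imports.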
For the second cause, try changing the code so that all modules are imported at the top of the file, as shown in the official example here. Your code should look like this:
import apache_beam as beam
import json
import pandas as pd
import pickle as pkl
from google.cloud import storage
...
Let me know if this information was helpful to you!
Make sure you have the correct versions installed. The modules Google maintains are updated constantly, and if you just run a plain pip install for a package, it will install the latest release, which may not be compatible with your other pinned dependencies.
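A small runtime check, using only the standard library, can confirm that the interpreter actually sees the versions you pinned. The check_pin helper is illustrative, and "pip" is queried below only because it exists in virtually every environment; substitute e.g. ("google-cloud-storage", "1.30.0") from requirements.txt:

```python
# Verify at runtime that an installed distribution matches a pinned version;
# a mismatch usually means pip installed into a different environment than
# the one running the script.
from importlib.metadata import version, PackageNotFoundError

def check_pin(dist: str, expected: str) -> bool:
    """Return True only if `dist` is installed at exactly version `expected`."""
    try:
        return version(dist) == expected
    except PackageNotFoundError:
        return False

# Query a package that is present everywhere, just to demonstrate the helper.
print(check_pin("pip", version("pip")))
```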