Dask, create a dataframe from several dask arrays
Suppose I have a set of dask arrays, for example:
c1 = da.from_array(np.arange(100000, 190000), chunks=1000)
c2 = da.from_array(np.arange(200000, 290000), chunks=1000)
c3 = da.from_array(np.arange(300000, 390000), chunks=1000)
Is it possible to create a dask dataframe from them? In pandas I could write:
data = {}
data['c1'] = c1
data['c2'] = c2
data['c3'] = c3
df = pd.DataFrame(data)
Is there a similar way to do this with dask?
The following should work:
import pandas as pd, numpy as np
import dask.array as da, dask.dataframe as dd
c1 = da.from_array(np.arange(100000, 190000), chunks=1000)
c2 = da.from_array(np.arange(200000, 290000), chunks=1000)
c3 = da.from_array(np.arange(300000, 390000), chunks=1000)
# generate dask dataframe
ddf = dd.concat([dd.from_dask_array(c) for c in [c1, c2, c3]], axis=1)
# name columns
ddf.columns = ['c1', 'c2', 'c3']