What should the argument to sqlContext.createDataFrame() be?
This code creates a DataFrame from the given lists:
sample_one = [(0, 'mouse'), (1, 'black')]
sample_two = [(0, 'cat'), (1, 'tabby'), (2, 'mouse')]
sample_three = [(0, 'bear'), (1, 'black'), (2, 'salmon')]
sample_data_df = sqlContext.createDataFrame([(sample_one,), (sample_two,), (sample_three,)], ['features'])
In createDataFrame(), why is there an extra comma after sample_one, as in (sample_one,)?
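This syntax creates a tuple. The trailing comma is what makes (sample_one,) a one-element tuple; without it, the parentheses only group the expression, so (sample_one) is still just the list. You can check this in the interpreter: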
>>> sample_one = [(0, 'mouse'), (1, 'black')]
>>> type((sample_one))
<type 'list'>
>>> type((sample_one,))
<type 'tuple'>
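To see why this matters for createDataFrame(), here is a minimal sketch in plain Python (no Spark needed). It assumes, as the call in the question suggests, that each element of the outer list is one row and that the number of fields in a row should match the number of column names. The names rows, columns, and no_comma are illustrative only and do not come from the original code.

```python
sample_one = [(0, 'mouse'), (1, 'black')]
sample_two = [(0, 'cat'), (1, 'tabby'), (2, 'mouse')]
sample_three = [(0, 'bear'), (1, 'black'), (2, 'salmon')]

# The data and column names as passed to createDataFrame() in the question.
rows = [(sample_one,), (sample_two,), (sample_three,)]
columns = ['features']

# With the trailing comma, every row is a tuple holding exactly one field
# (the whole list), which lines up with the single column name.
row_lengths = [len(row) for row in rows]

# Without the comma, the parentheses are only grouping, so each "row" is the
# list itself and its length is the number of (index, word) pairs inside it.
no_comma = [(sample_one), (sample_two), (sample_three)]
no_comma_lengths = [len(row) for row in no_comma]

print(row_lengths)       # [1, 1, 1]
print(no_comma_lengths)  # [2, 3, 3]
```

With the comma, each row has one field for the one 'features' column. Without it, each list would likely be read as a row with two or three fields, which no longer matches a single column name.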