What should the argument to sqlContext.createDataFrame() be?

This code creates a DataFrame from the given lists:

sample_one = [(0, 'mouse'), (1, 'black')]
sample_two = [(0, 'cat'), (1, 'tabby'), (2, 'mouse')]
sample_three = [(0, 'bear'), (1, 'black'), (2, 'salmon')]
sample_data_df = sqlContext.createDataFrame([(sample_one,), (sample_two,), (sample_three,)], ['features'])

In the createDataFrame() call, why is there an extra comma after sample_one, as in (sample_one,)?

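The trailing comma creates a one-element tuple. Without it, the parentheses only group the expression, so (sample_one) is still the list itself. createDataFrame() treats each element of its first argument as one row, and each row as a tuple of column values, so wrapping each list as (sample_one,) makes it the single value of the 'features' column. You can check the difference yourself: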

>>> sample_one = [(0, 'mouse'), (1, 'black')]
>>> type((sample_one))
<type 'list'>
>>> type((sample_one,))
<type 'tuple'>
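
To see how this lines up with the rows passed to createDataFrame(), here is a minimal sketch in plain Python that needs no Spark installation. The names grouped, row, rows, columns and row_widths are made up for illustration and are not part of the original code:

```python
sample_one = [(0, 'mouse'), (1, 'black')]
sample_two = [(0, 'cat'), (1, 'tabby'), (2, 'mouse')]
sample_three = [(0, 'bear'), (1, 'black'), (2, 'salmon')]

# Parentheses alone only group an expression: this is still the same list.
grouped = (sample_one)

# The trailing comma is what builds a tuple with exactly one element.
row = (sample_one,)

# The same shape as the first argument in the question: a list of rows,
# where every row is a one-element tuple holding a whole list.
rows = [(sample_one,), (sample_two,), (sample_three,)]
columns = ['features']

# Every row has one field, matching the single column name.
row_widths = [len(r) for r in rows]

print(type(grouped).__name__)  # list
print(type(row).__name__)      # tuple
print(row_widths)              # [1, 1, 1]
```

If the commas were dropped, rows would just be [sample_one, sample_two, sample_three], and each row would have two or three fields instead of one, which no longer matches the single 'features' column.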