How to count values in PySpark?
I have a long list of titles. I want to count the occurrences of each title across the whole dataset. For example:
`title`
A
b
A
c
c
c
Expected output:
title freq
A 2
b 1
c 3
Hi, you can do it like this:
import pandas as pd

title = ["A", "b", "A", "c", "c", "c"]
# value_counts returns a Series of counts indexed by the distinct titles
pd.Series(title).value_counts()
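If pandas is not available, the same frequency count can be sketched with the standard library's `collections.Counter` (pure Python, no dependencies):

```python
from collections import Counter

title = ["A", "b", "A", "c", "c", "c"]
# Counter maps each distinct title to its number of occurrences
freq = Counter(title)
print(freq)  # Counter({'c': 3, 'A': 2, 'b': 1})
```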
You can groupBy on `title` and then count:
import pyspark.sql.functions as f
df.groupBy('title').agg(f.count('*').alias('count')).show()
+-----+-----+
|title|count|
+-----+-----+
| A| 2|
| c| 3|
| b| 1|
+-----+-----+
Or more concisely:
df.groupBy('title').count().show()
+-----+-----+
|title|count|
+-----+-----+
| A| 2|
| c| 3|
| b| 1|
+-----+-----+
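For comparison, the same groupBy/count pattern can be sketched in pandas, assuming the data sits in a pandas DataFrame named `df` with a `title` column (a hypothetical stand-in for the Spark DataFrame above):

```python
import pandas as pd

df = pd.DataFrame({"title": ["A", "b", "A", "c", "c", "c"]})
# groupby + size mirrors Spark's groupBy('title').count();
# reset_index turns the grouped result back into a two-column frame
counts = df.groupby("title").size().reset_index(name="count")
print(counts)
```

Note that pandas sorts the group keys by default, so the rows come out as A, b, c, whereas Spark's output order is not guaranteed.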