How to print categorical features in machine learning?
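Suppose I have a training dataset:
r1: cheap, expensive -> price
r2: excited -> entertainment
r3: hot, summer -> weather
r4: money -> price
r5: rain -> weather
I want to display it like this:
price -> cheap, expensive, money
entertainment -> excited
weather -> hot, summer, rain
Does anyone know how to do this? I am doing NLP research. Thank you.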
import pandas as pd

# Sample data: one list of words and one category per row
d = {'words': [['cheap', 'expensive'], ['excited'], ['hot', 'summer'], ['money'], ['rain']],
     'category': ['price', 'entertainment', 'weather', 'price', 'weather']}
# Convert the dictionary to a DataFrame
df = pd.DataFrame(d)
# Flatten each list in 'words' into a comma-separated string
df['words'] = df['words'].str.join(',')
# Group by 'category' and collect the unique 'words' strings in each group
new_df = df.groupby('category').agg({'words': 'unique'})
# Each group now holds an array of strings; join them with ','
new_df['words'] = new_df['words'].str.join(',')
# reset_index() turns the 'category' index back into a regular column.
# This is optional. If omitted, 'category' will be the index of the DataFrame.
new_df = new_df.reset_index()
new_df
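
One thing to be aware of: the answer above removes duplicates among the joined strings, not among individual words, so a word that appears in two different rows of the same category would be listed twice. If that matters for your data, a possible variation is sketched below. It assumes pandas 0.25 or later (for `DataFrame.explode`), and the names `exploded` and `result` are only illustrative.

```python
import pandas as pd

# Same sample data as above: each row holds a list of words and one category.
df = pd.DataFrame({
    'words': [['cheap', 'expensive'], ['excited'], ['hot', 'summer'], ['money'], ['rain']],
    'category': ['price', 'entertainment', 'weather', 'price', 'weather'],
})

# explode() gives each word its own row, so duplicates are detected per word
# rather than per joined string.
exploded = df.explode('words')

# drop_duplicates() keeps the first occurrence of each (word, category) pair;
# the remaining words are then joined per category. sort=False keeps the
# categories in order of first appearance.
result = (
    exploded.drop_duplicates()
    .groupby('category', sort=False)['words']
    .agg(','.join)
    .reset_index()
)
print(result)
```

For the sample data this gives the same grouping as the original answer; the difference only shows up when a word is repeated within a category.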