Transposing a pandas DataFrame in Python like SAS proc transpose
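I am using Python 3 and pandas and need to transpose a DataFrame the way SAS's proc transpose does. The code below does not work. Hopefully this short snippet makes my goal clear. I have marked the failing lines with 'Not Working'.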
byvars = ['Unique_Id','Month']
dfrm = test_data
idvars = 'Activity_Type'
prefix = 'test'
var_vars = [for i in list(dfrm) if list(dfrm) not in byvars,idvars] # ------ Not Working
dfrm_txp = dfrm[byvars].drop_duplicates()
for i in dfrm[idvars].drop_duplicates():
    dfrm_txp = pd.merge(dfrm_txp,dfrm[dfrm[idvars]==i].drop(idvars, axis = 1),
                        on = byvars,how='outer')
    dfrm_txp = dfrm_txp.rename(columns = {var_vars :prefix + var_vars +'_' + str(i)}) # ---- Not Working
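SAS's proc transpose is a multifaceted reshaping tool that can take a dataset from long to wide and from wide to long across various var and by groupings. Python's pandas has counterparts in several reshaping methods, such as stack, melt, pivot, and the simpler transpose (which only swaps rows and columns).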
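To make the four methods concrete, here is a minimal sketch on a toy frame. The data and the names `long_df`, `wide_df`, `melted`, `stacked`, and `transposed` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical toy data in long form: one row per (id, tag) pair.
long_df = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "tag": ["x", "y", "x", "y"],
    "score": [1, 2, 3, 4],
})

# pivot: long to wide, one column per distinct tag value.
wide_df = long_df.pivot(index="id", columns="tag", values="score")

# melt: wide back to long, roughly the inverse of pivot.
melted = wide_df.reset_index().melt(id_vars="id", var_name="tag", value_name="score")

# stack: move the column labels into the row index, giving a Series.
stacked = wide_df.stack()

# transpose: swap rows and columns, with no grouping involved.
transposed = wide_df.T
```

Note that pivot raises an error when an index/column pair occurs more than once, which is one reason to reach for pivot_table (which aggregates) on real data.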
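While I do not fully understand your requirements, consider pandas' pivot_table, which can reshape long to wide on indexed columns. The example below uses data on the current top 5 Stack Overflow answerers in the sas and pandas tags, specifically their top three tags. Because pivot_table creates hierarchical columns, a list comprehension with zip is run to merge the two levels: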
Data
from io import StringIO
import pandas as pd
txt = """UniqueID Month ActivityType Score Posts
Joe May sas 3151 1980
Tom May sas 792 690
DomPazz May sas 597 417
Reeza May sas 549 511
Longfish May sas 478 255
AndyHayden May pandas 8063 1281
jezrael May pandas 7976 4754
EdChum May pandas 6579 2501
unutbu May python 39827 6409
piRSquared May pandas 5024 3004
Joe May sas-macro 343 184
Tom May sas-macro 96 83
DomPazz May sas-macro 46 26
Reeza May sas-macro 54 39
Longfish May sql 62 39
AndyHayden May python 7991 1360
jezrael May python 7485 4185
EdChum May python 6439 2363
unutbu May numpy 6382 1035
piRSquared May python 4625 2782
Joe May sql 279 189
Tom May sql 91 79
DomPazz May sql 33 30
Reeza May sql 32 38
Longfish May variables 19 8
AndyHayden May dataframe 2264 191
jezrael May dataframe 2847 1601
EdChum May dataframe 1748 529
unutbu May pandas 6345 1276
piRSquared May dataframe 1696 853"""
df = pd.read_table(StringIO(txt), sep=r"\s+")  # raw string avoids an invalid-escape warning
Reshape
byvars = ['UniqueID', 'Month']
reshapedf = df.pivot_table(index=byvars, columns=['ActivityType'], aggfunc='max')
# RENAME COLUMNS WITH PREFIX AND VARIABLE/VALUE NAMES
reshapedf.columns = ['test_'+"_".join(i) for i in zip(reshapedf.columns.get_level_values(0),
reshapedf.columns.get_level_values(1))]
print(reshapedf) # PRINT TO SCREEN
reshapedf.to_csv('Reshape_Output.csv') # OUTPUT TO CSV
Output (the screenshot is split into two parts, but there are only 10 rows plus the header)
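As for the two lines marked 'Not Working' in the question, the merge loop itself can also be repaired. The sketch below is one possible fix, not necessarily what the asker intended: `test_data` is a small hypothetical stand-in for the real data, `subset` is a helper name introduced here, and the column naming follows the `test_<variable>_<value>` pattern used in the pivot_table answer above. The key changes are a valid list comprehension for `var_vars` and renaming each subset before merging, so that successive merges do not collide on column names.

```python
import pandas as pd

# Hypothetical stand-in for the asker's test_data, using the question's column names.
test_data = pd.DataFrame({
    "Unique_Id": ["Joe", "Joe", "Tom", "Tom"],
    "Month": ["May", "May", "May", "May"],
    "Activity_Type": ["sas", "sql", "sas", "sql"],
    "Score": [3151, 279, 792, 91],
    "Posts": [1980, 189, 690, 79],
})

byvars = ["Unique_Id", "Month"]
idvars = "Activity_Type"
prefix = "test"
dfrm = test_data

# Value columns: everything that is neither a BY variable nor the ID variable.
var_vars = [c for c in dfrm.columns if c not in byvars + [idvars]]

dfrm_txp = dfrm[byvars].drop_duplicates()
for i in dfrm[idvars].drop_duplicates():
    # Rows for this ID value, with the value columns renamed up front.
    subset = (dfrm[dfrm[idvars] == i]
              .drop(columns=idvars)
              .rename(columns={v: f"{prefix}_{v}_{i}" for v in var_vars}))
    dfrm_txp = dfrm_txp.merge(subset, on=byvars, how="outer")

print(dfrm_txp)
```

This performs one merge per distinct `Activity_Type` value, so pivot_table is likely the better choice when there are many distinct values.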