Python DataFrame transposing like SAS

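I am using Python 3 with pandas and need to transpose a DataFrame the way SAS's proc transpose does. The code below is what I have, and it does not work. I hope this small snippet makes my goal clear. I have marked the lines that fail with 'Not Working'.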

byvars = ['Unique_Id','Month']
dfrm = test_data
idvars = 'Activity_Type'
prefix = 'test'

var_vars = [for i in list(dfrm) if list(dfrm) not in byvars,idvars] # ------ Not Working

dfrm_txp = dfrm[byvars].drop_duplicates()

for i in dfrm[idvars].drop_duplicates():
    dfrm_txp = pd.merge(dfrm_txp,dfrm[dfrm[idvars]==i].drop(idvars, axis = 1),
                        on = byvars,how='outer')

    dfrm_txp = dfrm_txp.rename(columns = {var_vars :prefix + var_vars +'_' + str(i)}) # ---- Not Working

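SAS's proc transpose is a versatile reshaping tool that can take a dataset from long to wide or from wide to long within various by-groups. Python's pandas has counterparts in several reshaping methods, such as stack, melt, pivot, and the simpler transpose (which just swaps rows and columns).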
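As a rough illustration of how these methods differ, here is a minimal sketch on a made-up four-row frame. The column names `id`, `var`, and `value` are hypothetical and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical toy data, not from the original post
long_df = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "var": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})

# Long to wide: one row per id, one column per distinct value of "var"
wide_df = long_df.pivot(index="id", columns="var", values="value").reset_index()
wide_df.columns.name = None  # drop the leftover "var" label on the columns axis

# Wide to long: melt undoes the pivot (row order may differ from long_df)
back_to_long = wide_df.melt(id_vars="id", var_name="var", value_name="value")

# stack moves the column labels into the innermost level of the row index
stacked = wide_df.set_index("id").stack()

# transpose only swaps rows and columns; it has no by-group logic
transposed = wide_df.set_index("id").T
```

Note that pivot raises an error when an index/column pair occurs more than once, whereas pivot_table aggregates such duplicates.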

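I am not entirely sure what you need, but consider pandas' pivot_table, which can reshape long to wide on indexed columns. The example below uses data on the current top 5 Stack Overflow answerers in the sas and pandas tags, specifically their top three tags. Because pivot_table creates hierarchical columns, a list comprehension with zip is run to merge the two levels into flat names: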

Data

from io import StringIO
import pandas as pd

txt = """UniqueID   Month   ActivityType    Score   Posts
Joe May sas 3151    1980
Tom May sas 792 690
DomPazz May sas 597 417
Reeza   May sas 549 511
Longfish    May sas 478 255
AndyHayden  May pandas  8063    1281
jezrael May pandas  7976    4754
EdChum  May pandas  6579    2501
unutbu  May python  39827   6409
piRSquared  May pandas  5024    3004
Joe May sas-macro   343 184
Tom May sas-macro   96  83
DomPazz May sas-macro   46  26
Reeza   May sas-macro   54  39
Longfish    May sql 62  39
AndyHayden  May python  7991    1360
jezrael May python  7485    4185
EdChum  May python  6439    2363
unutbu  May numpy   6382    1035
piRSquared  May python  4625    2782
Joe May sql 279 189
Tom May sql 91  79
DomPazz May sql 33  30
Reeza   May sql 32  38
Longfish    May variables   19  8
AndyHayden  May dataframe   2264    191
jezrael May dataframe   2847    1601
EdChum  May dataframe   1748    529
unutbu  May pandas  6345    1276
piRSquared  May dataframe   1696    853"""

# Raw string so that "\s" is not treated as an invalid escape sequence
df = pd.read_table(StringIO(txt), sep=r"\s+")

Reshape

byvars = ['UniqueID', 'Month']
reshapedf = df.pivot_table(index=byvars, columns=['ActivityType'], aggfunc='max')

# RENAME COLUMNS WITH PREFIX AND VARIABLE/VALUE NAMES
reshapedf.columns = ['test_'+"_".join(i) for i in zip(reshapedf.columns.get_level_values(0),
                                                      reshapedf.columns.get_level_values(1))]

print(reshapedf)                           # PRINT TO SCREEN
reshapedf.to_csv('Reshape_Output.csv')     # OUTPUT TO CSV

Output (screenshot split into two parts, but only 10 rows plus header)
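If you would rather keep the merge loop from the question, the two lines marked 'Not Working' can be repaired roughly as follows. This is only a sketch: `test_data` here is a small hypothetical stand-in for your real data, and the `prefix_variable_level` naming is an assumption chosen to match the column names produced by the pivot_table approach.

```python
import pandas as pd

# Hypothetical stand-in for the question's test_data
test_data = pd.DataFrame({
    "Unique_Id": [1, 1, 2, 2],
    "Month": ["May", "May", "May", "May"],
    "Activity_Type": ["sas", "sql", "sas", "sql"],
    "Score": [10, 20, 30, 40],
    "Posts": [1, 2, 3, 4],
})

byvars = ["Unique_Id", "Month"]
idvar = "Activity_Type"
prefix = "test"
dfrm = test_data

# Every column that is neither a by-variable nor the id variable
var_vars = [c for c in dfrm.columns if c not in byvars + [idvar]]

dfrm_txp = dfrm[byvars].drop_duplicates()

for level in dfrm[idvar].drop_duplicates():
    # Take the rows for this id value and rename the value columns
    # before merging, so names never collide across iterations
    subset = (dfrm[dfrm[idvar] == level]
              .drop(columns=idvar)
              .rename(columns={v: f"{prefix}_{v}_{level}" for v in var_vars}))
    dfrm_txp = dfrm_txp.merge(subset, on=byvars, how="outer")

print(dfrm_txp)
```

The loop assumes at most one row per by-group and id value. With duplicates, the merge would multiply rows, which is one reason pivot_table with an explicit aggfunc is usually the safer choice.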