Setting Character Limit on Pandas DataFrame Column
Background:
Given the following pandas df:
| Holding Account | Model Type | Entity ID | Direct Owner ID |
|---|---|---|---|
| WF LLC \| 100 Jones Street 26th Floor San Francisco Ca Ltd Liability - Income Based Gross USA Only (486941515) | 51364633 | 4564564 | 5646546 |
| RF LLC \| Neuberger \| LLC \| Aukai Services LLC-Neuberger Smid - Income Accuring Net of Fees Worldwide Fund (456456218) | 46256325 | 1645365 | 4926654 |
Question:
What is the most pythonic way to enforce an 80-character limit on the values of the Holding Account column (dtype = object)?
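Context: I am writing the df to a .csv, which is then uploaded to a system with an 80-character limit. The values in the Holding Account column are unique, so I only want to sacrifice the characters that push a string past 80 characters.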
My attempt:
This is what I tried - df['column'] = df['column'].str[:80]
Why not just use .str, as you were already doing?
df['Holding Account'] = df['Holding Account'].str[:80]
Output:
>>> df
Holding Account Model Type Entity ID Direct Owner ID
0 WF LLC | 100 Jones Street 26th Floor San Francisco Ca Ltd Liability - Income Bas 51364633 4564564 5646546
1 RF LLC | Neuberger | LLC | Aukai Services LLC-Neuberger Smid - Income Accuring N 46256325 1645365 4926654
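For a self-contained check, the sketch below rebuilds the two sample rows and applies the same slice. The `max_len` and `still_unique` checks are additions that are not part of the answer above: since the question states that the values are unique, it may be worth confirming that truncation has not collapsed two names into one.

```python
import pandas as pd

# The two sample rows from the question.
df = pd.DataFrame({
    "Holding Account": [
        "WF LLC | 100 Jones Street 26th Floor San Francisco Ca Ltd Liability - Income Based Gross USA Only (486941515)",
        "RF LLC | Neuberger | LLC | Aukai Services LLC-Neuberger Smid - Income Accuring Net of Fees Worldwide Fund (456456218)",
    ],
    "Model Type": [51364633, 46256325],
    "Entity ID": [4564564, 1645365],
    "Direct Owner ID": [5646546, 4926654],
})

# Keep at most the first 80 characters of each string.
# Strings shorter than 80 characters are unchanged; missing values stay missing.
truncated = df["Holding Account"].str[:80]

# Length of the longest value after truncation.
max_len = truncated.str.len().max()

# True if no two account names became identical after truncation.
still_unique = truncated.is_unique

df["Holding Account"] = truncated
```

Note that `.str[:80]` counts characters, not bytes. If the target system's limit is measured in bytes, names containing non-ASCII characters would need a separate check.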
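Slicing loses some information. I suggest factorizing the column and creating a mapping table instead. This also saves storage space on the server or db.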
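# Integer code for each row, numbered in order of first appearance
s = df['Holding Account'].factorize()[0]
# Build the code -> original name mapping before the column is overwritten
d = dict(zip(s, df['Holding Account']))
# Replace the long names with their integer codes
df['Holding Account'] = s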
If you want to get the original values back, just do
df['new'] = df['Holding Account'].map(d)
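A minimal, self-contained sketch of that round trip follows. The shortened account names, the repeated third row, and the names `codes`, `uniques`, `code_to_name`, and `restored` are illustrative assumptions, not taken from the answer. It uses both return values of `factorize()` (the integer codes and the unique values) to build the lookup.

```python
import pandas as pd

# Hypothetical, shortened sample data; the third row repeats the first
# to show that equal names receive the same code.
df = pd.DataFrame({
    "Holding Account": [
        "WF LLC | 100 Jones Street",
        "RF LLC | Neuberger",
        "WF LLC | 100 Jones Street",
    ]
})
original = df["Holding Account"].copy()

# factorize() returns (codes, uniques): codes[i] is the position in
# uniques of the value found in row i.
codes, uniques = df["Holding Account"].factorize()

# Lookup table from integer code back to the full account name.
code_to_name = dict(enumerate(uniques))

# Store the short integer codes in place of the long names.
df["Holding Account"] = codes

# Recover the full names from the codes when needed.
df["restored"] = df["Holding Account"].map(code_to_name)
```

With this approach the uploaded file carries only the integer codes, so the mapping has to be kept somewhere (for example in a separate file) for the full names to be recoverable later.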