Make a df out of a df of lists in pandas (Python)
Below is my df in pandas (IPython). I want to count the objects in each list and put the resulting counts into a df with the columns ['sponsor_id', 'list_count_int']:
sponsor_id
7 [s2474-112, s1543-112, s1262-112, s3676-112, s...
11 [s130-110, s169-110, s589-110, s134-110, s3062...
66 [s918-112, s946-112, s3326-112, s2007-112, s33...
116 [s79-112, s1302-112, s3304-112, s175-112, s76-...
136 [s1619-112, s2475-112, s2507-112, s328-112, s2...
.
.
.
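The listing above appears to be a Series of lists indexed by `sponsor_id`, which is what the `groupby(...).apply(...)` line in the code produces. A minimal sketch of turning such a Series into the two-column frame described, assuming a small made-up sample in place of the real data (the lists are shortened for illustration):

```python
import pandas as pd

# Made-up stand-in for the groupby result: a Series of lists
# indexed by sponsor_id (values shortened for illustration).
lists = pd.Series(
    [["s2474-112", "s1543-112", "s1262-112"], ["s130-110", "s169-110"]],
    index=pd.Index([7, 11], name="sponsor_id"),
)

# Length of each list, then move the sponsor_id index into a column.
counts = lists.map(len).reset_index(name="list_count_int")
print(counts)
```

`reset_index(name=...)` names the value column, so the result has exactly the columns `sponsor_id` and `list_count_int`.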
Below is my code. I'm trying to use a for loop.
import glob
import pandas as pd
df = pd.concat((pd.read_csv(f, names=['date','bill_id','sponsor_id']) for f in glob.glob('/home/jayaramdas/anaconda3/df/s11?_s_b')))
df.groupby('sponsor_id').apply(lambda x: list(x['bill_id']))
#this is the code for my for loop
df_new = df['sponsor_id'].astype('list').map(lambda x: sum(y for y in ['sponsor_id']))
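Since the lists are only used for counting, the grouping step in the question's code can return the counts directly. A minimal sketch, assuming each CSV row is one `(date, bill_id, sponsor_id)` record as the `names=` argument suggests; the sample rows below are made up:

```python
import pandas as pd

# Made-up rows in the same shape as the concatenated CSV files.
df = pd.DataFrame({
    "date": ["2011-01-05", "2011-01-06", "2011-01-07", "2011-01-07"],
    "bill_id": ["s2474-112", "s1543-112", "s130-110", "s169-110"],
    "sponsor_id": [7, 7, 11, 11],
})

# size() counts the rows in each group; reset_index turns the Series into a frame.
counts = df.groupby("sponsor_id").size().reset_index(name="list_count_int")

# Alternative that keeps the lists and counts them afterwards.
lists = df.groupby("sponsor_id")["bill_id"].agg(list)
counts_from_lists = lists.str.len().reset_index(name="list_count_int")

print(counts)
```

Both variants give one row per sponsor; `size()` skips building the intermediate lists.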
I get a long error message. Here is the end of it:
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py in _astype(self, dtype, copy, raise_on_error, values, klass, mgr, **kwargs)
443
444 # astype processing
--> 445 dtype = np.dtype(dtype)
446 if self.dtype == dtype:
447 if copy:
TypeError: data type "list" not understood
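The failure can be reproduced in isolation. A minimal sketch with a made-up Series of lists: such a column already has `object` dtype, and `astype('list')` raises because `'list'` is not a dtype name that pandas or NumPy recognizes (the traceback shows the string being handed to `np.dtype`). The lists can be used as they are, with no conversion step:

```python
import pandas as pd

s = pd.Series([["s2474-112", "s1543-112"], ["s130-110"]])

# Python lists are stored as generic objects.
print(s.dtype)  # object

raised = False
try:
    s.astype("list")
except TypeError as exc:
    # Same failure as in the traceback: "list" is not a dtype name.
    raised = True
    print("TypeError:", exc)
```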
I think you have some int values in the sponsor_id column. So you can apply len only to values of type list and set the other values (int) to 1:
print(df)
sponsor_id
0 [s2474-112, s1543-112, s1262-112, s3676-112]
1 [s130-110, s169-110]
2 102
df['count'] = df['sponsor_id'].apply(lambda x: len(x) if isinstance(x, list) else 1)
print(df)
sponsor_id count
0 [s2474-112, s1543-112, s1262-112, s3676-112] 4
1 [s130-110, s169-110] 2
2 102 1
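A vectorized variant of the same idea, offered as an alternative rather than as the answer's method: `Series.str.len()` returns the length of list elements and `NaN` for values that have no length, such as the `int`, so filling the gaps with 1 reproduces the `count` column. The sample data mirrors the three rows printed above:

```python
import pandas as pd

df = pd.DataFrame({
    "sponsor_id": [
        ["s2474-112", "s1543-112", "s1262-112", "s3676-112"],
        ["s130-110", "s169-110"],
        102,
    ]
})

# .str.len() gives list lengths and NaN for the int; treat NaN as 1.
df["count"] = df["sponsor_id"].str.len().fillna(1).astype(int)
print(df)
```

One design difference: `.str.len()` also measures strings, so a string value in this column would be counted by its characters rather than as 1. The `isinstance(x, list)` check in the answer is stricter.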