How do I reshape data that has alternative IDs?
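I'm struggling to pivot and reshape some data. My data looks like this:
nickname:Nick Gavin
nickname:Nick job:Teacher
nickname:Nick duties:teaching_math
nickname:Bob Marcus
nickname:Bob job:Musician
nickname:Bob duties:plays_piano
I would like to change it to:
Nick Teacher teaching_math
Gavin Teacher teaching_math
Bob Musician plays_piano
Marcus Musician plays_piano
Any help is much appreciated!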
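import pandas as pd

#assumption: the data sits in a space-separated file with no header row
df = pd.read_csv('your_data.txt', sep=' ', header=None)
#get the nicknames, remove the 'nickname:' prefix
df[0] = df[0].str.split(':').str[-1]
#rows whose second field has no ':' carry the real name
is_name = ~df[1].str.contains(':')
#extract words after the ':' (lstrip removes a character set, not a prefix,
#and would turn 'duties:teaching_math' into 'aching_math')
df[1] = df[1].str.replace(r'^(?:job|duties):', '', regex=True).str.strip()
#temp column: the nickname on name rows, the job/duties value elsewhere,
#so each name has job and duties beneath
df['temp'] = df[0].where(is_name, df[1])
#group by col 0
#combine words
#stack
#split into separate columns
#and drop the old index
final = (df
         .groupby(0)
         .agg(lambda x: x.str.cat(sep=','))
         .stack()
         .str.split(',', expand=True)
         .reset_index(drop=True))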
final
0 1 2
0 Marcus Musician plays_piano
1 Bob Musician plays_piano
2 Gavin Teacher teaching_math
3 Nick Teacher teaching_math
Try the code below.
dicts = {}
with open('your_data.txt') as f:
    for line in f:
        # each line looks like 'nickname:Nick job:Teacher'
        key, value = line.split(' ', 1)
        nickname = key.split(':')[1]
        # drop the 'job:' / 'duties:' label; the name line has none
        value = value.replace('job:', '').replace('duties:', '').strip()
        # collect every value under its nickname
        dicts.setdefault(nickname, []).append(value)
for k, v in dicts.items():
    print(k, v)
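The loop above stops at one list per nickname. A minimal sketch of how that grouping could be turned into the rows the question asks for, one for the nickname and one for the real name, might look like this. The function name `reshape` and the `records` dictionary are hypothetical, and the sample lines are copied from the question rather than read from a file.

```python
# Sample lines copied from the question; in practice they would be read from the file.
lines = [
    "nickname:Nick Gavin",
    "nickname:Nick job:Teacher",
    "nickname:Nick duties:teaching_math",
    "nickname:Bob Marcus",
    "nickname:Bob job:Musician",
    "nickname:Bob duties:plays_piano",
]


def reshape(lines):
    """Return one (name, job, duties) tuple per nickname and per real name."""
    records = {}  # hypothetical helper: nickname -> {'name': ..., 'job': ..., 'duties': ...}
    for line in lines:
        key, value = line.strip().split(" ", 1)
        nickname = key.split(":", 1)[1]
        rec = records.setdefault(nickname, {})
        if ":" in value:
            # labelled field such as 'job:Teacher'
            field, content = value.split(":", 1)
            rec[field] = content.strip()
        else:
            # unlabelled value is the real name
            rec["name"] = value.strip()

    rows = []
    for nickname, rec in records.items():
        # emit the nickname first, then the real name if one was given
        for person in (nickname, rec.get("name")):
            if person is not None:
                rows.append((person, rec.get("job"), rec.get("duties")))
    return rows


for row in reshape(lines):
    print(*row)
```

Keeping the labelled fields in a dictionary instead of a positional list means the result does not depend on the job line appearing before the duties line.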