Python: create a string variable based on another string variable (contains)
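I have a messy string variable containing stage information, and I would like to create a cleaner string variable with fewer groups. The current data frame looks like this: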
cohort = pd.DataFrame({'stage_group': ['XXX Stage I', 'Stage II XXX', 'Stage III XXX', 'XX Stage IV XXX', 'NA']},index=[1, 2, 3, 4, 5])
My ideal variable has 3 levels: Stage I-III, Stage IV, and Unknown:
cohort2 = pd.DataFrame({'stage_group': ['XXX Stage I', 'Stage II XXX', 'Stage III XXX', 'XX Stage IV XXX','NA'],'stage': ['Stage I-III', 'Stage I-III', 'Stage I-III', 'Stage IV', 'Unknown']},index=[1, 2, 3, 4, 5])
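I tried the following code, but it does not assign the groups correctly (I only get Stage I-III and Unknown, never Stage IV). Any suggestions would be helpful.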
import numpy as np  # pd.np was removed in pandas 2.0, so NumPy is imported directly
searchfor = ['Stage I', 'Stage II', 'Stage III']
cohort['stage'] = np.where(cohort.stage_group.str.contains('|'.join(searchfor)), "Stage I-III",
                  np.where(cohort.stage_group.str.contains('Stage IV'), "Stage IV", "Unknown"))
The code works for me if I change the order. The string Stage IV also contains Stage I, so Stage IV has to be checked before Stage I.
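To make the overlap concrete, here is a small sketch (the Series name `s` is only for illustration) comparing a plain substring test with a word-boundary regex. It assumes the stage numerals are always followed by a space or the end of the string, as in the sample data.

```python
import pandas as pd

s = pd.Series(['XXX Stage I', 'Stage II XXX', 'Stage III XXX', 'XX Stage IV XXX', 'NA'])

# Plain substring test: 'Stage I' is a prefix of 'Stage II', 'Stage III' and 'Stage IV'
plain = s.str.contains('Stage I')

# \b requires a word boundary after the numeral, so only an exact 'Stage I' matches
bounded = s.str.contains(r'Stage I\b')

print(plain.tolist())    # [True, True, True, True, False]
print(bounded.tolist())  # [True, False, False, False, False]
```

The first result shows why the original order sends every Stage IV row into the Stage I-III group.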
import numpy as np
import pandas as pd

data = {'stage_group': '''XXX Stage I
Stage II XXX
Stage III XXX
XX Stage IV XXX
NA'''.split('\n')
}
cohort = pd.DataFrame(data)
print(cohort)

searchfor = ['Stage I', 'Stage II', 'Stage III']
# Test for 'Stage IV' first, because the substring 'Stage I' also matches 'Stage IV'
cohort['stage'] = np.where(cohort.stage_group.str.contains('Stage IV'), "Stage IV",
                  np.where(cohort.stage_group.str.contains('|'.join(searchfor)), "Stage I-III", "Unknown"))
print(cohort)

Result

       stage_group
0      XXX Stage I
1     Stage II XXX
2    Stage III XXX
3  XX Stage IV XXX
4               NA
       stage_group        stage
0      XXX Stage I  Stage I-III
1     Stage II XXX  Stage I-III
2    Stage III XXX  Stage I-III
3  XX Stage IV XXX     Stage IV
4               NA      Unknown
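As an alternative that does not depend on the order of the checks, the following is a minimal sketch using `np.select` with word-boundary patterns. It is not the approach from the answer above, just one possible variant; the names `early` and `late` are illustrative, and it assumes each numeral is delimited by whitespace or the end of the string.

```python
import numpy as np
import pandas as pd

cohort = pd.DataFrame({'stage_group': ['XXX Stage I', 'Stage II XXX', 'Stage III XXX',
                                       'XX Stage IV XXX', 'NA']})

# (?:...) is a non-capturing group; \b stops 'I' from matching inside 'IV'
early = cohort['stage_group'].str.contains(r'\bStage (?:I|II|III)\b', na=False)
late = cohort['stage_group'].str.contains(r'\bStage IV\b', na=False)

# Conditions are mutually exclusive here, so their order no longer matters
cohort['stage'] = np.select([early, late], ['Stage I-III', 'Stage IV'], default='Unknown')
print(cohort)
```

Passing `na=False` treats missing values as non-matches, so real NaN entries would also end up in the Unknown group.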