How to split a column in a DataFrame into a list of tuples
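I found some answers online, but I have no experience with regular expressions, which I think is what is needed here. If there is another way, that would be even better.
My DataFrame has a complex column that needs to be split on ',' ';' '(' ')' ':'
Example string: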
(36%) (litopenaaus varmrn ), une chapelure (25%) [vmaaî fmur, water,) sel, soja 0i), sucre, levure), eau. î farine de whca, amidon de mais, sart, cre. regulators (450, 500, stg). soybean [containing an antioxidant (300)]. sucre, powder of gariic, levure, th ci nœ (412). contient des crevettes"
It should be split into a list containing the following:
["36%", "litopenaaus varmrn", "une chapelure (25%)", ["vmaaî fmur", "water", "sel", "soja 0i", "sucre", "levure"], "eau. î farine de whca", "amidon de mais", "sart", "cre. regulators ["(450, 500, stg)"]. soybean [containing an antioxidant (300)]. sucre", "powder of gariic", "levure"," th ci nœ (412). contient des crevettes"]
The code I wrote for this looks like this, but nothing happens:
delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))
df['splited'] = df.ingredient.apply(lambda row: ' '.join((re.split(regexPattern, str(row)))))
With
delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))
df['splited'] = df.ingredient.apply(lambda row: ' '.join((re.split(regexPattern, str(row)))))
you actually split the string (re.split) and then joined the resulting parts back together with a space character (' '.join). If you need a list of parts rather than a single new string, do not join them, i.e.
df['splited'] = df.ingredient.apply(lambda row: re.split(regexPattern, str(row)))
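To see what the un-joined split returns, here is a minimal standalone sketch that runs without pandas. The helper name split_ingredients and the shortened sample string are assumptions made for illustration; the delimiters are the ones from the question. Stripping whitespace and dropping empty pieces is an extra step that the question did not ask for, added because re.split leaves empty strings wherever two delimiters are adjacent.

```python
import re

# Delimiters taken from the question.
DELIMITERS = (",", ":", "(", ")", ";")
PATTERN = re.compile("|".join(map(re.escape, DELIMITERS)))


def split_ingredients(text):
    """Split text on any delimiter, strip whitespace, and drop empty pieces.

    Hypothetical helper for illustration; not part of pandas or re.
    """
    parts = PATTERN.split(str(text))
    return [p.strip() for p in parts if p.strip()]


# Shortened version of the example string from the question.
sample = "(36%) (litopenaaus varmrn ), une chapelure (25%)"
print(split_ingredients(sample))
# ['36%', 'litopenaaus varmrn', 'une chapelure', '25%']
```

On the DataFrame this could be applied as df['splited'] = df.ingredient.apply(split_ingredients). Note that a flat split like this cannot keep a parenthesised group such as "(25%)" attached to the text before it, nor build the nested lists shown in the expected output; that would need a parser that tracks brackets rather than a single regular expression.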