Extract to rows in python/pandas

Question

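I have data merged into a single cell of a .csv, separated by semicolons, and I am trying to put each value into its own cell, inserted one below the other. This is very similar to Excel's "Text to Columns", but I need the values to go into rows, aligned underneath each other.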

Current data:

End goal:

Data being used: [image]

Answer 1

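You can use .str.split(';') to expand the columns. The tricky part is handling the columns that do not need splitting. In my opinion, the simpler approach is to convert them into the same semicolon-separated string style as the other columns, then iterate over all the columns and expand everything. Here is a piece of code with a reduced version of the dataset. Hope it helps.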

import pandas as pd

# reduced version of the dataset: 'salary' and 'gender' hold several values per cell
df = pd.DataFrame({'User ID':[1,2],'salary':['65000;120000;70000','65000;120000;70000'],'gender':['male;male;female', 'male;male;female']})

print(df)

Output:

   User ID              salary            gender
0        1  65000;120000;70000  male;male;female
1        2  65000;120000;70000  male;male;female

Then expand the columns:

cols_to_split = ['salary', 'gender']
cols_no_split = ['User ID']
expanded_rows = {}

# number of values packed into each cell
# (assumes every cell in the split columns holds the same number of values)
n_splits = len(df[cols_to_split[0]].iloc[0].split(';'))

# repeat single-valued columns so they follow the same 'a;b;c' format as the others
for col in cols_no_split:
    df[col] = df[col].apply(lambda x: ';'.join([str(x)] * n_splits))

# split every column on ';' and stack the pieces into one long column
for col in cols_no_split + cols_to_split:
    expanded_rows[col] = df[col].str.split(';', expand=True).stack().to_numpy()

# note: every value in the result is a string, including 'User ID' and 'salary'
df_final = pd.DataFrame(expanded_rows)

print(df_final)

Output:

  User ID  salary  gender
0       1   65000    male
1       1  120000    male
2       1   70000  female
3       2   65000    male
4       2  120000    male
5       2   70000  female
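
As a possible alternative (not part of the answer above), the same reshaping can probably be done without padding the 'User ID' column, by splitting the cells into lists and calling DataFrame.explode. This is only a sketch under two assumptions: pandas 1.3 or newer (needed to explode several columns at once), and every split column holding the same number of values in a given row. The name df_long is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'User ID': [1, 2],
    'salary': ['65000;120000;70000', '65000;120000;70000'],
    'gender': ['male;male;female', 'male;male;female'],
})

cols_to_split = ['salary', 'gender']

# turn each 'a;b;c' string into a list ['a', 'b', 'c']
split_df = df.assign(**{col: df[col].str.split(';') for col in cols_to_split})

# explode the list columns together; 'User ID' is repeated for each new row
# (multi-column explode assumes pandas >= 1.3 and equal list lengths per row)
df_long = split_df.explode(cols_to_split, ignore_index=True)

# the split pieces are strings, so convert salary back to integers
df_long['salary'] = df_long['salary'].astype(int)

print(df_long)
```

With this approach 'User ID' keeps its original integer type, since it is never converted to a string.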
