Pandas: Any ideas on how to load data in pandas dataframe
Hi everyone, I'm new to pandas.
I have multiple CSV files like this:
john_age.csv
john_gender.csv
john_weight.csv
mike_age.csv
mike_gender.csv
mike_weight.csv
smith_age.csv
smith_gender.csv
smith_weight.csv
...
...
Each CSV file contains a single string or number, like this:
john_age.csv 54
john_gender.csv male
john_weight.csv 65.4
Basically, I want the whole data frame to look like this:
        age  gender  weight
john     54    male    65.4
mike     23    male    86.5
smith    52  female      54
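How can I do this?
I think the key idea is to work each CSV file name into the data frame, but so far I can only read multiple CSV files using glob.glob and append, and append is not the solution: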
import glob
import pandas as pd

csv_path = r'\mypath'
filenames = glob.glob(csv_path + r'\*.csv')
dfs = []
for file in filenames:
    dfs.append(pd.read_csv(file))
Thanks a lot!!
Here's what I mean:
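import glob
import os
import pandas as pd

# csv_path is the folder from the question
with open('combined.csv', 'w') as combine:
    for fn in glob.glob(os.path.join(csv_path, '*_age.csv')):
        # 'john_age.csv' -> 'john'
        name = os.path.basename(fn).split('_')[0]
        fields = [name]
        for part in ('age', 'gender', 'weight'):
            with open(os.path.join(csv_path, f'{name}_{part}.csv')) as f:
                fields.append(f.read().strip())
        print(','.join(fields), file=combine)
# combined.csv has no header row, so name the columns explicitly
dfs = pd.read_csv('combined.csv', header=None, names=['name', 'age', 'gender', 'weight'], index_col='name')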
This will create the data frame from the files.
import glob
import os
import pandas as pd

csv_path = 'csvs'
filenames = glob.glob(os.path.join(csv_path, '*_age.csv'))
people = []
attrs = ['age', 'gender', 'weight']
for file in filenames:
    # 'csvs/john_age.csv' -> 'john'
    name = os.path.basename(file).split('_')[0]
    print(name)
    person = {'name': name}
    for attr in attrs:
        with open(os.path.join(csv_path, f'{name}_{attr}.csv'), 'r') as data_file:
            # Each file holds a single value; strip the trailing newline
            person[attr] = data_file.readline().strip()
    people.append(person)
df = pd.DataFrame(people)
print(df)
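For what it's worth, a frame built from a list of dicts like this has name as an ordinary column, and every value read from the files is a string. Here is a minimal sketch of getting from there to the layout in the question. The `people` list is hard-coded with the sample values from the question purely for illustration; in practice it would come from the loop over the files.

```python
import pandas as pd

# Stand-in for the list built by reading the files (values are strings)
people = [
    {"name": "john", "age": "54", "gender": "male", "weight": "65.4"},
    {"name": "mike", "age": "23", "gender": "male", "weight": "86.5"},
    {"name": "smith", "age": "52", "gender": "female", "weight": "54"},
]

# Use the person's name as the row label instead of a regular column
df = pd.DataFrame(people).set_index("name")

# Convert the numeric columns; gender stays as text
df["age"] = pd.to_numeric(df["age"])
df["weight"] = pd.to_numeric(df["weight"])

print(df)
print(df.dtypes)
```

Converting explicitly per column keeps it obvious which attributes are meant to be numbers, which seems safer than guessing when the files only hold one value each.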
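You can read everything with pd.concat() and then reshape it. Note that header belongs to read_csv, not to pd.concat:
from glob import glob
import pandas as pd

files = sorted(glob("path/to/files/*.csv"))
# Each file has one value and no header row; stack them into a single column
values = pd.concat((pd.read_csv(file, header=None, dtype=str) for file in files), ignore_index=True)
# Sorted files come in groups of three per person (age, gender, weight),
# assuming every person has all three files
data = pd.DataFrame(values[0].to_numpy().reshape(-1, 3), columns=["age", "gender", "weight"])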
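If you want the names as the index without relying on file order, one option is to parse each file name into a (name, attribute) pair, collect the values in long form, and pivot. This is only a sketch under some assumptions: the files are named `<name>_<attribute>.csv`, each holds a single value, and `load_people` is a helper name I made up for the example.

```python
from pathlib import Path

import pandas as pd


def load_people(csv_dir):
    """Build one row per person from files named <name>_<attribute>.csv."""
    records = []
    for path in sorted(Path(csv_dir).glob("*_*.csv")):
        # "john_age" -> ("john", "age"); rsplit splits on the last underscore
        name, attr = path.stem.rsplit("_", 1)
        value = path.read_text().strip()
        records.append({"name": name, "attr": attr, "value": value})

    if not records:
        return pd.DataFrame()

    long_df = pd.DataFrame(records)
    # One row per name, one column per attribute
    wide = long_df.pivot(index="name", columns="attr", values="value")
    wide.columns.name = None

    # Convert columns that are entirely numeric; leave the rest as text
    for col in wide.columns:
        try:
            wide[col] = pd.to_numeric(wide[col])
        except (ValueError, TypeError):
            pass
    return wide
```

Calling `load_people("path/to/files")` should then give a frame indexed by name with one column per attribute. A person with a missing file ends up with NaN in that column rather than shifting the other values.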