Can I loop the same analysis across multiple csv dataframes then concatenate results from each into one table?

Beginner Python learner here! I have 20 participant CSV files (P01.csv to P20.csv) containing dataframes with Stroop test data. The important columns in each are the condition column, which contains a random mix of incongruent and congruent trials, a reaction time column for each trial, and a column indicating whether the response was correct (True or False). Here is an example of the dataframe for P01 (I'm not sure whether this counts as a code snippet?):

trialnum,colourtext,colourname,condition,response,rt,correct
1,blue,red,incongruent,red,0.767041,True
2,yellow,yellow,congruent,yellow,0.647259,True
3,green,blue,incongruent,blue,0.990185,True
4,green,green,congruent,green,0.720116,True
5,yellow,yellow,congruent,yellow,0.562909,True
6,yellow,yellow,congruent,yellow,0.538918,True
7,green,yellow,incongruent,yellow,0.693017,True
8,yellow,red,incongruent,red,0.679368,True
9,yellow,blue,incongruent,blue,0.951432,True
10,blue,blue,congruent,blue,0.633367,True
11,blue,green,incongruent,green,1.289047,True
12,green,green,congruent,green,0.668142,True
13,blue,red,incongruent,red,0.647722,True
14,red,blue,incongruent,blue,0.858307,True
15,red,red,congruent,red,1.820112,True
16,blue,green,incongruent,green,1.118404,True
17,red,red,congruent,red,0.798532,True
18,red,red,congruent,red,0.470939,True
19,red,blue,incongruent,blue,1.142712,True
20,red,red,congruent,red,0.656328,True
21,red,yellow,incongruent,yellow,0.978830,True
22,green,red,incongruent,red,1.316182,True
23,yellow,yellow,congruent,green,0.964292,False
24,green,green,congruent,green,0.683949,True
25,yellow,green,incongruent,green,0.583939,True
26,green,blue,incongruent,blue,1.474140,True
27,green,blue,incongruent,blue,0.569109,True
28,green,green,congruent,blue,1.196470,False
29,red,red,congruent,red,4.027546,True
30,blue,blue,congruent,blue,0.833177,True
31,red,red,congruent,red,1.019672,True
32,green,blue,incongruent,blue,0.879507,True
33,red,red,congruent,red,0.579254,True
34,red,blue,incongruent,blue,1.070518,True
35,blue,yellow,incongruent,yellow,0.723852,True
36,yellow,green,incongruent,green,0.978838,True
37,blue,blue,congruent,blue,1.038232,True
38,yellow,green,incongruent,yellow,1.366425,False
39,green,red,incongruent,red,1.066038,True
40,blue,red,incongruent,red,0.693698,True
41,red,blue,incongruent,blue,1.751062,True
42,blue,blue,congruent,blue,0.449651,True
43,green,red,incongruent,red,1.082267,True
44,blue,blue,congruent,blue,0.551023,True
45,red,blue,incongruent,blue,1.012258,True
46,yellow,green,incongruent,yellow,0.801443,False
47,blue,blue,congruent,blue,0.664119,True
48,red,green,incongruent,yellow,0.716189,False
49,green,green,congruent,yellow,0.630552,False
50,green,yellow,incongruent,yellow,0.721917,True
51,red,red,congruent,red,1.153943,True
52,blue,red,incongruent,red,0.571019,True
53,yellow,yellow,congruent,yellow,0.651611,True
54,blue,blue,congruent,blue,1.321344,True
55,green,green,congruent,green,1.159240,True
56,blue,blue,congruent,blue,0.861646,True
57,yellow,red,incongruent,red,0.793069,True
58,yellow,yellow,congruent,yellow,0.673190,True
59,yellow,red,incongruent,red,1.049320,True
60,red,yellow,incongruent,yellow,0.773447,True
61,red,yellow,incongruent,yellow,0.693554,True
62,red,red,congruent,red,0.933901,True
63,blue,blue,congruent,blue,0.726794,True
64,green,green,congruent,green,1.046116,True
65,blue,blue,congruent,blue,0.713565,True
66,blue,blue,congruent,blue,0.494177,True
67,green,green,congruent,green,0.626399,True
68,blue,blue,congruent,blue,0.711896,True
69,blue,blue,congruent,blue,0.460420,True
70,green,green,congruent,yellow,1.711978,False
71,blue,blue,congruent,blue,0.634218,True
72,yellow,blue,incongruent,yellow,0.632482,False
73,yellow,yellow,congruent,yellow,0.653813,True
74,green,green,congruent,green,0.808987,True
75,blue,blue,congruent,blue,0.647117,True
76,green,red,incongruent,red,1.791693,True
77,red,yellow,incongruent,yellow,1.482570,True
78,red,red,congruent,red,0.693132,True
79,red,yellow,incongruent,yellow,0.815830,True
80,green,green,congruent,green,0.614441,True
81,yellow,red,incongruent,red,1.080385,True
82,red,green,incongruent,green,1.198548,True
83,blue,green,incongruent,green,0.845769,True
84,yellow,blue,incongruent,blue,1.007089,True
85,green,blue,incongruent,blue,0.488701,True
86,green,green,congruent,yellow,1.858272,False
87,yellow,yellow,congruent,yellow,0.893149,True
88,yellow,yellow,congruent,yellow,0.569597,True
89,yellow,yellow,congruent,yellow,0.483542,True
90,yellow,red,incongruent,red,1.669842,True
91,blue,green,incongruent,green,1.158416,True
92,blue,red,incongruent,red,1.853055,True
93,green,yellow,incongruent,yellow,1.023785,True
94,yellow,blue,incongruent,blue,0.955395,True
95,yellow,yellow,congruent,yellow,1.303260,True
96,blue,yellow,incongruent,yellow,0.737741,True
97,yellow,green,incongruent,green,0.730972,True
98,green,red,incongruent,red,1.564596,True
99,yellow,yellow,congruent,yellow,0.978911,True
100,blue,yellow,incongruent,yellow,0.508151,True
101,red,green,incongruent,green,1.821969,True
102,red,red,congruent,red,0.818726,True
103,yellow,yellow,congruent,yellow,1.268222,True
104,yellow,yellow,congruent,yellow,0.585495,True
105,green,green,congruent,green,0.673404,True
106,blue,yellow,incongruent,yellow,1.407036,True
107,red,red,congruent,red,0.701050,True
108,red,green,incongruent,red,0.402334,False
109,red,green,incongruent,green,1.537681,True
110,green,yellow,incongruent,yellow,0.675118,True
111,green,green,congruent,green,1.004550,True
112,yellow,blue,incongruent,blue,0.627439,True
113,yellow,yellow,congruent,yellow,1.150248,True
114,blue,yellow,incongruent,yellow,0.774452,True
115,red,red,congruent,red,0.860966,True
116,red,red,congruent,red,0.499595,True
117,green,green,congruent,green,1.059725,True
118,red,red,congruent,red,0.593180,True
119,green,yellow,incongruent,yellow,0.855915,True
120,blue,green,incongruent,green,1.335018,True

But I am only interested in the 'condition', 'rt' and 'correct' columns.
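
For reference, a minimal sketch of loading just those three columns from a single file (the path data/P01.csv is an assumption about where the files live):

import pandas as pd

# Read only the columns needed for the analysis
p01 = pd.read_csv("data/P01.csv", usecols=["condition", "rt", "correct"])
print(p01.head())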

I need to create a table showing the mean reaction time for the congruent and incongruent conditions, and the percentage correct for each condition. But I want an overall table of these results for every participant. My goal is an output table like this:

Participant Stimulus Type Mean Reaction Time Percentage Correct
01 Congruent 0.560966 80
01 Incongruent 0.890556 64
02 Congruent 0.460576 89
02 Incongruent 0.956556 55

and so on for all 20 participants. This is just an example of my ideal output, because later I want to plot the mean for each condition across participants. But if anyone thinks a table doesn't make sense or is inefficient, I'm open to any suggestions!

I want to use pandas, but I don't know where to start finding the rt mean for each condition when there are two different conditions in the same column of every dataframe. I assume I need to do this in some kind of loop that can run through each participant's CSV file, and then concatenate the results from all participants into one table?
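
For a single participant's dataframe, the per-condition statistics can be computed with a groupby; a minimal sketch, assuming the P01 data has been read into a dataframe called p01 as above:

import pandas as pd

p01 = pd.read_csv("data/P01.csv", usecols=["condition", "rt", "correct"])

# Mean reaction time per condition (congruent vs incongruent)
mean_rt = p01.groupby("condition")["rt"].mean()

# Percentage correct per condition: the boolean 'correct' column
# averages to a proportion, so multiply by 100
pct_correct = p01.groupby("condition")["correct"].mean() * 100

print(mean_rt)
print(pct_correct)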

Initially, after struggling to work out the loop I needed and looking online, I ran this code to concatenate all of the participants' dataframes, hoping it would help me run the same analysis on all participants at once. The problem is that it doesn't identify the individual participant for each row of each participant's CSV file (each participant has 120 rows, like the example I gave above) once I've put them into one table:

import os
import glob
import pandas as pd
#set working directory
os.chdir('data')

#find all csv files in the folder
#use glob pattern matching -> extension = 'csv'
#save result in list -> all_filenames
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
#print(all_filenames)

#combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
#export to csv
combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig')

Maybe I could do something to add a participant column that identifies each participant's dataset in the concatenated table, and then run the mean and percentage-correct analysis for both conditions, per participant, on that big concatenated table (a sketch of this idea is shown below)? Or would it be better to do the analysis first and then loop through all the individual participant CSV files?
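
One way the first idea could work, building on the glob-based code above; the 'participant' column name is illustrative, and the ID is taken from the filename (e.g. "01" from "P01.csv"):

import glob
import os
import pandas as pd

frames = []
for f in glob.glob("data/*.csv"):
    df = pd.read_csv(f)
    # Take the participant ID from the filename, e.g. "01" from "P01.csv"
    df["participant"] = os.path.basename(f)[1:3]
    frames.append(df)

# Now each row in the combined table carries its participant ID
combined = pd.concat(frames, ignore_index=True)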

Sorry if this is a really obvious process; I'm new to Python and trying to learn to analyse my data more efficiently, and I've been searching the Internet and pandas tutorials but I'm stuck. Any help is welcome! I've also never used Stack Overflow before, so apologies if I haven't formatted this correctly, but thanks for the guidance about providing example input data, the code I've tried, and the desired output data; I really appreciate the help.

Try this:

import pandas as pd
from pathlib import Path

# Use the Path class to represent a path. It offers more
# functionality when performing operations on paths
path = Path("./data").resolve()

# Create a dictionary whose keys are the participant IDs
# (the `01` in `P01.csv`, etc.), and whose values are
# the data frames initialized from the CSV files
data = {
    p.stem[1:]: pd.read_csv(p) for p in path.glob("*.csv")
}

# Create a master data frame by combining the individual
# data frames from each CSV file
df = pd.concat(data, keys=data.keys(), names=["participant", None])

# Calculate the statistics
result = (
    df.groupby(["participant", "condition"]).agg(**{
        "Mean Reaction Time": ("rt", "mean"),
        "correct": ("correct", "sum"),
        "size": ("trialnum", "size")
    }).assign(**{
        "Percentage Correct": lambda x: x["correct"] / x["size"] * 100
    }).drop(columns=["correct", "size"])
    .reset_index()
)
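
From here, result can be exported or reshaped for plotting; for example, a sketch assuming matplotlib is installed:

# Save the summary table
result.to_csv("stroop_summary.csv", index=False)

# Quick plot of mean reaction time per condition across participants
import matplotlib.pyplot as plt

pivot = result.pivot(index="participant", columns="condition",
                     values="Mean Reaction Time")
pivot.plot(kind="bar")
plt.ylabel("Mean Reaction Time (s)")
plt.tight_layout()
plt.show()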