How to separate the images in a folder based on the image number column of the data frame?

Question

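I have a data frame of shape (868, 3483), where 868 is the total number of images I have and 3481 is the number of pixels in each image; the remaining two columns are img and cluster. Each row represents one image, and the image number is in the img column. I applied unsupervised learning and stored each image's cluster assignment in the cluster column.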

img cluster 0 1 2 3 4 5 6 7
0   3   1.0 1.0 1.0 1.0 1.0 1.0 1.0
1   2   1.0 1.0 1.0 1.0 1.0 1.0 1.0
2   3   1.0 1.0 1.0 1.0 1.0 1.0 1.0
3   1   1.0 1.0 1.0 1.0 1.0 1.0 1.0
5   3   1.0 1.0 1.0 1.0 1.0 1.0 1.0
6   3   1.0 1.0 1.0 1.0 1.0 1.0 1.0
8   3   1.0 1.0 1.0 1.0 1.0 1.0 1.0
9   3   1.0 1.0 1.0 1.0 1.0 1.0 1.0
10  2   1.0 1.0 1.0 1.0 1.0 1.0 1.0
11  2   1.0 1.0 1.0 1.0 1.0 1.0 1.0
13  3   1.0 1.0 1.0 1.0 1.0 1.0 1.0
15  1   1.0 1.0 1.0 1.0 1.0 1.0 1.0

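I have a folder in which the images are labeled the same way as the img column. Now I want to separate these images according to the cluster they belong to.

For example, images "0,2,5,6,8,9,13" belong to cluster 3, so I want to move them into a subfolder named 'cluster3', and likewise for cluster1 and cluster2.

Is there a simple way to do this?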

Answer 1

You can move the files with Python's os module (or shutil, as Dennis commented; both work). As I understand it, we only care about the img and cluster columns.

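dictionary = df.set_index("img")["cluster"].to_dict() returns a dictionary in which each key is an image number and each value is that image's cluster, which is also the folder it should go into. I'm not sure how many clusters exist, but we can also use os calls to create however many folders and subfolders are needed, as follows: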

import os

# This is where you decide to save the images
fp = "path/to/save/images/clusters/"

# Make an individual folder for each cluster. os.makedirs also creates
# fp itself if it does not exist yet, and does not fail on reruns.
allClusters = list(set(df["cluster"]))
for x in allClusters:
    os.makedirs(fp + "cluster" + str(x), exist_ok=True)
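To make the mapping concrete, here is a small sketch that uses the first five rows of the table from the question, with the pixel columns left out. The tiny data frame is only a stand-in for the real one, and the name all_clusters is mine.

```python
import pandas as pd

# First rows of the question's table; pixel columns omitted for brevity
df = pd.DataFrame({"img": [0, 1, 2, 3, 5], "cluster": [3, 2, 3, 1, 3]})

# Map each image number to its cluster number
dictionary = df.set_index("img")["cluster"].to_dict()
print(dictionary)

# Unique cluster numbers, i.e. the folders that need to exist
all_clusters = sorted(set(dictionary.values()))
print(all_clusters)
```

Each image number ends up as a key and its cluster as the value, so looking up dictionary[5] tells you which folder image 5 belongs in.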

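Then you can go on to move each file into its corresponding folder. I'm not sure what the files are actually called; I assumed names like img1.png, img2.png, and so on, while the code below builds each name as the img value plus ".png", so adjust that line to match your files. To save yourself trouble, I suggest making the img column hold the real file names (or using some other column that does, and setting the index to that column in the line below).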

# This is where the dictionary is created. The key of each entry is the
# image number (used to build the original file name), and the value is
# the cluster, i.e. the folder the image will be moved to (see above,
# where we create each folder).
dictionary = df.set_index("img")["cluster"].to_dict()
for x in dictionary:
    # This is how the file is accessed: the dictionary key gives the
    # file's base name, and "path/to/images/" is the folder holding all
    # of the images
    name = str(x) + ".png"
    filename = "path/to/images/" + name

    # Move (rename) the original image to its new path in the cluster folder
    os.rename(filename, fp + "cluster" + str(dictionary[x]) + "/" + name)

That should do the job. Let me know if there are any errors.
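For the shutil route mentioned above, the same idea can be sketched as one function. This is only a sketch under assumptions: the function name move_images_by_cluster, its parameters, and the ".png" extension are hypothetical, and it assumes every file is named after its img value. Unlike os.rename, shutil.move also works when the source and destination are on different filesystems. Files listed in the data frame but not found on disk are skipped and reported instead of raising an error.

```python
import shutil
from pathlib import Path

import pandas as pd


def move_images_by_cluster(df, src_dir, dst_dir, ext=".png"):
    """Move <img><ext> from src_dir into dst_dir/cluster<k>/.

    Returns the names of files that were listed in df but not found.
    Assumes (hypothetically) that files are named after the img column.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    missing = []
    for img, cluster in zip(df["img"], df["cluster"]):
        source = src_dir / f"{img}{ext}"
        if not source.is_file():
            # Skip files that are not on disk, but remember them
            missing.append(source.name)
            continue
        # Create the cluster folder on demand, then move the file into it
        target_dir = dst_dir / f"cluster{cluster}"
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target_dir / source.name))
    return missing
```

With the paths used earlier, the call would look like move_images_by_cluster(df, "path/to/images/", "path/to/save/images/clusters/"); checking the returned list is a quick way to spot naming mismatches between the img column and the files.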

python

cluster-analysis

python-3.x

pandas