Julia Clustering.jl: clustering 3 columns independently

Question

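I have a data frame with 3 columns that contain only continuous data. I want to cluster them, but not in the usual way (this is to test the performance of a model I built). Instead, I want to cluster each column independently and then run computations on the results (I'll explain those later).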

Normally I would do:

using Clustering
X = rand(3, 1000)              # 3 features × 1000 observations (one observation per column)
r = kmeans(X, 3; maxiter=200)

That's it.

But here I want to do this:

for col in eachcol(df)
    r = kmeans(col, 3; maxiter=200)   # what I want; fails because col is a 1-D vector
end

so that the clusters of the 3 columns are independent of one another.

The kmeans function requires an AbstractMatrix, and it seems I can't pass it a one-dimensional array. How can I do this?

Thanks

Answer 1

Use the eachcol function, then call permutedims on each column (which turns it into a 1×n matrix), like this:

10×3 DataFrame
│ Row │ x1        │ x2       │ x3        │
│     │ Float64   │ Float64  │ Float64   │
├─────┼───────────┼──────────┼───────────┤
│ 1   │ 0.207498  │ 0.868425 │ 0.685724  │
│ 2   │ 0.749254  │ 0.687906 │ 0.0772998 │
│ 3   │ 0.834508  │ 0.184708 │ 0.647621  │
│ 4   │ 0.0645001 │ 0.172216 │ 0.133732  │
│ 5   │ 0.214247  │ 0.157832 │ 0.760743  │
│ 6   │ 0.235033  │ 0.988136 │ 0.197389  │
│ 7   │ 0.679289  │ 0.356979 │ 0.868345  │
│ 8   │ 0.276346  │ 0.733741 │ 0.49799   │
│ 9   │ 0.254432  │ 0.480508 │ 0.465863  │
│ 10  │ 0.771157  │ 0.764749 │ 0.692134  │

julia> rs = map(col -> kmeans(permutedims(col), 3, maxiter=200), eachcol(df))
3-element Array{KmeansResult{Array{Float64,2},Float64,Int64},1}:
 KmeansResult{Array{Float64,2},Float64,Int64}([0.23751121600553188 0.7585518673590035 0.06450009603113083], [1, 2, 2, 3, 1, 1, 2, 1, 1, 2], [0.0009008220520715787, 8.644942657287658e-5, 0.005769294511402512, 0.0, 0.0005412131030494011, 6.140057031367441e-6, 0.006282650149634916, 0.0015081594831378164, 0.0002863056813391107, 0.00015888896049953694], [5, 4, 1], [5, 4, 1], 0.015539923424739116, 2, true)
 KmeansResult{Array{Float64,2},Float64,Int64}([0.2179339668173827 0.9282804966386198 0.6667262902233205], [2, 3, 1, 1, 1, 2, 1, 3, 3, 3], [0.0035826329649959465, 0.00044858356501908947, 0.0011039681597532075, 0.0020901229894580015, 0.003612197823712318, 0.0035826329649959465, 0.019333634884288103, 0.004491026838137957, 0.03467709461536661, 0.009608499373310986], [4, 2, 4], [4, 2, 4], 0.08253039417903817, 2, true)
 KmeansResult{Array{Float64,2},Float64,Int64}([0.7309133142778247 0.13614011534786297 0.48192651082237836], [1, 2, 1, 2, 1, 2, 1, 3, 3, 1], [0.0020420816188950752, 0.0034621827135169585, 0.006937609017906632, 5.801293569009103e-6, 0.0008898099287693029, 0.0037514280990030707, 0.01888739640456283, 0.00025803289581327604, 0.00025803289581327604, 0.001503841748597079], [5, 3, 2], [5, 3, 2], 0.037996216616446504, 2, true)

I use map so that the clustering result for each column is automatically collected into a vector.
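To see the per-column pattern without installing any packages, here is a minimal sketch that swaps Clustering.jl's kmeans for a hypothetical kmeans1d helper: plain Lloyd iterations on a single column, seeded deterministically from sorted values rather than with k-means++. A plain Matrix with made-up values stands in for the DataFrame; this is an illustration of the map-over-eachcol idea, not the library's implementation.

```julia
using Statistics  # stdlib, for mean

# Hypothetical helper (not Clustering.jl): plain Lloyd's k-means on one column.
# Centers are seeded at evenly spaced order statistics of the sorted data.
function kmeans1d(x::AbstractVector{<:Real}, k::Integer; maxiter::Integer = 200)
    xs = sort(x)
    n = length(xs)
    centers = [float(xs[round(Int, 1 + (i - 1) * (n - 1) / max(k - 1, 1))]) for i in 1:k]
    assign = zeros(Int, length(x))
    for _ in 1:maxiter
        # Assignment step: index of the nearest center for every point.
        newassign = [argmin(abs.(xi .- centers)) for xi in x]
        newassign == assign && break          # converged: labels stopped changing
        assign = newassign
        # Update step: move each center to the mean of its points (keep it if empty).
        for j in 1:k
            members = x[assign .== j]
            isempty(members) || (centers[j] = mean(members))
        end
    end
    return (centers = centers, assignments = assign)
end

# Made-up data: 4 observations (rows) × 3 columns, each clustered on its own.
X = [0.1 5.0 9.0;
     0.2 5.1 1.0;
     3.0 9.0 1.1;
     3.1 9.2 9.2]

# One independent clustering per column, collected into a Vector by map.
results = map(col -> kmeans1d(col, 2), eachcol(X))
```

As with map over eachcol(df), the result is a vector with one entry per column, and no column's clustering sees the others.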

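Alternatively, you can use broadcasting (the numbers differ from the run above only because kmeans uses random initialization):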

julia> kmeans.(permutedims.(eachcol(df)), 3, maxiter=200)
3-element Array{KmeansResult{Array{Float64,2},Float64,Int64},1}:
 KmeansResult{Array{Float64,2},Float64,Int64}([0.06450009603113083 0.7585518673590035 0.23751121600553188], [3, 2, 2, 1, 3, 3, 2, 3, 3, 2], [0.0009008220520715787, 8.644942657287658e-5, 0.005769294511402512, 0.0, 0.0005412131030494011, 6.140057031367441e-6, 0.006282650149634916, 0.0015081594831378164, 0.0002863056813391107, 0.00015888896049953694], [1, 4, 5], [1, 4, 5], 0.015539923424739116, 2, true)
 KmeansResult{Array{Float64,2},Float64,Int64}([0.8387629127432272 0.2179339668173827 0.5842072515988065], [1, 3, 2, 2, 2, 1, 2, 1, 3, 1], [0.0008798629662141177, 0.010753447354678758, 0.0011039681597532075, 0.0020901229894580015, 0.003612197823712318, 0.022312198616715184, 0.019333634884288103, 0.011029515161705694, 0.010753447354678758, 0.0054780232316526956], [4, 4, 2], [4, 4, 2], 0.08734641854285684, 3, true)
 KmeansResult{Array{Float64,2},Float64,Int64}([0.13614011534786297 0.5978663716196664 0.8145438674677745], [2, 1, 2, 1, 3, 1, 3, 2, 2, 2], [0.007718948294718642, 0.0034621827135169585, 0.0024755234035849227, 5.801293569009103e-6, 0.002894533739809191, 0.0037514280990030707, 0.0028945337398089688, 0.009975306947276774, 0.017424861487738807, 0.008886369880211853], [3, 5, 2], [3, 5, 2], 0.05948948959923819, 2, true)
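Why permutedims is needed: Clustering.jl's kmeans expects a d×n matrix with one observation per column, so a single column of n values has to become a 1×n matrix (1 feature, n observations). A Base-only sketch of that conversion, using a made-up vector in place of one DataFrame column; reshape is shown as a non-copying alternative that yields the same shape:

```julia
# Made-up stand-in for one DataFrame column (6 continuous observations).
col = [0.21, 0.75, 0.83, 0.06, 0.21, 0.24]

# permutedims turns a length-n Vector into a new 1×n Matrix (a copy).
M1 = permutedims(col)

# reshape produces the same 1×n shape but shares memory with col.
M2 = reshape(col, 1, :)

println(size(M1), " ", size(M2))   # (1, 6) (1, 6)
```

Either matrix has the 1×n layout kmeans accepts; permutedims is the safer choice if the column might be modified later, since it does not alias the original data.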


cluster-analysis

dataframe

julia