R PCA makes graph that is fishy, can't ID why

Question

I'm having trouble with this PCA. The PC1 scores look binary, and I can't figure out why, since none of my variables are binary.

df = bees

library(dplyr)    # %>%, ungroup(), select()
library(ggplot2)  # plotting

pca_dat_condition <- bees %>% ungroup() %>%
  select(Length.1:Length.25, OBJECTID, Local, Elevation, Longitude,
         Latitude, Cubital.Index) %>%
  na.omit()

pca_dat_first <- pca_dat_condition %>%      # remove the final nonnumerical information
  select(-Local, -OBJECTID, -Elevation, -Longitude, -Latitude)

pca <- pca_dat_first %>%
  scale() %>%
  prcomp()

# add identifying information back into PCA data
pca_data <- data.frame(pca$x,
                       Local = pca_dat_condition$Local,
                       ID = pca_dat_condition$OBJECTID,
                       elevation = pca_dat_condition$Elevation,
                       Longitude = pca_dat_condition$Longitude,
                       Latitude = pca_dat_condition$Latitude)

ggplot(pca_data, aes(x = PC1, y = PC2, color = Latitude)) +
  geom_point() +
  ggtitle("PC1 vs PC2: All Individuals") +
  scale_colour_gradient(low = "blue", high = "red")

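I'm not getting any error messages from the code, and nothing looks out of the ordinary when I inspect the data frame. Should I be using a different function for the PCA? Any insight into why my graph might look like this?

Previously, I ran the same PCA on the means for each Local (whereas this one is per individual), and it turned out as a normal PCA with no clear clustering. I don't understand why this problem appears when looking at individual points. It's possible I merged some other data frames in a weird way, but the structure of the data set seems completely normal.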

This is how the PCA looks.

Answer 1

bees <- read.csv(paste0("https://gist.githubusercontent.com/AkselA/", 
                    "08a4e78a6a29a918ed597e9a32adc228/raw/", 
                    "6d0005fad4cb91830bcf7087176283b18683e9cd/bees.csv"), 
                    header=TRUE)

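# Uncomment the next line to drop the three offending rows
# (rows where the first column is 10 or more)
# bees <- bees[bees[,1] < 10,]
bees <- na.omit(bees)

# Measurements plus identifying columns
bees.cond <- bees[, grep("Length|OBJ|Loc|Ele|Lon|Lat|Cubi", colnames(bees))]

# Measurement columns only (the PCA input)
bees.first <- bees[, grep("Length|Cubi", colnames(bees))]
summary(bees.first)

# One histogram per scaled variable; counts are log1p-transformed
# so that a handful of extreme values remain visible
par(mfrow=c(7, 4), mar=rep(1, 4))
q <- lapply(1:ncol(bees.first), function(x) {
    h <- hist(scale(bees.first[, x]), plot=FALSE)
    h$counts <- log1p(h$counts)
    plot(h, main="", axes=FALSE, ann=FALSE)
    legend("topright", legend=names(bees.first[x]), 
      bty="n", cex=0.8, adj=c(0, -2), xpd=NA)
    })

# PCA with each variable scaled to unit variance
bees.pca <- prcomp(bees.first, scale.=TRUE)
biplot(bees.pca)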
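
To see why a handful of extreme rows can make PC1 look binary, here is a minimal sketch on simulated data, not the bees data. It assumes three rows were recorded on a ten times larger scale than the rest; the row positions, column names, and the helper `var_share` are all made up for illustration.

```r
set.seed(42)
n <- 200
p <- 5

# Simulated, independent measurements (no real structure to find)
x <- matrix(rnorm(n * p, mean = 3, sd = 0.2), ncol = p,
            dimnames = list(NULL, paste0("Length.", 1:p)))

# Assumption: three rows were recorded on a 10x larger scale
bad <- c(10, 50, 120)
x[bad, ] <- x[bad, ] * 10

pca_all   <- prcomp(x, scale. = TRUE)
pca_clean <- prcomp(x[-bad, ], scale. = TRUE)

# Share of total variance captured by PC1 (hypothetical helper)
var_share <- function(fit) fit$sdev[1]^2 / sum(fit$sdev^2)
var_share(pca_all)    # with the bad rows
var_share(pca_clean)  # without them

# Distance of each PC1 score from the median PC1 score
pc1_dist <- abs(pca_all$x[, 1] - median(pca_all$x[, 1]))
range(pc1_dist[bad])   # the three corrupted rows
range(pc1_dist[-bad])  # everything else
```

With the corrupted rows included, PC1 mostly separates those rows from everything else, so its scores fall into two groups. Once they are dropped, PC1 goes back to describing variation among the ordinary observations.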

Before removing outliers

After
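
If the offending rows are not known in advance, one possible way to find them is to flag rows with extreme robust z-scores. The sketch below is an assumption-laden illustration, not the method used above: `flag_outlier_rows` is a hypothetical helper, the cutoff of 6 is arbitrary, and the demo data frame is simulated.

```r
# Hypothetical helper: return indices of rows where any column lies more
# than `cutoff` robust standard deviations (MAD-based) from its median.
flag_outlier_rows <- function(df, cutoff = 6) {
  m <- as.matrix(df)
  med <- apply(m, 2, median)
  spread <- apply(m, 2, mad)  # robust SD estimate; 0 for a constant column
  z <- abs(sweep(sweep(m, 2, med, "-"), 2, spread, "/"))
  which(rowSums(z > cutoff, na.rm = TRUE) > 0)
}

# Simulated example with two deliberately corrupted rows
set.seed(1)
demo <- data.frame(a = rnorm(100, 3, 0.2),
                   b = rnorm(100, 5, 0.5),
                   c = rnorm(100, 1, 0.1))
demo[c(7, 42), "a"] <- c(30, 28)

flagged <- flag_outlier_rows(demo)
flagged
```

It is worth inspecting the flagged rows before dropping anything, since an extreme value can be a data-entry or unit error, or a genuine observation.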