How to set flag values based on data that uses one-hot encoding

Question

I have a database consisting of three tables, as shown below:

I want to use that database to build a machine learning model in R, and I need the data in this form:

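I can use one-hot encoding to convert the categorical variable from t_pengolahan (such as "Pengupasan, Fermentasi, etc") into attributes. But how do I set a flag (yes or no) as the data value, based on the "result (using SQL query)" data above?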

Answer 1

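This is unclear to me. What do you mean by "how to set flag (yes or no) to the data value based on the 'result (using SQL query)' data"? Do you want to convert one of the columns to a boolean? If so, you need to specify the decision rule. It could look something like this: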

SELECT (... other columns),
       CASE case_expression
            WHEN when_expression_1 THEN 'yes'
            WHEN when_expression_2 THEN 'no'
            ELSE ''
       END AS flag  -- name of the new yes/no column
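Once the rows are in R, the same kind of decision rule can be written as a vectorized condition instead of a SQL CASE expression. The following is a minimal base-R sketch, assuming (for illustration only) that the flag should be "yes" whenever the processing string mentions "Pengupasan"; the rule and the variable name flag_pengupasan are hypothetical.

```r
## Example values shaped like the SQL result: processing methods are
## stored as one comma-separated string per food item
pengolahan <- c("Penggilingan, Perebusan", "Pengupasan",
                "Fermentasi, Sterilisasi")

## Assumed decision rule: "yes" if the string contains "Pengupasan",
## otherwise "no". This is the R analogue of
## CASE WHEN pengolahan LIKE '%Pengupasan%' THEN 'yes' ELSE 'no' END
flag_pengupasan <- ifelse(grepl("Pengupasan", pengolahan, fixed = TRUE),
                          "yes", "no")
print(flag_pengupasan)
```

Using fixed = TRUE treats the pattern as a literal substring rather than a regular expression, which is all this rule needs.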

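Help others help you:
- Which SQL dialect are you using? (For example, would an SQLite solution work for you?)
- Provide the SQL script that creates your tables, as well as the script you used to "use one hot encoding to convert categorical variable from t_pengolahan (such as "Pengupasan, Fermentasi, etc") into attributes".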

Answer 2

We can combine two answers to your earlier related questions, each of which provides half of the solution; you can find them here and here:

library(dplyr) ## dplyr and tidyr loaded for wrangling
library(tidyr)
options(dplyr.width = Inf) ## we want to show all columns of result
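yes_fun <- function(x) { ## used as values_fn in pivot_wider() below
    ## pivot_wider() only calls this for id/method combinations that
    ## actually occur, so it returns "yes"; combinations that do not
    ## occur are filled with "no" through values_fill
    if (length(x) > 0) "yes" else "no"
}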
sql_result %>%
    separate_rows(pengolahan) %>% ## add rows for unique words in pengolahan
    pivot_wider(names_from = pengolahan, ## spread to yes/no indicators
                values_from = pengolahan,
                values_fill = list(pengolahan = "no"),
                values_fn = list(pengolahan = yes_fun))
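If you would rather avoid the tidyr dependency, the same split-then-spread idea can be sketched in base R. This is a hypothetical alternative, not part of the original answer; the helper names (methods_per_row, all_methods, flags, wide) are illustrative, and the example data mirrors the data used in this answer.

```r
## Example data, as in this answer (characters kept as strings so that
## strsplit() works on older R versions too)
id_pangan  <- 1:3
kategori   <- c("Daging", "Buah", "Susu")
pengolahan <- c("Penggilingan, Perebusan", "Pengupasan",
                "Fermentasi, Sterilisasi")
batas      <- c(100, 50, 200)
sql_result <- data.frame(id_pangan, kategori, pengolahan, batas,
                         stringsAsFactors = FALSE)

## Split each comma-separated string into a vector of method names
methods_per_row <- strsplit(sql_result$pengolahan, ",\\s*")

## Every distinct method becomes one indicator column,
## in order of first appearance
all_methods <- unique(unlist(methods_per_row))

## For each method: "yes" if the row lists it, otherwise "no"
flags <- vapply(all_methods, function(m) {
    ifelse(vapply(methods_per_row, function(v) m %in% v, logical(1)),
           "yes", "no")
}, character(length(methods_per_row)))

## Keep the id columns, drop the original string, add the indicators
wide <- cbind(sql_result[, c("id_pangan", "kategori", "batas")],
              as.data.frame(flags, stringsAsFactors = FALSE))
print(wide)
```

One difference from separate_rows(): its default separator splits on any run of non-alphanumeric characters, whereas this sketch splits only on commas, so a method name containing a space would stay intact.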

Data

id_pangan  <- 1:3
kategori   <- c("Daging", "Buah", "Susu")
pengolahan <- c("Penggilingan, Perebusan", "Pengupasan",
                "Fermentasi, Sterilisasi")
batas      <- c(100, 50, 200)
sql_result <- data.frame(id_pangan, kategori, pengolahan, batas)

Result:

# A tibble: 3 x 8
  id_pangan kategori batas Penggilingan Perebusan Pengupasan Fermentasi Sterilisasi
      <int> <fct>    <dbl> <chr>        <chr>     <chr>      <chr>      <chr>      
1         1 Daging     100 yes          yes       no         no         no         
2         2 Buah        50 no           no        yes        no         no         
3         3 Susu       200 no           no        no         yes        yes        
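If the model you plan to fit expects numeric indicators rather than "yes"/"no" strings, the flag columns can be recoded to 0/1. A minimal sketch, assuming a small hypothetical data frame (flags) that holds two of the indicator columns from the result above; flag_cols is likewise an illustrative name.

```r
## Hypothetical subset of the result: an id column plus two yes/no flags
flags <- data.frame(
    id_pangan    = 1:3,
    Penggilingan = c("yes", "no", "no"),
    Pengupasan   = c("no", "yes", "no"),
    stringsAsFactors = FALSE
)

## Every column except the id holds a yes/no flag
flag_cols <- setdiff(names(flags), "id_pangan")

## Recode "yes" -> 1L and "no" -> 0L in each flag column
flags[flag_cols] <- lapply(flags[flag_cols],
                           function(x) as.integer(x == "yes"))
print(flags)
```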
