Transform table to one-hot-encoding of single column value
I have a table with two columns:
+---------+--------+
| keyword | color  |
+---------+--------+
| foo     | red    |
| bar     | yellow |
| fobar   | red    |
| baz     | blue   |
| bazbaz  | green  |
+---------+--------+
I need to do some kind of one-hot encoding and transform the table in PostgreSQL into:
+---------+-----+--------+-------+------+
| keyword | red | yellow | green | blue |
+---------+-----+--------+-------+------+
| foo     |   1 |      0 |     0 |    0 |
| bar     |   0 |      1 |     0 |    0 |
| fobar   |   1 |      0 |     0 |    0 |
| baz     |   0 |      0 |     0 |    1 |
| bazbaz  |   0 |      0 |     1 |    0 |
+---------+-----+--------+-------+------+
Is this possible with SQL alone? Any hints on how to get started?
If I understand you correctly, you need conditional aggregation:
select keyword,
       count(case when color = 'red' then 1 end) as red,
       count(case when color = 'yellow' then 1 end) as yellow,
       count(case when color = 'green' then 1 end) as green,
       count(case when color = 'blue' then 1 end) as blue
from t
group by keyword
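To try the conditional aggregation without a PostgreSQL server, the sketch below runs the same query against an in-memory SQLite database through Python's standard sqlite3 module. Using SQLite is an assumption made only for the demo; COUNT(CASE ... END) behaves the same way there, because a CASE with no matching WHEN yields NULL and COUNT skips NULLs. The order by keyword clause is added just to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (demo assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table t (keyword varchar, color varchar)")
conn.executemany(
    "insert into t values (?, ?)",
    [("foo", "red"), ("bar", "yellow"), ("fobar", "red"),
     ("baz", "blue"), ("bazbaz", "green")],
)

# Each CASE yields 1 for a matching row and NULL otherwise; COUNT ignores
# the NULLs, so every column counts only the rows with that color.
query = """
select keyword,
       count(case when color = 'red'    then 1 end) as red,
       count(case when color = 'yellow' then 1 end) as yellow,
       count(case when color = 'green'  then 1 end) as green,
       count(case when color = 'blue'   then 1 end) as blue
from t
group by keyword
order by keyword
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Since every keyword appears once in the sample data, each count is 0 or 1, which is exactly the one-hot layout; a keyword repeated with the same color would produce a count greater than 1.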
To use this on a table with many distinct values (and therefore many output columns), generate the query with Python:
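1) Create a list of the unique values you want as column names and load it into Python, e.g. as colors:
colors = ['red', 'yellow', 'green', 'blue']  # the distinct values of the color column
for item in colors:
    print("count(case when color = '" + str(item) + "' then 1 end) as is_" + str(item) + ",")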
2) Copy the output (minus the trailing comma on the last line).
3) Then paste it into:
select keyword,
OUTPUT FROM PYTHON
from t
group by keyword
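Steps 1 to 3 can also be collapsed into one function that builds the whole query; joining the select list with str.join avoids deleting the last comma by hand. build_onehot_query is a hypothetical helper, not part of any library, and it assumes the values are safe to embed in column names (is_<value>), since they are pasted into the SQL as bare identifiers. Only the string literal side is escaped, by doubling single quotes.

```python
def build_onehot_query(values, table="t", key="keyword", column="color"):
    """Hypothetical helper: build a conditional-aggregation query with one
    is_<value> column per value in `values`."""
    lines = []
    for value in values:
        # Double any single quotes so the value is a valid SQL string literal.
        literal = str(value).replace("'", "''")
        lines.append(f"count(case when {column} = '{literal}' then 1 end) as is_{value}")
    # Joining with ",\n" puts commas only *between* items: no trailing comma.
    select_list = ",\n       ".join([key] + lines)
    return f"select {select_list}\nfrom {table}\ngroup by {key}"


print(build_onehot_query(["red", "yellow", "green", "blue"]))
```

In practice the list of values would come from select distinct color from t; it is hard-coded here to keep the sketch self-contained.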
Another way to reach the goal of your test case is the crosstab() function from the tablefunc extension, with COALESCE() to fill in all the NULL fields:
postgres=# create extension if not exists tablefunc;
CREATE EXTENSION
postgres=# create table t(keyword varchar,color varchar);
CREATE TABLE
postgres=# insert into t values ('foo','red'),('bar','yellow'),('fobar','red'),('baz','blue'),('bazbaz','green');
INSERT 0 5
postgres=# SELECT keyword, COALESCE(red,0) red,
COALESCE(blue,0) blue, COALESCE(green,0) green,
COALESCE(yellow,0) yellow
FROM crosstab(
$$select keyword, color, 1 as onehot from t
group by 1, 2 order by 1, 2$$,
$$select distinct color from t order by 1$$)
AS result(keyword varchar, blue int, green int, red int, yellow int);
keyword | red | blue | green | yellow
---------+-----+------+-------+--------
bar | 0 | 0 | 0 | 1
baz | 0 | 1 | 0 | 0
bazbaz | 0 | 0 | 1 | 0
fobar | 1 | 0 | 0 | 0
foo | 1 | 0 | 0 | 0
(5 rows)
postgres=#
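To picture what the two-argument crosstab(source_sql, category_sql) call does, the pure-Python sketch below imitates it on the sample data under simplified assumptions: each (row_name, category, value) row fills one slot, the slots are ordered by the category query, values for unknown categories are dropped, and unmatched slots stay NULL (None) until COALESCE turns them into 0. The Python function name crosstab is hypothetical and this is an illustration, not tablefunc's implementation.

```python
def crosstab(source_rows, categories):
    """Hypothetical pure-Python sketch of two-argument crosstab(): one output
    row per row_name, one slot per category, None (NULL) where nothing matches."""
    result = {}
    for row_name, category, value in source_rows:
        slots = result.setdefault(row_name, {c: None for c in categories})
        if category in slots:  # values for categories not in the list are dropped
            slots[category] = value
    return [(name, *(slots[c] for c in categories)) for name, slots in result.items()]


# Mirrors "select keyword, color, 1 ... order by 1, 2".
source = sorted([("foo", "red", 1), ("bar", "yellow", 1), ("fobar", "red", 1),
                 ("baz", "blue", 1), ("bazbaz", "green", 1)])
# Mirrors "select distinct color ... order by 1".
categories = sorted({color for _, color, _ in source})
pivoted = crosstab(source, categories)

# COALESCE(x, 0): replace the NULL slots with 0.
onehot = [(name, *(0 if v is None else v for v in vals)) for name, *vals in pivoted]
for row in onehot:
    print(row)
```

The sketch also shows why the AS result(...) column list in the SQL is blue, green, red, yellow: the slots are filled in the order of the category query, which sorts the colors alphabetically, so the declared columns must follow that same order.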
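And if you only need to see the result in psql, \crosstabview can pivot it on the client side (cells without a value are left empty rather than showing 0):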
postgres=# select keyword, color, 1 as onehot from t
\crosstabview keyword color
keyword | red | yellow | blue | green
---------+-----+--------+------+-------
foo | 1 | | |
bar | | 1 | |
fobar | 1 | | |
baz | | | 1 |
bazbaz | | | | 1
(5 rows)
postgres=#
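The layout that \crosstabview keyword color produces can be mimicked in a few lines of Python, which helps explain the blank cells and the column order in the output. crosstabview below is a hypothetical helper loosely modelled on psql's behavior: row and column headers appear in order of first appearance in the query result, missing cells are empty strings, and, as psql also does, it refuses to display two values for the same cell.

```python
def crosstabview(rows):
    # Hypothetical helper loosely modelled on psql's \crosstabview keyword color.
    # Headers keep their order of first appearance; missing cells become "".
    row_labels, col_labels, cells = [], [], {}
    for key, col, value in rows:
        if (key, col) in cells:
            # psql likewise reports an error when one cell gets two values.
            raise ValueError(f"multiple values for row {key!r}, column {col!r}")
        cells[(key, col)] = value
        if key not in row_labels:
            row_labels.append(key)
        if col not in col_labels:
            col_labels.append(col)
    grid = [[key] + [str(cells[(key, c)]) if (key, c) in cells else "" for c in col_labels]
            for key in row_labels]
    return col_labels, grid


rows = [("foo", "red", 1), ("bar", "yellow", 1), ("fobar", "red", 1),
        ("baz", "blue", 1), ("bazbaz", "green", 1)]
headers, grid = crosstabview(rows)
print(["keyword"] + headers)
for line in grid:
    print(line)
```

Because the pivot happens only when results are displayed, the empty cells are not turned into 0; for real one-hot columns that other queries can consume, the conditional aggregation or crosstab() approaches are the better fit.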