Pivot in PostgreSQL with TRUE/FALSE markings
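I would like to know how to pivot multiple values into column names, with TRUE/FALSE as the cell values.
Here is a concrete example:
What I have is repeated rows, where only the last column differs: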
DATE ID Species Illness Tag
20180101 001 Dog Asthma Mucus
20180101 001 Dog Asthma Noisy
20180101 001 Dog Asthma Respiratory
20180102 002 Cat Osteoarthritis Locomotor
20180102 002 Cat Osteoarthritis Limp
...
20180131 003 Bird Avian Pox Itchy
What I would like to get is:
DATE ID Species Illness Mucus Noisy ... Limp Itchy
20180101 001 Dog Asthma TRUE TRUE ... FALSE FALSE
20180102 002 Cat Osteoarth. FALSE FALSE ... TRUE FALSE
...
20180131 003 Bird Avian Pox FALSE FALSE ... FALSE TRUE
I tried the "crosstab" function on just a subset of the tags, but it fails with an error saying the function does not exist:
select *
from crosstab (
'select c.id, tg."name"
FROM taggings t
join consultations c
on c.id=t.taggable_id
join tags tg
on t.tag_id=tg.id
group by c.id, tg."name"'
) as final_result(dermatological BOOLEAN, behaviour BOOLEAN)
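By the way, I have around 350 tags, so this may not be the best-suited function :/
EDIT:
In the end I added the tablefunc extension and tried crosstab() again, but I get the following error: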
Query execution failed Reason: SQL Error [22023]: ERROR: invalid
source data SQL statement Detail: The provided SQL must return 3
columns: rowid, category, and values.
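I will try to find a solution and update here, but in the meantime, if anyone knows how to solve this, please share :) Thanks!
After a few days of reading and trying the suggested solutions, this is what worked for me:
What I did was build 3 separate tables and then join the first and the third to get the information I needed, with each tag as a column holding 1/0 depending on whether the tag exists for a given ID.
One more edit => I did not actually need the date, so I built the tables on the consultation ID.
TABLE 1:
Get a table with all the columns you need, grouped by ID, together with all the tags each ID has.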
ID Species Age Illness Tag
001 Dog 2 Asthma Mucus
001 Dog 2 Asthma Noisy
001 Dog 2 Asthma Respiratory
002 Cat 5 Osteoarthritis Locomotor
002 Cat 5 Osteoarthritis Limp
...
003 Bird 1 Avian Pox Itchy
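TABLE 2:
Build the Cartesian product that crosses all consultations with the list of all distinct tags, and order it for the crosstab() function.
(The crosstab function needs 3 columns: ID, tag, and value.)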
with consultations_tags as
(here put the query of TABLE 1),
tag_list as
(select tags."name"
from tags
join taggings t on t.tag_id = tags.id
join consultations c on c.id = t.taggable_id
group by 1), -- gets the list of all tags that are in use in the DB
cartesian_consul_tags as
(select consultations_tags.id, tag_list.name,
case when tag_list.name = consultations_tags.tag_name then 1
else 0 -- "case" yields 1/0 depending on whether the tag is present for an ID
end as tag_exists
from
consultations_tags
cross join
tag_list)
select cartesian_consul_tags.id, cartesian_consul_tags.name,
SUM(cartesian_consul_tags.tag_exists) -- for me, the values were duplicated, and so were tags
from cartesian_consul_tags
group by 1, 2
order by 1, 2
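Note: the order of the tags matters a lot here, because you are the one naming the columns in the crosstab function. It does not turn a given tag into a column; it only carries over the value at that tag's position, so if you mix up the naming order, the values will not line up with the right columns.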
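If duplicated taggings are a concern, the same three-column input could also be sketched with a left join instead of CASE plus SUM. This is only a sketch: it assumes the consultations, tags and taggings tables and the tags."name" column from the question's query, and it lists every row in tags (not only the tags in use). The count is capped at 1, so the value stays 0/1 even when a tag is attached twice.

```sql
-- Sketch only: table and column names are taken from the question's query.
-- Produces one row per (consultation, tag) pair, ordered for crosstab().
select c.id,
       tg."name",
       least(count(t.tag_id), 1) as tag_exists  -- bigint (int8): 1 if tagged, else 0
from consultations c
cross join tags tg
left join taggings t
       on t.taggable_id = c.id
      and t.tag_id = tg.id
group by c.id, tg."name"
order by 1, 2;
```

Because count() returns bigint, the value column still matches the int8 columns declared in the crosstab step.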
TABLE 3:
Crosstab of the second table: it pivots the Cartesian product table, in this case TABLE 2.
SELECT *
FROM crosstab(' COPY THE TABLE 2 ') -- if you have conditions like "where species = 'Dogs'", you need to double the apostrophes inside the string: where species = ''Dogs''
AS ct(id int4, "Itchy" int8,
"Limp" int8,
"Locomotor" int8,
"Mucus" int8,
"Noisy" int8) -- your tag list. You can prepare it in Excel, so that all tags are in double quotes and have the corresponding data type. The data type of the tag columns has to match the data type of the "value" column in TABLE 2.
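Instead of preparing the column list in Excel, the list can probably be generated in the database itself. A minimal sketch, assuming the tag names live in tags."name" as in the queries above (tag_name and column_definition_list are just illustrative aliases); the result is a string to paste between the parentheses of AS ct( ... ).

```sql
-- Sketch only: builds the column definition list for crosstab() as text.
-- format('%I', ...) double-quotes identifiers that need it (e.g. mixed case).
select 'id int4, '
       || string_agg(format('%I int8', tag_name), ', ' order by tag_name)
       as column_definition_list
from (select distinct "name" as tag_name from tags) as t;
```

The order by inside string_agg has to sort the names the same way as the order by in the TABLE 2 query, otherwise the values end up under the wrong column names. If TABLE 2 only covers the tags that are in use, the subquery here would need the same restriction.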
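FINALLY, the table I wanted in the end is the join of tables 1 and 3, so I get the information I need for each consultation ID, plus the tag list as columns with the value 0/1 depending on whether the tag appears in that consultation.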
with table1 as ( Copy the query of table1),
table3 as ( Copy the query of table3)
select *
from table1
join table3 on
table1.id=table3.id
order by 1
The final table looks like this:
ID Species Illness Mucus Noisy ... Limp Itchy
001 Dog Asthma 1 1 ... 0 0
002 Cat Osteoarth. 0 0 ... 1 0
...
003 Bird Avian Pox 0 0 ... 0 1
I experimented a bit and this is what I came up with.
-- Reading the data into a table
SELECT * INTO crosstab_test FROM
(VALUES (20180101,'001','Dog','Asthma','Mucus'),
(20180101,'001','Dog','Asthma','Noisy'),
(20180101,'001','Dog','Asthma','Respiratory'),
(20180102,'002','Cat','Osteoarthritis','Locomotor'),
(20180102,'002','Cat','Osteoarthritis','Limp'),
(20180131, '003', 'Bird', 'Avian Pox','Itchy')) as a (date, id, species, illness, tag);
SELECT DISTINCT date, id, species, illness, mucus, noisy, locomotor, respiratory, limp, itchy
FROM
(SELECT "date", id, species, illness
FROM crosstab_test) a
INNER JOIN
(SELECT * FROM crosstab(
'SELECT id, tag, ''TRUE'' FROM crosstab_test ORDER BY 1,2,3',
'SELECT DISTINCT tag FROM crosstab_test ORDER BY 1')
as tabelle (id text, Itchy text, Limp text, Locomotor text, Mucus text, Noisy text, Respiratory text)) b
USING(id)
ORDER BY 1;
date | id | species | illness | mucus | noisy | locomotor | respiratory | limp | itchy
----------+-----+---------+----------------+-------+-------+-----------+-------------+------+-------
20180101 | 001 | Dog | Asthma | TRUE | TRUE | | TRUE | |
20180102 | 002 | Cat | Osteoarthritis | | | TRUE | | TRUE |
20180131 | 003 | Bird | Avian Pox | | | | | | TRUE
(3 rows)
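If you do not care about the column order, you can simply do SELECT DISTINCT * ...
Given the 350 tags you mention, replacing NULL with FALSE may be a bit tedious, so I would suggest leaving the NULLs as they are. If you do want FALSE, you can do SELECT DISTINCT date, id, species, illness, COALESCE(mucus, 'FALSE'), COALESCE(noisy, 'FALSE'),...
However, the bitter pill you will have to swallow is specifying all 350 tags as columns of type text in the as tabelle (id text, Itchy text, Limp text, Locomotor text, Mucus text, Noisy text, Respiratory text) part of the crosstab statement. Make sure to put them in the correct order, as determined by 'SELECT DISTINCT tag FROM crosstab_test ORDER BY 1' in the crosstab statement.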
Hope that is what you were looking for.
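Depending on how you display the query results, you might consider a different approach: getting all the true/false flags for each tag in a single JSONB column instead of 350 dynamic columns.
I am not sure I understood your data model correctly, but from what I gathered, I think it looks something like this: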
create table tags (id int, tag text);
create table consultations (id int, species text, illness text);
create table taggings (taggable_id int, tag_id int);
insert into tags
(id, tag)
values
(1, 'Mucus'),
(2, 'Noisy'),
(3, 'Limp'),
(4, 'Itchy'),
(5, 'Locomotor'),
(6, 'Respiratory');
insert into consultations
(id, species, illness)
values
(1, 'Dog', 'Asthma'),
(2, 'Cat', 'Osteoarthritis'),
(3, 'Bird', 'Avian Pox');
insert into taggings
(taggable_id, tag_id)
values
(1, 1), (1, 2), (1, 6), -- the dog
(2, 5), (2, 3), -- the cat
(3, 4); -- the bird
Then you can get a single JSON column with this query:
select c.id, c.species, c.illness,
(select jsonb_object_agg(t.tag, tg.taggable_id is not null)
from tags t
left join taggings tg
on tg.tag_id = t.id
and tg.taggable_id = c.id) as tags
from consultations c;
With the sample data above, the query returns:
id | species | illness | tags
---+---------+----------------+---------------------------------------------------------------------------------------------------------
1 | Dog | Asthma | {"Limp": false, "Itchy": false, "Mucus": true, "Noisy": true, "Locomotor": false, "Respiratory": true}
2 | Cat | Osteoarthritis | {"Limp": true, "Itchy": false, "Mucus": false, "Noisy": false, "Locomotor": true, "Respiratory": false}
3 | Bird | Avian Pox | {"Limp": false, "Itchy": true, "Mucus": false, "Noisy": false, "Locomotor": false, "Respiratory": false}
Another way to write the query is with a lateral join:
select c.id, c.species, c.illness, ti.tags
from consultations c
left join lateral (
select jsonb_object_agg(t.tag, tg.taggable_id is not null) as tags
from tags t
left join taggings tg on tg.tag_id = t.id and tg.taggable_id = c.id
) as ti on true;
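Individual flags can still be read from the JSONB column when a real boolean is needed, for example to filter on one tag. A minimal sketch that reuses the lateral query and the sample tables above; the mucus alias is only illustrative.

```sql
-- Sketch only: reads a single flag out of the aggregated JSONB column.
-- ->> returns the value as text ('true'/'false'), which is cast to boolean.
select c.id, c.species, c.illness,
       (ti.tags ->> 'Mucus')::boolean as mucus
from consultations c
cross join lateral (
    select jsonb_object_agg(t.tag, tg.taggable_id is not null) as tags
    from tags t
    left join taggings tg on tg.tag_id = t.id and tg.taggable_id = c.id
) as ti
where (ti.tags ->> 'Mucus')::boolean;
```

With the sample data above, this should return only the Dog consultation, since it is the only one tagged Mucus.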