Vega-Lite: group by specific attributes based on another column
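I am trying to find the average of Gestation/Incubation (days) for the following orders:
Accipitriformes, Anseriformes, Charadriiformes
which belong to the Aves class. I don't want the average for any other value in the Order column, only for those within the Aves class. A sample of my dataset looks like this: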
| Class    | Order           | Gestation/Incubation (days) |
|----------|-----------------|-----------------------------|
| Amphibia | Anura           | 5                           |
| Amphibia | Anura           | 4                           |
| Amphibia | Anura           | 2                           |
| Amphibia | Caudata         | 4                           |
| Amphibia | Caudata         | 2                           |
| Mammalia | Artiodactyla    | 10                          |
| Mammalia | Artiodactyla    | 8                           |
| Mammalia | Rodentia        | 14                          |
| Mammalia | Rodentia        | 13                          |
| Aves     | Accipitriformes | 12                          |
| Aves     | Accipitriformes | 17                          |
| Aves     | Accipitriformes | 12                          |
| Aves     | Anseriformes    | 9                           |
| Aves     | Anseriformes    | 8                           |
| Aves     | Anseriformes    | 9                           |
| Aves     | Charadriiformes | 10                          |
| Aves     | Charadriiformes | 12                          |
| Aves     | Charadriiformes | 14                          |
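I was able to find the average for the different values in the Class column, for example (see the vega-lite demo link):
Amphibia, Mammalia, Aves
But I cannot find the average for the values in the Order column where Class = Aves.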
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "url": "https://raw.githubusercontent.com/cathal84/COMP40610/master/anage_data.txt",
    "format": {"type": "tsv"}
  },
  "title": {
    "text": " Average Gestation/Incubation days for Orders with Aves",
    "anchor": "middle"
  },
  "width": 600,
  "height": 600,
  "transform": [
    {
      "aggregate": [
        {"op": "average", "field": "Gestation/Incubation (days)", "as": "avg_incub"},
        {"op": "count", "field": "Class", "as": "make_cnt"}
      ],
      "groupby": ["Class"]
    },
    {"filter": "datum.make_cnt > 50"}
  ],
  "mark": {"type": "bar"},
  "encoding": {
    "y": {
      "field": "avg_incub",
      "type": "quantitative",
      "axis": {"title": "Average Incubation"}
    },
    "x": {
      "field": "Class",
      "type": "nominal",
      "sort": {"encoding": "x", "order": "descending"},
      "axis": {"title": "Orders"}
    }
  }
}
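I tried using a filter so that only the rows with Class == Aves remain, but that did not solve my problem. I must not be using it correctly, unless there is another way to achieve what I want.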
{"filter": "datum.Class == 'Aves'"}
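You can do this with two filter steps, one that filters by Class and one that filters by Order. After that, using an aggregate in the encoding is the most direct way to compute the grouped mean.
Here is an example (vega editor):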
{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "data": {
    "url": "https://raw.githubusercontent.com/cathal84/COMP40610/master/anage_data.txt",
    "format": {"type": "tsv"}
  },
  "title": {
    "text": " Average Gestation/Incubation days for Orders with Aves",
    "anchor": "middle"
  },
  "transform": [
    {"filter": "datum.Class == 'Aves'"},
    {"filter": {"field": "Order", "oneOf": ["Accipitriformes", "Anseriformes", "Charadriiformes"]}}
  ],
  "mark": {"type": "bar"},
  "encoding": {
    "x": {"field": "Order", "type": "nominal"},
    "y": {"field": "Gestation/Incubation (days)", "type": "quantitative", "aggregate": "mean"}
  }
}
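If you would rather keep the explicit aggregate transform from the question, the same two filters can probably be placed in front of it, with the groupby switched from Class to Order. The following is only a sketch under some assumptions: it inlines a few rows from the sample table in place of the real TSV file so it can be pasted into the editor on its own, the "description" text is mine, and it drops the make_cnt > 50 filter because the sample has only three rows per order.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "Sketch only: inline sample rows stand in for anage_data.txt",
  "data": {
    "values": [
      {"Class": "Amphibia", "Order": "Anura", "Gestation/Incubation (days)": 5},
      {"Class": "Mammalia", "Order": "Rodentia", "Gestation/Incubation (days)": 14},
      {"Class": "Aves", "Order": "Accipitriformes", "Gestation/Incubation (days)": 12},
      {"Class": "Aves", "Order": "Accipitriformes", "Gestation/Incubation (days)": 17},
      {"Class": "Aves", "Order": "Accipitriformes", "Gestation/Incubation (days)": 12},
      {"Class": "Aves", "Order": "Anseriformes", "Gestation/Incubation (days)": 9},
      {"Class": "Aves", "Order": "Anseriformes", "Gestation/Incubation (days)": 8},
      {"Class": "Aves", "Order": "Anseriformes", "Gestation/Incubation (days)": 9},
      {"Class": "Aves", "Order": "Charadriiformes", "Gestation/Incubation (days)": 10},
      {"Class": "Aves", "Order": "Charadriiformes", "Gestation/Incubation (days)": 12},
      {"Class": "Aves", "Order": "Charadriiformes", "Gestation/Incubation (days)": 14}
    ]
  },
  "transform": [
    {"filter": "datum.Class == 'Aves'"},
    {"filter": {"field": "Order", "oneOf": ["Accipitriformes", "Anseriformes", "Charadriiformes"]}},
    {
      "aggregate": [
        {"op": "mean", "field": "Gestation/Incubation (days)", "as": "avg_incub"}
      ],
      "groupby": ["Order"]
    }
  ],
  "mark": {"type": "bar"},
  "encoding": {
    "x": {"field": "Order", "type": "nominal", "axis": {"title": "Order"}},
    "y": {
      "field": "avg_incub",
      "type": "quantitative",
      "axis": {"title": "Average Incubation (days)"}
    }
  }
}
```

On these sample rows the three bars should come out at roughly 13.67 (Accipitriformes), 8.67 (Anseriformes) and 12 (Charadriiformes). The filters have to run before the aggregate, because after grouping by Order the Class field is no longer present in the aggregated rows.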