CouchDB View - Filter and Group By on Key Array

Problem description

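I have an array key in a CouchDB view: [doc.time, doc.address]. Neither element is unique. doc.time is a UNIX timestamp and doc.address is a string. The reduce function is set to _sum, because the only value for each key is a number.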

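What I want is to filter by doc.time and then group the remaining records by doc.address. If I put doc.time as the first key, I cannot seem to group by unique addresses no matter what I specify as group_level. If I put doc.address first, I cannot seem to filter the query by time.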

Two examples

Query: ?group_level=1&startkey=[0,1230000000]&endkey=[{},1340000000]

Key order: doc.address before doc.time

Problem: the results are not filtered by time

Result:

rows: [
  {
    key: [ "1126GDuGLQTX3LFHHmjCctdn8WKDjn7QNA" ],
    value: 50
  },
  {
    key: [ "112AobLhjLJQ3LGqXFrsdnWMPqWCQqoiS6" ],
    value: 50
  }
]

Query: ?group_level=1&startkey=[1230000000]&endkey=[1340000000,{}]

Key order: doc.time before doc.address

Problem: I cannot see doc.address, and the results are not grouped by it

Result:

rows: [
  {
    key: [ 1231469665 ],
    value: 50
  },
  {
    key: [ 1231469744 ],
    value: 50
  }
]

You mentioned:

... If I put doc.time as the first key, I cannot seem to group by unique addresses no matter what I specify as a group_level ...

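The query parameter group_level=N groups rows by the first N elements of the array key: rows whose keys share those leading elements are reduced together. So when your array key is [doc.time, doc.address], group_level=1 gives one group per doc.time and group_level=2 gives one group per (doc.time, doc.address) pair. Because doc.address is not the leading element of the key, no group_level value groups by address alone.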
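A small offline model makes the difference concrete. The sketch below assumes the rows are already sorted by key, as in a view index, and uses a _sum-style reduce; it ignores how CouchDB actually computes and caches reductions, and the name groupByLevel and the sample rows are hypothetical:

```javascript
// Simplified model of group_level with a _sum-style reduce. Assumes the rows
// are already sorted by key, as a view index is; CouchDB's real reduce and
// rereduce machinery is not modelled.
function groupByLevel(rows, groupLevel) {
  const groups = [];
  for (const row of rows) {
    // group_level=N keeps only the first N elements of each array key
    const groupKey = row.key.slice(0, groupLevel);
    const last = groups[groups.length - 1];
    // In sorted order, rows sharing the same leading elements are adjacent,
    // so each row either joins the current group or starts a new one
    if (last && JSON.stringify(last.key) === JSON.stringify(groupKey)) {
      last.value += row.value;
    } else {
      groups.push({ key: groupKey, value: row.value });
    }
  }
  return groups;
}

// Hypothetical rows from a view emitting [doc.time, doc.address] -> number
const rows = [
  { key: [1231469665, 'addrA'], value: 50 },
  { key: [1231469665, 'addrB'], value: 10 },
  { key: [1231469744, 'addrA'], value: 50 },
];

console.log(groupByLevel(rows, 1)); // one group per timestamp
console.log(groupByLevel(rows, 2)); // one group per (timestamp, address) pair
```

With group_level=1 the two rows for 1231469665 collapse into one group, while with group_level=2 'addrA' still appears in two separate groups, one per timestamp, so its values are never summed across times.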

... If I put doc.address first, I cannot seem to filter the query by time ...

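When your array key looks like [doc.address, doc.time], note that you are emitting an array key from the Map function. Keep the following points about array keys (compound keys) in CouchDB in mind, as described in this reference: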

... First thing of note and very important ... an array output ... from the javascript Map function ... each of those Index Keys are strings, and are ordered character by character as strings, including the brackets and commas ...

The statement and explanation above from the reference have significant implications for how CouchDB indexing works with compound keys (array keys).

To clarify, let's create documents like the following in a sample database:

{"time":"2011","address":"CT"}
{"time":"2012","address":"CT"}
...
{"time":"2011","address":"TX"}
...
{"time":"2015","address":"TX"}
...
{"time":"2014","address":"NY"}
...
{"time":"2014","address":"CA"}
{"time":"2015","address":"CA"}
{"time":"2016","address":"CA"}

I implemented a view map function like this:

function (doc) {
  if(doc.time && doc.address){
    emit([doc.address, doc.time], null);
  }
}

For now I am not using any Reduce function, so let's ignore grouping/reducing and focus on plain indexing. The view above produces the following key/value pairs for the index:

$ curl -k -X GET 'https://admin:****@192.168.1.106:6984/sample/_design/by_addr_time/_view/by_addr_time'
{"total_rows":25,"offset":0,"rows":[
{"id":"doc_0022","key":["CA","2014"],"value":null},
{"id":"doc_0023","key":["CA","2015"],"value":null},
{"id":"doc_0024","key":["CA","2016"],"value":null},
{"id":"doc_0000","key":["CT","2011"],"value":null},
{"id":"doc_0001","key":["CT","2012"],"value":null},
{"id":"doc_0002","key":["CT","2013"],"value":null},
{"id":"doc_0003","key":["CT","2014"],"value":null},
{"id":"doc_0004","key":["CT","2015"],"value":null},
{"id":"doc_0005","key":["CT","2016"],"value":null},
{"id":"doc_0014","key":["NY","2011"],"value":null},
{"id":"doc_0015","key":["NY","2012"],"value":null},
{"id":"doc_0016","key":["NY","2013"],"value":null},
{"id":"doc_0017","key":["NY","2014"],"value":null},
{"id":"doc_0018","key":["NY","2015"],"value":null},
{"id":"doc_0019","key":["NY","2016"],"value":null},
{"id":"doc_0020","key":["NY","2017"],"value":null},
{"id":"doc_0021","key":["NY","2018"],"value":null},
{"id":"doc_0006","key":["TX","2011"],"value":null},
{"id":"doc_0008","key":["TX","2012"],"value":null},
{"id":"doc_0007","key":["TX","2013"],"value":null},
{"id":"doc_0009","key":["TX","2014"],"value":null},
{"id":"doc_0010","key":["TX","2015"],"value":null},
{"id":"doc_0011","key":["TX","2016"],"value":null},
{"id":"doc_0012","key":["TX","2017"],"value":null},
{"id":"doc_0013","key":["TX","2018"],"value":null}
]}
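The shape of this index can be reproduced offline. The sketch below is a rough stand-in for the view server, not how CouchDB's JavaScript query server is implemented: it evaluates the map function source with a local emit() that collects rows, then sorts the rows by address and then time. Only the documents listed explicitly above are used (the ones elided as "..." are left out), and buildIndex is a hypothetical name:

```javascript
// Sample documents: only those listed explicitly above
const docs = [
  { time: '2011', address: 'CT' },
  { time: '2012', address: 'CT' },
  { time: '2011', address: 'TX' },
  { time: '2015', address: 'TX' },
  { time: '2014', address: 'NY' },
  { time: '2014', address: 'CA' },
  { time: '2015', address: 'CA' },
  { time: '2016', address: 'CA' },
];

// The map function from above, kept as source text as a design document stores it
const mapSource = `function (doc) {
  if (doc.time && doc.address) {
    emit([doc.address, doc.time], null);
  }
}`;

// Evaluates the map source with a local emit() that collects rows, then sorts them
function buildIndex(source, documents) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  // new Function gives the map function access to our emit through its closure
  const map = new Function('emit', `return (${source});`)(emit);
  documents.forEach(doc => map(doc));
  // Simplification: compare the address first, then the time (both strings here)
  rows.sort((a, b) => a.key[0].localeCompare(b.key[0]) || a.key[1].localeCompare(b.key[1]));
  return rows;
}

buildIndex(mapSource, docs).forEach(row => console.log(JSON.stringify(row.key)));
```

Because the address is the first element, all of an address's rows sit together in the index, ordered by time only within that address.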

Now I will run a query to filter the view by doc.time. My query parameters are:

?startkey=["AA","2017"]&endkey=["ZZ","2018"]

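I expect the query above to return only documents whose time field is between 2017 and 2018, with any value in the address field, because I specified addresses from AA to ZZ, which covers every address in my database. I run the query with curl as follows: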

$ curl -k -X GET 'https://admin:****@192.168.1.106:6984/sample/_design/by_addr_time/_view/by_addr_time?startkey=\["AA","2017"\]&endkey=\["ZZ","2018"\]'
{"total_rows":25,"offset":0,"rows":[
{"id":"doc_0022","key":["CA","2014"],"value":null},
{"id":"doc_0023","key":["CA","2015"],"value":null},
{"id":"doc_0024","key":["CA","2016"],"value":null},
{"id":"doc_0000","key":["CT","2011"],"value":null},
{"id":"doc_0001","key":["CT","2012"],"value":null},
{"id":"doc_0002","key":["CT","2013"],"value":null},
{"id":"doc_0003","key":["CT","2014"],"value":null},
{"id":"doc_0004","key":["CT","2015"],"value":null},
{"id":"doc_0005","key":["CT","2016"],"value":null},
{"id":"doc_0014","key":["NY","2011"],"value":null},
{"id":"doc_0015","key":["NY","2012"],"value":null},
{"id":"doc_0016","key":["NY","2013"],"value":null},
{"id":"doc_0017","key":["NY","2014"],"value":null},
{"id":"doc_0018","key":["NY","2015"],"value":null},
{"id":"doc_0019","key":["NY","2016"],"value":null},
{"id":"doc_0020","key":["NY","2017"],"value":null},
{"id":"doc_0021","key":["NY","2018"],"value":null},
{"id":"doc_0006","key":["TX","2011"],"value":null},
{"id":"doc_0008","key":["TX","2012"],"value":null},
{"id":"doc_0007","key":["TX","2013"],"value":null},
{"id":"doc_0009","key":["TX","2014"],"value":null},
{"id":"doc_0010","key":["TX","2015"],"value":null},
{"id":"doc_0011","key":["TX","2016"],"value":null},
{"id":"doc_0012","key":["TX","2017"],"value":null},
{"id":"doc_0013","key":["TX","2018"],"value":null}
]}

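The response returned by the query above may seem surprising, because it does not return only the documents whose time is between 2017 and 2018. This is how CouchDB indexing of array keys works: array keys are sorted element by element, and the second element (time) is compared only when the first elements (address) are equal. Every address in this view sorts between "AA" and "ZZ", so every row falls inside the range whatever its time is. If you read the reference, it will start to make sense.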
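The element-by-element ordering can be checked offline. The sketch below is a simplified model of CouchDB's documented view collation order (null, false, true, numbers, strings, arrays, objects). As assumptions, strings are compared by code point instead of ICU collation (which agrees for the uppercase codes and digits used here), and objects are not compared with each other. The rows are rebuilt from the 25 keys in the view output above; collate and rangeQuery are hypothetical names:

```javascript
// Rank of each JSON type: null < false < true < numbers < strings < arrays < objects
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === 'number') return 3;
  if (typeof v === 'string') return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects such as {}
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;                    // different types: type order decides
  if (ra === 3) return a - b;                       // numbers
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;  // strings, by code point (simplified)
  if (ra === 5) {
    // Arrays: the first element that differs decides the order
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;                     // a prefix sorts before a longer array
  }
  return 0;                                         // objects: not compared in this sketch
}

// Rows whose key lies between startkey and endkey, both ends inclusive
function rangeQuery(rows, startkey, endkey) {
  return rows
    .filter(row => collate(row.key, startkey) >= 0 && collate(row.key, endkey) <= 0)
    .sort((x, y) => collate(x.key, y.key));
}

// The 25 keys from the by_addr_time output above, as [address, time]
const yearSpans = { CA: [2014, 2016], CT: [2011, 2016], NY: [2011, 2018], TX: [2011, 2018] };
const indexRows = [];
for (const [address, [from, to]] of Object.entries(yearSpans)) {
  for (let year = from; year <= to; year++) {
    indexRows.push({ key: [address, String(year)], value: null });
  }
}

console.log(rangeQuery(indexRows, ['AA', '2017'], ['ZZ', '2018']).length); // 25
console.log(rangeQuery(indexRows, ['CT', '2016'], ['TX', '2011']).length); // 10
```

The same comparator explains the first example in the question: with keys [doc.address, doc.time], a string address always sorts after the number 0 and before {}, so startkey=[0,1230000000]&endkey=[{},1340000000] admits every row and the time bounds never come into play.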

Now let's change the query:

?startkey=["CT","2016"]&endkey=["TX","2011"]

The result of the query above is shown below; given the explanation so far, it should make sense:

$ curl -k -X GET 'https://admin:****@192.168.1.106:6984/sample/_design/by_addr_time/_view/by_addr_time?startkey=\["CT","2016"\]&endkey=\["TX","2011"\]'
{"total_rows":25,"offset":8,"rows":[
{"id":"doc_0005","key":["CT","2016"],"value":null},
{"id":"doc_0014","key":["NY","2011"],"value":null},
{"id":"doc_0015","key":["NY","2012"],"value":null},
{"id":"doc_0016","key":["NY","2013"],"value":null},
{"id":"doc_0017","key":["NY","2014"],"value":null},
{"id":"doc_0018","key":["NY","2015"],"value":null},
{"id":"doc_0019","key":["NY","2016"],"value":null},
{"id":"doc_0020","key":["NY","2017"],"value":null},
{"id":"doc_0021","key":["NY","2018"],"value":null},
{"id":"doc_0006","key":["TX","2011"],"value":null}
]}

Update

... What I want is to filter by doc.time, then group the remaining records by doc.address ...

So what can we do? There is a nice question and answer that provides the basic idea.

I'm not sure which idea is best, but I implemented one of them: I created a view named t_red, shown below, with a built-in _count reduce:

function (doc) {
  if(doc.time && doc.address){
    emit([doc.time, doc.address], null);
  }
}

In addition, I created a view named a_red, also with a built-in _count reduce:

function (doc) {
  if(doc.address && doc.time){
    emit([doc.address, doc.time], null);
  }
}
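Both t_red and a_red are queried with JSON arrays in startkey and endkey. CouchDB expects these parameters to be JSON values, so one way to build the URLs without relying on the HTTP client to escape brackets, quotes and spaces is to JSON-encode and then percent-encode each parameter. This is a minimal sketch; viewUrl is a hypothetical helper, and the credentials are left out of the base URL:

```javascript
// Hypothetical helper: JSON-encodes each parameter value, then percent-encodes it
function viewUrl(base, params) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join('&');
  return `${base}?${query}`;
}

const base = 'https://192.168.1.106:6984/sample/_design/a_red/_view/a_red';

// Per-address time window on a_red, as in the Node.js code below
console.log(viewUrl(base, { group_level: 2, startkey: ['CT', '2012'], endkey: ['CT', '2015'] }));
// ...a_red?group_level=2&startkey=%5B%22CT%22%2C%222012%22%5D&endkey=%5B%22CT%22%2C%222015%22%5D
```

Numbers such as group_level JSON-encode to themselves, so the same helper works for every parameter.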

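Then I wrote the following Node.js code to query doc.time between 2012 and 2015 and then group the results by doc.address. The console logs are shown as comments in the code. I hope this code helps (and isn't confusing!):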

process.env.NODE_TLS_REJECT_UNAUTHORIZED = "0"; // Disable TLS certificate verification, because the CouchDB SSL certificate is self-signed

const fetch=require('node-fetch')

// query "t_red" view/index
fetch(`https://admin:****@192.168.1.106:6984/sample/_design/t_red/_view/t_red?group_level=2&startkey=["2012", "AA"]&endkey=["2015", "ZZ"]`, {
    method: 'GET',
    headers: {
        'Content-Type': 'application/json',
    }
}).then(
    res=>res.json()
).then(data=>{
    let unique_addr=[]
    data.rows.forEach(row=>{ // forEach, since the callback is only used for its side effects
        console.log('row.key-> ', row.key, '  row.value-> ', row.value)

        // console log is shown below:
        //
        // row.key->  [ '2012', 'CT' ]   row.value->  1
        // row.key->  [ '2012', 'NY' ]   row.value->  1
        // row.key->  [ '2012', 'TX' ]   row.value->  1
        // row.key->  [ '2013', 'CT' ]   row.value->  1
        // row.key->  [ '2013', 'NY' ]   row.value->  1
        // row.key->  [ '2013', 'TX' ]   row.value->  1
        // row.key->  [ '2014', 'CA' ]   row.value->  1
        // row.key->  [ '2014', 'CT' ]   row.value->  1
        // row.key->  [ '2014', 'NY' ]   row.value->  1
        // row.key->  [ '2014', 'TX' ]   row.value->  1
        // row.key->  [ '2015', 'CA' ]   row.value->  1
        // row.key->  [ '2015', 'CT' ]   row.value->  1
        // row.key->  [ '2015', 'NY' ]   row.value->  1
        // row.key->  [ '2015', 'TX' ]   row.value->  1

        if(unique_addr.indexOf(row.key[1])==-1){ // Push unique addresses into an array
            unique_addr.push(row.key[1])
        }
    })

    console.log(unique_addr)

    // Console log is shown below:
    //
    // [ 'CT', 'NY', 'TX', 'CA' ]

    return unique_addr

}).then(unique_addr=>{

    // Group the unique addresses
    let group_by_address=unique_addr.map(addr=>{
        // For each unique address, do a query of "a_red" view/index
        return fetch(`https://admin:****@192.168.1.106:6984/sample/_design/a_red/_view/a_red?group_level=2&startkey=["${addr}","2012"]&endkey=["${addr}","2015"]`, {
            method: 'GET',
            headers: {
                'Content-Type': 'application/json',
            }
        }).then(
            res=>res.json()
        ).then(data=>{
            data.rows.forEach(row=>{console.log('row.key-> ', row.key, '  row.value-> ', row.value)})

            // Console logs related to this section of code are shown below

            //row.key->  [ 'CA', '2014' ]   row.value->  1
            //row.key->  [ 'CA', '2015' ]   row.value->  1

            //row.key->  [ 'NY', '2012' ]   row.value->  1
            //row.key->  [ 'NY', '2013' ]   row.value->  1
            //row.key->  [ 'NY', '2014' ]   row.value->  1
            //row.key->  [ 'NY', '2015' ]   row.value->  1

            //row.key->  [ 'CT', '2012' ]   row.value->  1
            //row.key->  [ 'CT', '2013' ]   row.value->  1
            //row.key->  [ 'CT', '2014' ]   row.value->  1
            //row.key->  [ 'CT', '2015' ]   row.value->  1

            //row.key->  [ 'TX', '2012' ]   row.value->  1
            //row.key->  [ 'TX', '2013' ]   row.value->  1
            //row.key->  [ 'TX', '2014' ]   row.value->  1
            //row.key->  [ 'TX', '2015' ]   row.value->  1

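            let obj={}
            // Sum the _count values instead of counting rows: each row is one distinct
            // (address, time) pair, which can cover several documents
            obj[addr]=data.rows.reduce((sum,row)=>sum+row.value,0) // Unique address and its number of documents in the time range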
            return obj

        }).catch(err=>{
            console.log('err-> ', err)
        })
    })

    return group_by_address

}).then(group_by_address=>{
    // group_by_address holds one Promise per address, so wait for all of them
    // and log the resolved objects instead of the pending Promise wrappers
    return Promise.all(group_by_address).then(groups=>{
        groups.forEach(group=>{
            console.log('Grouped by address-> ', group)

            // Console logs related to this section of code (in unique_addr order):

            //Grouped by address->  { CT: 4 }

            //Grouped by address->  { NY: 4 }

            //Grouped by address->  { TX: 4 }

            //Grouped by address->  { CA: 2 }
        })
    })
}).catch(err=>{
    console.log('err-> ', err)
})
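A different approach from the per-address a_red queries above (an alternative, not a CouchDB feature): the t_red rows fetched with group_level=2 already carry a count for every (time, address) pair in the time window, so the totals per address can be summed on the client in a single pass. This is a minimal sketch assuming rows shaped like the data.rows logged above; countByAddress is a hypothetical name:

```javascript
// Sums the _count values of t_red rows (key [time, address]) per address
function countByAddress(rows) {
  const totals = {};
  for (const row of rows) {
    const address = row.key[1];
    // Add the row's count rather than 1, since a row may cover several documents
    totals[address] = (totals[address] || 0) + row.value;
  }
  return totals;
}

// The t_red rows for 2012 to 2015, as logged in the code above
const tRedRows = [
  { key: ['2012', 'CT'], value: 1 }, { key: ['2012', 'NY'], value: 1 },
  { key: ['2012', 'TX'], value: 1 }, { key: ['2013', 'CT'], value: 1 },
  { key: ['2013', 'NY'], value: 1 }, { key: ['2013', 'TX'], value: 1 },
  { key: ['2014', 'CA'], value: 1 }, { key: ['2014', 'CT'], value: 1 },
  { key: ['2014', 'NY'], value: 1 }, { key: ['2014', 'TX'], value: 1 },
  { key: ['2015', 'CA'], value: 1 }, { key: ['2015', 'CT'], value: 1 },
  { key: ['2015', 'NY'], value: 1 }, { key: ['2015', 'TX'], value: 1 },
];

console.log(countByAddress(tRedRows)); // { CT: 4, NY: 4, TX: 4, CA: 2 }
```

In the fetch chain this would take the place of the a_red step, for example .then(data => countByAddress(data.rows)). The trade-off is one request instead of one per address, at the cost of transferring every distinct (time, address) pair in the window to the client.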