使用 bash 筛选结果

Filter results using bash

为了更清楚,请查看下面的文本文件。

https://brianbrandt.dk/web/var/www/public_html/.htpasswd
https://brianbrandt.dk/web/var/www/public_html/wp-config.php
https://briannajackson1.wordpress.org/high-entropy-misc.txt
https://briannajackson1.wordpress.org/Homestead.yaml
https://brickellmiami.centric.hyatt.com/dev
https://brickellmiami.centric.hyatt.com/django.log
https://brickellmiami.centric.hyatt.com/.dockercfg
https://brickellmiami.centric.hyatt.com/docker-compose.yml
https://brickellmiami.centric.hyatt.com/.docker/config.json
https://brickellmiami.centric.hyatt.com/Dockerfile
https://brideonashoestring.wordpress.org/web/var/www/public_html/config.php
https://brideonashoestring.wordpress.org/web/var/www/public_html/wp-config.php
https://brideonashoestring.wordpress.org/wp-config.php
https://brideonashoestring.wordpress.org/.wp-config.php.swp
https://brideonashoestring.wordpress.org/_wpeprivate/config.json
https://brideonashoestring.wordpress.org/yarn-debug.log
https://brideonashoestring.wordpress.org/yarn-error.log
https://brideonashoestring.wordpress.org/yarn.lock
https://brideonashoestring.wordpress.org/.yarnrc
https://bridgehome.adobe.com/etc/shadow
https://bridgehome.adobe.com/phpinfo.php
https://bridgetonema.wordpress.org/manifest.json
https://bridgetonema.wordpress.org/manifest.yml
https://bridge.twilio.com/.wp-config.php.swp
https://bridge.twilio.com/wp-content/themes/.git/config
https://bridge.twilio.com/_wpeprivate/config.json
https://bridge.twilio.com/yarn-debug.log
https://bridge.twilio.com/yarn-error.log
https://bridge.twilio.com/yarn.lock
https://bridge.twilio.com/.yarnrc
https://brightside.mtn.co.za/config.lua
https://brightside.mtn.co.za/config.php
https://brightside.mtn.co.za/config.php.txt
https://brightside.mtn.co.za/config.rb
https://brightside.mtn.co.za/config.ru
https://brightside.mtn.co.za/_config.yml
https://brightside.mtn.co.za/console
https://brightside.mtn.co.za/.credentials
https://brightside.mtn.co.za/CVS/Entries
https://brightside.mtn.co.za/CVS/Root
https://brightside.mtn.co.za/dasbhoard/
https://brightside.mtn.co.za/data
https://brightside.mtn.co.za/data.txt
https://brightside.mtn.co.za/db/dbeaver-data-sources.xml
https://brightside.mtn.co.za/db/dump.sql
https://brightside.mtn.co.za/db/.pgpass
https://brightside.mtn.co.za/db/robomongo.json
https://brightside.mtn.co.za/README.txt
https://brightside.mtn.co.za/RELEASE_NOTES.txt
https://brightside.mtn.co.za/.remote-sync.json
https://brightside.mtn.co.za/Resources.zip.manifest
https://brightside.mtn.co.za/.rspec
https://br.infinite.sx/db/dump.sql
https://br.infinite.sx/graphiql

域名brightside.mtn.co.za等重复10次以上的域名现在要去掉brightside.mtn.co.za等重复10次以上的域名然后输出结果输出应该长得像。

https://br.infinite.sx/db/dump.sql
https://br.infinite.sx/graphiql
https://bridgetonema.wordpress.org/manifest.json
https://bridgetonema.wordpress.org/manifest.yml

[以下为对原题的回复,以JSON输入为前提。]

由于您需要对一组中的项目进行计数,因此您会发现 group_by( sub("/[^/]*$";"") ) 很有用。

例如,如果您想完全忽略大组,正如对规定要求的一种解释似乎暗示的那样,您可以使用以下过滤器:

[.results[] | select(.status==301) | .url]
| group_by( sub("/[^/]*$";"") )
| map(select(length < 10) )
| .[][]

如果文本输入在 input.txt 中,那么在 bash 命令行中使用 jq 的一种解决方案是:

< input.txt jq -Rr '[inputs] 
  | group_by( sub("/[^/]*$";"") )
  | map(select(length < 10) )
  | .[][]'

(如果您希望输出为 JSON 字符串,请省略 -r 选项。)

更高效的解决方案

上述解决方案使用内置过滤器 group_by/1,因此效率有些低。对于非常大量的输入行,更有效的解决方案是:

< input.txt jq -Rr '
  def GROUPS_BY(stream; f):
    reduce stream as $x ({}; .[$x|f] += [$x] ) | .[] ;

  GROUPS_BY(inputs;  sub("/[^/]*$";""))
  | select(length < 10) 
  | .[]'