Filter each subdomain in a list based on a unique value

I have two lists of URLs. The first, listofdomains.txt, contains the following:

http://example.com
https://www.example.com
https://abc-test.example.com

urls_params.txt contains the following:

http://example.com/?param1=123
http://example.com/?param1=123&param2=456
https://www.example.com/?param1=123
https://www.example.com/?param1=123&param2=456
https://abc-test.example.com/?param1=123
https://abc-test.example.com/?param1=123&param2=456

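I need to loop over the two lists, extract from urls_params.txt all the URLs that belong to each subdomain, and save them in a file named after that subdomain (subdomain_name.txt).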

For example, the desired output would be a file named example.com containing:

http://example.com/?param1=123
http://example.com/?param1=123&param2=456

and so on for the remaining subdomains.

My workaround was to reduce the listofdomains.txt list to just the host names:

example.com
www.example.com
abc-test.example.com

and save it in a file named list, then run the following command:

while read -r url; do $(cat urls_params.txt | awk -v u="$url" '{print u}') ; done < list

But the output is an error:

example.com: command not found
www.example.com: command not found
abc-test.example.com: command not found

Thanks

Found it:

while read -r url ; do cat urls_params.txt | grep -E "$url" | tee $url.txt ; done < list
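
Two notes on the loops above. The first attempt failed because $( ... ) in command position makes the shell run awk's output as a command, hence "command not found". The grep version works, but the pattern example.com is also a substring of www.example.com and abc-test.example.com, so example.com.txt would collect the URLs of every subdomain. Here is a minimal sketch of a stricter variant, assuming every URL has the form scheme://host/... and that list holds one bare host name per line. The variable name host is my own choice, and the sample files are recreated only so the snippet is self-contained:

```shell
# Recreate the sample inputs from the question (illustration only).
printf '%s\n' example.com www.example.com abc-test.example.com > list
printf '%s\n' \
    'http://example.com/?param1=123' \
    'http://example.com/?param1=123&param2=456' \
    'https://www.example.com/?param1=123' \
    'https://www.example.com/?param1=123&param2=456' \
    'https://abc-test.example.com/?param1=123' \
    'https://abc-test.example.com/?param1=123&param2=456' > urls_params.txt

# -F treats the pattern as a fixed string, so "." is not a wildcard.
# Wrapping the host in "://" and "/" forces a match on the whole host name.
while IFS= read -r host; do
    grep -F "://$host/" urls_params.txt > "$host.txt"
done < list
```

A URL with no slash after the host name (for example http://example.com) would not match this pattern, so the sketch only fits input shaped like the sample.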

Input (from the question):

$ ls
listofdomains.txt  tst.awk  urls_params.txt

Script:

$ cat tst.awk
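# Runs for every line of both files: reduce the line to its host name.
{
    dom = $0
    sub("https?://","",dom)
    sub("/.*","",dom)
}
# First file (urls_params.txt): collect the URLs seen for each host.
NR==FNR {
    dom2urls[dom] = dom2urls[dom] $0 ORS
    next
}
# Second file (listofdomains.txt): switch output file when the host changes.
dom != prev {
    close(out)
    out = dir "/" dom
    prev = dom
}
{ printf "%s", dom2urls[dom] > out }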

Run:

$ awk -v dir="$PWD" -f tst.awk urls_params.txt listofdomains.txt

Output:

$ ls
abc-test.example.com  example.com  listofdomains.txt  tst.awk  urls_params.txt  www.example.com

$ head *.com
==> abc-test.example.com <==
https://abc-test.example.com/?param1=123
https://abc-test.example.com/?param1=123&param2=456

==> example.com <==
http://example.com/?param1=123
http://example.com/?param1=123&param2=456

==> www.example.com <==
https://www.example.com/?param1=123
https://www.example.com/?param1=123&param2=456

You don't actually need listofdomains.txt unless you want to exclude some domains from the output, or you want empty output files for domains that do not appear in urls_params.txt.

If you only want output files for domains that have entries in urls_params.txt (i.e. no empty output files), then just change:

{ printf "%s", dom2urls[dom] > out }

to:

dom in dom2urls { printf "%s", dom2urls[dom] > out }
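
For the case where listofdomains.txt is not needed at all, a single-input variant might look like the sketch below. It is not part of the script above. It assumes the same scheme://host/... URL shape, and the output directory name out is hypothetical. The sample input is recreated only to keep the snippet self-contained:

```shell
# Recreate the sample input from the question (illustration only).
printf '%s\n' \
    'http://example.com/?param1=123' \
    'http://example.com/?param1=123&param2=456' \
    'https://www.example.com/?param1=123' \
    'https://www.example.com/?param1=123&param2=456' \
    'https://abc-test.example.com/?param1=123' \
    'https://abc-test.example.com/?param1=123&param2=456' > urls_params.txt

mkdir -p out    # hypothetical output directory

awk -v dir=out '
{
    dom = $0
    sub(/^https?:\/\//, "", dom)   # strip the scheme
    sub(/\/.*/, "", dom)           # strip the path and query string
    # ">" truncates a file only when awk first opens it, so later
    # lines for the same host are appended within this run.
    print > (dir "/" dom)
}' urls_params.txt
```

This keeps one output file open per host for the whole run. With a very large number of distinct hosts, some awk implementations may hit the open-file limit, in which case sorting the input by host and calling close() when the host changes, as tst.awk does, would be the safer design.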