Separate multiple subdomains into all possible subdomain combinations using bash and awk

Question

I'm trying to use bash to break multi-level subdomains down into every possible parent domain.

For example, if subdomains.txt contains:

www.ir.example.com
www.it.api4.qa.example.com
www.api.example2.com

the expected output should be:

example.com
ir.example.com
www.ir.example.com
qa.example.com
api4.qa.example.com
it.api4.qa.example.com
example2.com
api.example2.com
www.api.example2.com

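I think the best approach is to split the names on the . without breaking up the original domain, but I'm not sure how to do that. Any help would be great.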

Answer 1

Using awk:

awk 'BEGIN{FS=OFS="."}           # Set the input and output field separator to a dot
     {
        for(i=1;i<NF;i++) {      # Number of domains to print
          for(j=i;j<NF;j++)      # For each domain element
            d=d $j OFS;          # d is the domain
          a[d $NF]               # store it in the array a
          d=""                   # Reset the domain
        }
     }
     END{
       for(i in a)               # Loop through each element of the array a
         print i                 # and print it
     }' file

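Note that the array a is used so that each domain appears only once (instead of, for example, example.com being printed twice).

Also note that the domains are not sorted; if you need that, pipe the output through sort.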
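As a rough illustration of the sorting remark, the same awk logic can be wrapped in a shell function and piped through sort. The function name list_domains and the array name seen are made up for this sketch, and LC_ALL=C is an added assumption so that the order does not depend on the locale. The result is plain byte order, not the grouped order shown in the question.

```shell
# list_domains: hypothetical helper name, not part of the original answer.
# Reads host names from the given files (or stdin) and prints each
# domain suffix with at least two labels once, sorted in byte order.
list_domains() {
  awk 'BEGIN { FS = OFS = "." }
       {
         for (i = 1; i < NF; i++) {        # one candidate per starting label
           d = ""
           for (j = i; j < NF; j++)        # join labels i..NF-1 with dots
             d = d $j OFS
           seen[d $NF]                     # referencing the key stores it once
         }
       }
       END { for (name in seen) print name }' "$@" | LC_ALL=C sort
}

# Usage: feed two of the sample host names through the function.
printf '%s\n' www.ir.example.com www.api.example2.com | list_domains
```

Without LC_ALL=C, some locales order names such as example.com and example2.com differently, so the pinned locale is only there to keep the output reproducible.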

Answer 2

As far as I know, every Linux distribution (and some UNIX systems) ships with Perl, so here is a Perl alternative:

perl -e 'while(<>){while(s/^([^.]+\.)(.+)/$2/){$x{$1.$2}=1}}print "$_\n" foreach(keys %x)' subdomains.txt

The code, 'unfolded':

while(<>){ # read the file line by line; each line is stored in $_
  # Match the first label (with its dot) into group $1 and the rest into group $2,
  # then replace the match with $2, which removes the first label
  while(s/^([^.]+\.)(.+)/$2/){
    # Store the name in a hash (this avoids printing duplicates)
    $x{$1.$2}=1
  }
}
# print the keys of the hash
print "$_\n" foreach(keys %x)

Answer 3

Here is a solution using GNU sed:

sed -nr 's/\./#/g;:a;/#/!{p;bb};s/#([^#]+)$/.\1/;h;s/.*#//p;g;ta;:b' subdomains.txt
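Since the one-liner is dense, here is an annotated multi-line sketch of the same idea. The wrapper name expand_with_sed and the label name end are made up for illustration, and -E is used as the equivalent spelling of -r. It assumes the input names contain no # and do not end with a dot; a trailing dot would make the loop never terminate.

```shell
# expand_with_sed: hypothetical wrapper name for this sketch.
# Reads host names from the given files (or stdin) and prints every
# suffix with at least two labels, shortest first.
expand_with_sed() {
  sed -nE '
    # Swap every dot for a placeholder character.
    s/\./#/g
    :a
    # No placeholder left: the line is the full host name again,
    # so print it and jump to the end of the script.
    /#/!{
      p
      b end
    }
    # Turn the last placeholder back into a dot, which grows the
    # restored suffix by one label.
    s/#([^#]+)$/.\1/
    # Save the line, print only the part after the last remaining
    # placeholder (if any), then restore the saved line.
    h
    s/.*#//p
    g
    # Loop again if a substitution succeeded in this pass.
    t a
    :end
  ' "$@"
}

# Usage: expand one of the sample host names.
printf '%s\n' www.ir.example.com | expand_with_sed
```

Unlike the awk and Perl answers, this version does not remove duplicates, so a parent domain shared by several input lines is printed once per line.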

Answer 4

You can try this awk:

awk -F'.' '{b=$NF;for(i=NF-1;i>0;i--){b=$i FS b;print b}}' infile
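This one-liner prints a shared parent such as example.com once for every input line that contains it. If that matters, one possible variation is to track what has already been printed. This is a sketch only: the function name expand_unique and the array name seen are assumptions, not part of the answer.

```shell
# expand_unique: hypothetical helper name for this sketch.
# Builds each suffix from right to left, as the one-liner does, but
# prints a name only the first time it is seen, keeping input order.
expand_unique() {
  awk -F'.' '{
    b = $NF                          # start from the last label
    for (i = NF - 1; i > 0; i--) {
      b = $i FS b                    # prepend the next label and a dot
      if (!seen[b]++) print b        # print only on first occurrence
    }
  }' "$@"
}

# Usage: two hosts that share the parent x.com.
printf '%s\n' a.x.com b.x.com | expand_unique
```

Because the names are printed as they are first encountered, no separate sort or uniq pass is needed to remove repeats.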

Tags: regex, subdomain, bash, awk, printf