Find distinct elements after nested splitting in Unix

Question

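I have a string containing multiple values separated by spaces. Each individual value in turn consists of fields separated by another delimiter, "-".

I am looking for a good way, using a shell script, to find the unique strings in the first field of each value.

To clarify, my string has the following format:

abc-def-ghi 123-456-789 abc-xyp-lmn 789-abc-def

I want to find the unique strings among the first fields of these values, so here "abc", "123" and "789", and put them into an array.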

Answer 1

Using perl, and assuming the string is in a bash variable:

perl -lane 'my %words; $words{(split(/-/, $_))[0]} = 1 for @F; print scalar(keys %words)' <<<"$thevariable"

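The command above prints the number of unique values. If you want the unique values themselves rather than their count, use print join(" ", keys %words) instead.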

Answer 2

If you don't care about the order, this will work:

echo abc-def-ghi 123-456-789 abc-xyp-lmn 789-abc-def | sed --expression='s/\ /\n/g' | cut -d'-' -f1 | sort | uniq

If you only want the count, pipe the output to wc -l:

echo abc-def-ghi 123-456-789 abc-xyp-lmn 789-abc-def | sed --expression='s/\ /\n/g' | cut -d'-' -f1 | sort | uniq | wc -l
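If the order of first appearance does matter, one possible variation (a sketch, not part of the answer above) replaces sort | uniq with an awk filter that prints a line only the first time it is seen. It also uses tr for the space-to-newline step, which avoids relying on sed interpreting \n in the replacement. The variable names input and result are assumptions made for the example.

```shell
#!/bin/sh
input='abc-def-ghi 123-456-789 abc-xyp-lmn 789-abc-def'

# tr:  put each space-separated value on its own line
# cut: keep only the first "-"-separated field
# awk: print a line only on its first occurrence, preserving input order
result=$(printf '%s\n' "$input" | tr ' ' '\n' | cut -d- -f1 | awk '!seen[$0]++')

printf '%s\n' "$result"
```

For the sample string this keeps the values in the order abc, 123, 789.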

Answer 3

To keep it short:

tr ' ' $'\n' < file | awk -F- '{a[$1]++}END{for (i in a) {print i}}'

As an array, as requested:

arr=( $(tr ' ' $'\n' < file | awk -F- '{a[$1]++}END{for (i in a) {print i}}') )
printf '%s\n' "${arr[@]}"

abc
123
789
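Two caveats may be worth keeping in mind here: awk's for (i in a) visits keys in an unspecified order, and an unquoted $( ... ) inside arr=( ... ) is subject to word splitting and globbing. A hedged sketch that sidesteps both, assuming bash 4 or later for mapfile, sorts the keys and reads them line by line. The variable input stands in for the file used above.

```shell
#!/usr/bin/env bash
input='abc-def-ghi 123-456-789 abc-xyp-lmn 789-abc-def'

# Same awk program as above; LC_ALL=C sort makes the output order
# deterministic, and mapfile -t stores one line per array element.
mapfile -t arr < <(
  tr ' ' '\n' <<<"$input" |
    awk -F- '{a[$1]++} END {for (i in a) print i}' |
    LC_ALL=C sort
)

printf '%s\n' "${arr[@]}"
```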

Answer 4

Using perl:

perl -lnE '
    my %seen; $, = "\n";
    say grep { !$seen{$_}++ } map { (split /-/)[0] } split / /
' file

You can replace file with a here-string:

<<< 'abc-def-ghi 123-456-789 abc-xyp-lmn 789-abc-def'

Output

abc
123
789

Answer 5

Another approach, using only bash.

#!/usr/bin/env bash

## If the string is not in an array format, use the code below.
##: string='abc-def-ghi 123-456-789 abc-xyp-lmn 789-abc-def'
##: string=${string// /$'\n'}
##: mapfile -t array <<< "$string"

array=(abc-def-ghi 123-456-789 abc-xyp-lmn 789-abc-def)

declare -A uniq

for i in "${array[@]%%-*}"; do
  ((uniq["$i"]++))
done

printf '%s\n' "${!uniq[@]}"
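Note that "${!uniq[@]}" lists the keys of an associative array in no particular order. If first-appearance order is wanted, a possible pure-bash variation (a sketch assuming bash 4 or later and non-empty first fields; the names seen and result are illustrative) keeps a separate indexed array alongside the lookup table:

```shell
#!/usr/bin/env bash
string='abc-def-ghi 123-456-789 abc-xyp-lmn 789-abc-def'

# Split the string on whitespace into an indexed array.
read -r -a array <<<"$string"

declare -A seen=()   # lookup table of first fields already emitted
result=()            # unique first fields, in order of first appearance

for first in "${array[@]%%-*}"; do
  # ${seen[$first]+set} expands to "set" only if the key already exists.
  if [[ -z ${seen[$first]+set} ]]; then
    seen[$first]=1
    result+=("$first")
  fi
done

printf '%s\n' "${result[@]}"
```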
