How do I split a string on the nth delimiter?
For each line in my file, I want to print everything before the 4th dash on that line.
Input:
TCGA-HC-8216-10A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01
I want to split each line on the fourth dash ("-").
Output:
TCGA-HC-8216-10A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A
I know I can split on every dash like this:
#!/usr/bin/env bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
arr=$(echo $IN | tr "-" "\n")
for x in $arr
do
    echo "> [$x]"
done
But this splits on every dash and prints each part of the string separately.
#!/bin/bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
arr=$(echo "$IN" | cut -d '-' -f1-4)
echo "$arr"
Prints:
TCGA-HC-8216-01A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A
Using cut:
cut -d- -f1-4 <<'EOF'
TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01
EOF
This cuts the input on the delimiter given by -d (here -) and returns the fields selected by -f, 1-4, i.e. fields one through four.
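To split on the nth delimiter rather than specifically the fourth, the field range can be parameterised. The following is only a minimal sketch: the helper name first_n_fields is hypothetical (it does not appear in the answer above), and it assumes a single-character delimiter, since cut -d accepts only one character.

```shell
#!/usr/bin/env bash
# first_n_fields N [DELIM]
# Hypothetical helper: for each line on standard input, print everything
# before the Nth occurrence of DELIM (default "-") by keeping fields 1..N.
first_n_fields() {
    local n=$1 delim=${2:--}
    cut -d "$delim" -f "1-$n"
}

# Example: keep the text before the 4th dash.
printf '%s\n' 'TCGA-HC-8216-01A-11D-A323-01' | first_n_fields 4
```

To process a file instead, redirect it into the function, for example first_n_fields 4 < samples.txt (samples.txt is a placeholder name). Lines with fewer than N fields are passed through unchanged, which is cut's default behaviour.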
Using grep with ERE:
arr=$(echo "$IN" | grep -oE "^([^-]*-){3}[^-]*")
With BRE:
arr=$(echo "$IN" | grep -o "^\([^-]*-\)\{3\}[^-]*")
Example:
#!/bin/bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
arr=$(echo "$IN" | grep -oE "^([^-]*-){3}[^-]*")
for x in $arr
do
    echo "> [$x]"
done
Output:
> [TCGA-HC-8216-01A]
> [TCGA-J4-8200-10A]
> [TCGA-EJ-A65E-10A]
Using pure bash and pattern matching:
#!/bin/bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"
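# Anchor at the start of the line: three "field-" groups, then one more field
re='^([^-]+-){3}[^-]+'
for line in $IN
do
    if [[ $line =~ $re ]]; then
        # Print only on a match, so a value from a previous line is never reused
        echo "${BASH_REMATCH[0]}"
    fi
done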
Output:
TCGA-HC-8216-01A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A
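Since the question is about the lines of a file, the same regex idea can be wrapped in a filter that reads line by line and takes n as an argument. This is a sketch under assumptions: the function name before_nth_dash is hypothetical, the delimiter is fixed to a dash, n is assumed to be at least 2, and lines with fewer than n-1 dashes are printed unchanged (a choice made here to mirror what cut does).

```shell
#!/usr/bin/env bash
# before_nth_dash N
# Hypothetical pure-bash filter: for each line on standard input, print the
# text before the Nth dash. Assumes N >= 2.
before_nth_dash() {
    local n=$1 line
    # (N-1) groups of "field followed by a dash", then one more field.
    local re="^([^-]*-){$((n - 1))}[^-]*"
    # "|| [[ -n $line ]]" also handles a final line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[0]}"
        else
            # Too few dashes to match: pass the line through unchanged.
            printf '%s\n' "$line"
        fi
    done
}

printf '%s\n' 'TCGA-HC-8216-01A-11D-A323-01' | before_nth_dash 4
```

Reading with while IFS= read -r avoids the word splitting that for line in $IN relies on, so lines containing spaces are handled as a whole. For a file, use before_nth_dash 4 < samples.txt (samples.txt is a placeholder name).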