How do I split a string on the nth delimiter?

For each line in my file, I want to print everything before the 4th dash in that line.

Input:

TCGA-HC-8216-10A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01

I want to split each line on the fourth dash ("-").

Output:

TCGA-HC-8216-10A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A

I know I can split on every dash like this:

#!/usr/bin/env bash

IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"

arr=$(echo $IN | tr "-" "\n")

for x in $arr
do
 echo "> [$x]"
done

But this splits at every dash and prints each part of the string separately.

#!/bin/bash

IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"

arr=$(echo "$IN" | cut -d '-' -f1-4)

echo "$arr"

Prints:

TCGA-HC-8216-01A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A

Using cut:

cut -d- -f1-4 <<'EOF'
TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01
EOF

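You are cutting the input on the delimiter given with -d (here a dash) and returning the fields selected with -f, 1-4, that is, fields one through four.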
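The same idea can be wrapped so that the field count and delimiter are parameters. This is only a sketch: the function name first_n_fields is hypothetical, and it relies on the standard cut behavior that a line with fewer fields than requested is printed in full.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer above): print the first N
# delimiter-separated fields of every input line.
# Usage: first_n_fields N DELIM [FILE...]   (reads stdin when no FILE is given)
first_n_fields() {
    local n=$1 delim=$2
    shift 2
    # -d sets the delimiter, -f selects the field range 1..N.
    cut -d "$delim" -f "1-$n" -- "$@"
}

# The second line has only two fields, so cut prints it unchanged.
printf '%s\n' 'TCGA-HC-8216-01A-11D-A323-01' 'a-b' | first_n_fields 4 -
```

To process a file instead of standard input, pass its name as the third argument, for example first_n_fields 4 - samples.txt (samples.txt being a placeholder name).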

Using grep with ERE:

arr=$(echo "$IN" | grep -oE "^([^-]*-){3}[^-]*")

With BRE:

arr=$(echo "$IN" | grep -o "^\([^-]*-\)\{3\}[^-]*")

Example:

#!/bin/bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"

arr=$(echo "$IN" | grep -oE "^([^-]*-){3}[^-]*")

for x in $arr
do
 echo "> [$x]"
done

Output:

> [TCGA-HC-8216-01A]
> [TCGA-J4-8200-10A]
> [TCGA-EJ-A65E-10A]
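The ERE pattern above hard-codes {3}, which is n - 1 for the fourth dash. A possible generalization, assuming the delimiter is always a dash, builds the repetition count from n; the function name before_nth_dash is made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the part of each input line that precedes the
# Nth dash. Lines containing fewer than N-1 dashes do not match the pattern
# and are therefore not printed at all (unlike cut, which prints them whole).
before_nth_dash() {
    local n=$1
    # (n-1) groups of "field plus dash", followed by one more field.
    grep -oE "^([^-]*-){$((n - 1))}[^-]*"
}

printf '%s\n' 'TCGA-HC-8216-01A-11D-A323-01' | before_nth_dash 4
```

Whether dropping short lines is desirable depends on the input; if such lines should be kept, the cut approach is the simpler choice.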

Using pure bash and pattern matching:

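#!/bin/bash
IN="TCGA-HC-8216-01A-11D-A323-01
TCGA-J4-8200-10A-11D-A323-01
TCGA-EJ-A65E-10A-11D-A323-01"

# Four non-empty fields joined by three dashes, anchored at the start of the line.
re='^([^-]+-){3}[^-]+'

for line in $IN
do
    if [[ $line =~ $re ]]; then
        # BASH_REMATCH[0] holds the text matched by the whole regex.
        trunc=${BASH_REMATCH[0]}
    else
        # No match (fewer than four fields): keep the line unchanged
        # instead of reusing the value from the previous iteration.
        trunc=$line
    fi
    echo "$trunc"
done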

Output:

TCGA-HC-8216-01A
TCGA-J4-8200-10A
TCGA-EJ-A65E-10A
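As an alternative to the regex, the same result can be sketched with parameter expansion only, which also makes n and the delimiter arguments. The function name before_nth is hypothetical, and the delimiter is treated as a literal string rather than a pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the part of STR that precedes the Nth DELIM.
# If STR contains fewer than N delimiters, it is printed unchanged.
# Usage: before_nth STR DELIM N
before_nth() {
    local str=$1 delim=$2 n=$3
    local rest=$str i=0 result
    while [ "$i" -lt "$n" ]; do
        case $rest in
            # Drop the shortest prefix ending in the delimiter (one field).
            *"$delim"*) rest=${rest#*"$delim"} ;;
            *) printf '%s\n' "$str"; return ;;
        esac
        i=$((i + 1))
    done
    # rest now holds everything after the Nth delimiter; strip it,
    # together with that delimiter, from the end of the original string.
    result=${str%"$delim$rest"}
    printf '%s\n' "$result"
}

before_nth 'TCGA-HC-8216-01A-11D-A323-01' - 4
```

Because no external command is started, this can be called once per line inside a while read loop without forking, at the cost of being longer than the cut one-liner.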