Extracting a string between two slashes using sed

Question

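I'm trying to use sed to extract a specific string from a line in a file. I currently read the file in a while loop and search for a particular string. When that string is found I extract the line, but I then need sed to parse it so that I get only the string between two slashes. It is a directory name, so if possible I need to keep the leading and trailing slashes. This is the loop I run to search the file: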

#!/bin/sh
file=configFile.conf
# read -r keeps backslashes in the line literal
while read -r line
do
    if echo "$line" | grep -q "directory_root"
    then
        DIR_ROOT="$line"    # remember the (last) matching line
    fi
done < "$file"
echo "$DIR_ROOT"
exit 0

The while loop runs and echoes the following string:

directory_root /root/config/data/

I then need sed to produce the following output, so that I can pass the correct directory name to another script:

/root/

Is it possible to extract only that part of the echoed output using sed and a regular expression?

Thanks

Answer 1

You don't necessarily need sed. You can do it in plain bash:

#!/bin/bash

f="directory_root /asdf/asdfad/fad"
# [[:alnum:]_] is the portable spelling of \w; a literal / needs no escaping
regex='^directory_root (/[[:alnum:]_]+/)'
if [[ $f =~ $regex ]]
then
    name="${BASH_REMATCH[1]}"
    echo "$name"
fi

This prints /asdf/.

See: Capturing Groups From a Grep RegEx
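Applied to the question's setup, the same `=~` test can replace both the grep and the sed step inside the read loop. The sketch below is bash-only; `get_root_dir` is a hypothetical helper name, and it uses `[^/]+` instead of `\w` so that names containing `-` or `.` also match. Unlike the question's loop, which keeps the last matching line, it stops at the first one.

```bash
#!/bin/bash
# get_root_dir (hypothetical helper): print the first path component, with
# both slashes, of the first "directory_root" line in a file; return 1 if
# no line matches.
get_root_dir() {
    local file=$1 line
    # Keep the regex in a variable: quoting it inside [[ ]] would make it literal.
    local regex='^directory_root[[:blank:]]+(/[^/]+/)'
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $regex ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
            return 0
        fi
    done < "$file"
    return 1
}

# Usage: build a throwaway config file and query it.
conf=$(mktemp)
printf 'xx\ndirectory_root /root/config/data/\nxxxx ttt\n' > "$conf"
get_root_dir "$conf"    # prints /root/
rm -f "$conf"
```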

Answer 2

sed -rn 's|^directory_root[[:blank:]]+(/[^/]*/?).*|\1|p' data

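-n: suppress automatic printing of the pattern space
-r: enable extended regular expressions (so + and the like need no escaping); -E is the spelling accepted by both GNU and BSD sed
s|regex|replacement|: you can choose a different delimiter (with |, the slashes in the pattern need no escaping)
p: print the current pattern space only if the regex matched
[[:blank:]]: matches a <tab> or a <space>
( regex ): captures a group, which can later be referenced with \1, \2, ...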

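/[^/]*/? matches a /, followed by any number of non-slash characters, optionally followed by another /. On the example line this correctly outputs /root/.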

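But what if you happen to have directory_root / or directory_root /dir? That's what the /? is for: those lines print / and /dir respectively. If you only want to print directories enclosed by / on both sides, just remove the ?.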
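To see what the trailing `?` changes, the sketch below runs the lenient pattern (with `/?`) and the strict one (without it) over the three inputs just mentioned. It uses `-E` in place of `-r`, assuming a sed that accepts `-E` (current GNU and BSD sed do); `lenient` and `strict` are hypothetical function names.

```sh
#!/bin/sh
# Lenient: the closing slash is optional, so "/" and "/dir" still match.
lenient() { sed -En 's|^directory_root[[:blank:]]+(/[^/]*/?).*|\1|p'; }
# Strict: the directory must be enclosed by slashes on both sides.
strict()  { sed -En 's|^directory_root[[:blank:]]+(/[^/]*/).*|\1|p'; }

for input in 'directory_root /root/config/data/' 'directory_root /' 'directory_root /dir'; do
    printf '%s -> lenient=[%s] strict=[%s]\n' "$input" \
        "$(printf '%s\n' "$input" | lenient)" \
        "$(printf '%s\n' "$input" | strict)"
done
# lenient yields /root/, / and /dir; strict yields /root/ and nothing for the other two
```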

Answer 3

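If DIR_ROOT holds only the path (for example /root/config/data/), you can cut it down to the top-level directory with a two-step parameter expansion. If it holds the whole line from the question's loop, strip the key first with DIR_ROOT="${DIR_ROOT#directory_root }".

DIR_ROOT="${DIR_ROOT#/}"        # cut away the leading slash
DIR_ROOT="/${DIR_ROOT%%/*}/"    # cut the trailing path and re-add both slashes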
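The same expansions can be wrapped in a small POSIX sh function that accepts either the bare path or the full `directory_root ...` line. This is a sketch; `top_level_dir` is a hypothetical name, and it does not special-case a bare `/` (which would come out as `//`).

```sh
#!/bin/sh
# top_level_dir (hypothetical helper). The body runs in a subshell ( ... ),
# so the scratch variable p does not leak into the caller's environment.
top_level_dir() (
    p="${1#directory_root }"    # drop the "directory_root " key, if present
    p="${p#/}"                  # cut away the leading slash
    printf '/%s/\n' "${p%%/*}"  # keep the first component, re-add both slashes
)

top_level_dir "/root/config/data/"                 # prints /root/
top_level_dir "directory_root /root/config/data/"  # prints /root/
```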

Answer 4

If you want to use sed, this works:

~/tmp> str="directory_root /root/config/data/"
~/tmp> echo "$str" | sed 's|^[^/]*\(/[^/]*/\).*$|\1|'
/root/

Or as a one-liner over the whole file (assuming the literal directory_root appears in the line):

 sed -e 's|^directory_root \(/[^/]*/\).*$|\1|;tx;d;:x' file

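Explanation of the regex in the first example:

s| : use | as the delimiter (easier to read in this case)
^ : match the start of the line
[^/]* : match a run of non-/ characters (greedy, but it stops at the first / because / is excluded)
\( : start capturing string 1
/ : match a literal /
[^/]* : match a run of non-/ characters
/ : match a literal /
\) : stop capturing string 1
.*$ : match everything else up to the end of the line
| : delimiter
\1 : replace the whole match with string 1
| : delimiter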

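In the second example I appended ;tx;d;:x, which suppresses the lines that don't match (see here): tx branches to the label x if the substitution succeeded; otherwise d deletes the line before it can be printed. You can then run it over the whole file and it will print only the modified lines.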

~/tmp> echo "xx" > tmp.txt
~/tmp> echo "directory_root /root/config/data/" >> tmp.txt
~/tmp> echo "xxxx ttt" >> tmp.txt
~/tmp>
~/tmp> sed -e 's|^directory_root \(/[^/]*/\).*$|\1|;tx;d;:x' tmp.txt
/root/
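The branch trick and the `-n ... p` form used in Answer 2 filter lines the same way. In the sketch below (hypothetical function names), the branch version splits the script into separate `-e` pieces, because ending a label with `;` as in `tx;d;:x` relies on GNU sed, while one label per `-e` is the portable form.

```sh
#!/bin/sh
# 1) Branch trick: t jumps to label x after a successful s///; otherwise d
#    deletes the line, so only substituted lines reach the output.
extract_branch() {
    sed -e 's|^directory_root \(/[^/]*/\).*$|\1|' -e 'tx' -e 'd' -e ':x'
}

# 2) Same filtering with -n (no auto-print) plus the p flag on s///.
extract_print() {
    sed -n 's|^directory_root \(/[^/]*/\).*$|\1|p'
}

sample='xx
directory_root /root/config/data/
xxxx ttt'
printf '%s\n' "$sample" | extract_branch   # prints /root/
printf '%s\n' "$sample" | extract_print    # prints /root/
```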

Answer 5

Since you asked for a sed solution, here's one:

$ s="directory_root /root/config/data"
$ echo "${s}" | sed -e 's/\//\x00/; s/\//\x00/; s/.*\x00\(.*\)\x00.*/\/\1\//;'
/root/

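How does this work? Since sed has no non-greedy matching, the trick is to use a series of substitutions to set things up so that non-greedy matching isn't needed. The first s/// replaces the first slash with a NUL byte, and the second s/// does the same to the next slash. Now the first two slashes (and only those) have been replaced by a byte that cannot occur in a UNIX shell string, so a regular greedy substitution (the third s///) can extract the directory enclosed by the two \x00 bytes.

Cheers!

Note 1: This solution was partly inspired by an answer on Unix Stack Exchange.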

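Note 2: Because of the NUL bytes, this solution requires GNU sed. If you're using BSD sed (macOS), you'll probably want to use some other placeholder character that won't appear in your input.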
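Following Note 2, here is a sketch of the same three-substitution trick with a visible placeholder, which also makes the intermediate step easy to inspect. It assumes `#` never occurs in the input line; `top_dir_portable` is a hypothetical name. If the line has fewer than two slashes, the third substitution does not match and the line is printed with its `#` placeholders.

```sh
#!/bin/sh
s="directory_root /root/config/data"

# Steps 1 and 2: each s/// without the g flag replaces only the first slash
# still present, so together they turn exactly the first two slashes into '#'.
printf '%s\n' "$s" | sed -e 's|/|#|' -e 's|/|#|'
# -> directory_root #root#config/data

# Step 3: a plain greedy match between the two placeholders, re-adding slashes.
top_dir_portable() {
    sed -e 's|/|#|' -e 's|/|#|' -e 's|.*#\(.*\)#.*|/\1/|'
}
printf '%s\n' "$s" | top_dir_portable
# -> /root/
```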

PS: It's probably easier not to use sed at all.


Tags: regex, bash, shell, text-processing, sed