Replace third line of nth file with nth line of a single file

Question

Suppose I have hundreds of *.xml files in /train/xml/, in the following format:

# this is the content of /train/xml/RIGHT_NAME.xml
<annotation>
    <path>/train/img/WRONG_NAME.jpg</path>    # this is the WRONG_NAME
</annotation>

The file name WRONG_NAME inside <path>...</path> should match the file name of the .xml file, so that it looks like this:

# this is the content of /train/xml/RIGHT_NAME.xml
<annotation>
    <path>/train/img/RIGHT_NAME.jpg</path>    # this is the **RIGHT_NAME**
</annotation>

One solution I can think of is:

1. Export all file names to a text file:

ls -1 *.xml > filenames.txt

This produces a file containing:

RIGHT_NAME_0.xml
RIGHT_NAME_1.xml
...

2. Then edit `filenames.txt` so that it becomes:

# tab at beginning of each line
    <path>/train/img/RIGHT_NAME_0.jpg</path>
    <path>/train/img/RIGHT_NAME_1.jpg</path>
    ...

3. Then replace the third line of the `n`th `.xml` file with the `n`th line of `filenames.txt`.

Hence the question title.

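I have been fiddling with sed and awk without success. How should I do this (EDIT: on a macOS machine)? Also, is there a more elegant solution?

Thanks in advance for your help!

--- Things I've tried (without success) ---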

# this replaces the fifth line with an empty string
for i in *.xml ; do perl -i.bak -pe 's/.*/$i/ if $.==5' RIGHT_NAME.xml ; done

# this appends the contents of filenames.txt after the third line
sed -i.bak -e '/\<path\>/r filenames.txt' RIGHT_NAME.xml

# also, trying to utilize the <path>...</path> pattern...

Answer 1

Untested:

for xml in *.xml; do
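    # Line 3 only: swap the last path component ending in .jpg for this file's base name.
    # '#' is the delimiter so the '/' inside the bracket expression needs no escaping.
    sed -E -i.bak '3s#[^/]*\.jpg#'"${xml%.xml}.jpg#" "$xml"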
done
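
Since the answer is marked untested, one low-risk way to try the idea is on a scratch copy first. The following is only a sketch: the sample file name and its contents are made up to mirror the question, and the loop uses `#` as the sed delimiter and `${xml%.xml}` to strip the extension, which are choices of this sketch rather than something the answer prescribes.

```shell
#!/bin/sh
# Scratch directory with hypothetical sample data, so no real files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' '# this is the content of /train/xml/RIGHT_NAME_0.xml' '<annotation>' \
    '    <path>/train/img/WRONG_NAME.jpg</path>' '</annotation>' > RIGHT_NAME_0.xml

for xml in *.xml; do
    # Line 3 only: replace the last path component ending in .jpg
    # with the XML file's own base name. A .bak backup is kept.
    sed -i.bak '3s#[^/]*\.jpg#'"${xml%.xml}.jpg#" "$xml"
done

# Show the edited line for inspection.
sed -n '3p' RIGHT_NAME_0.xml
```

If the printed line shows RIGHT_NAME_0.jpg inside the path tag, the same loop can be run in the real directory; the .bak files allow a rollback.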

Answer 2

If ed is acceptable, since it should be installed by default on a Mac:

#!/bin/sh

for file in ./*.xml; do
  printf 'Processing %s\n' "$file"
  f=${file%.*}; f=${f#*./}
  printf '%s\n' H "g/<annotation>/;/<\/annotation>/\
    s|^\([[:blank:]]*<path>.*/\)[^.]*\(.*</path>\)|\1${f}\2|" %p Q |
  ed -s "$file" || break
done

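You will get the desired result even if you have
/foo/bar/baz/more/train/img/WRONG_NAME.jpg
It only edits/parses the string inside the path tag, which is inside the annotation tag.
Change Q to w if in-place editing is needed.
Remove %p to silence the output.

Caveat: ed is not an XML editor/parser.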
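
The two parameter expansions that derive `f` in the script are easy to misread, so here is a tiny standalone illustration. The file name is a hypothetical example of what the `./*.xml` glob yields.

```shell
#!/bin/sh
file=./RIGHT_NAME_0.xml   # hypothetical name, as produced by the ./*.xml glob

f=${file%.*}    # remove the shortest suffix matching ".*"  (the extension)
f=${f#*./}      # remove the shortest prefix matching "*./" (the leading ./)

printf '%s\n' "$f"   # prints RIGHT_NAME_0
```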

Answer 3

This might work for you (GNU sed and parallel):

parallel --dry sed -i '3s#[^/]*.jpg#{/.}.jpg#' {} ::: /train/xml/*.xml

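Here, {} represents the file name with its path, and {/.} represents the file name without its path and extension.

Once the output of the above solution has been checked, the option --dry (an abbreviated form of --dry-run) can be removed.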

Answer 4

Using GNU awk (which you can easily install on macOS if it isn't already on your system) for "inplace" editing, gensub(), and the third argument to match():

$ cat tst.awk
match($0,"(^\\s*<path>.*/).*([.][^.]+</path>)",a) {
    name = gensub("(.*/)?(.*)[.][^.]+$","\\2",1,FILENAME)
    $0 = a[1] name a[2]
}
{ print }

$ head *.xml
==> RIGHT_NAME_1.xml <==
# this is the content of /train/xml/RIGHT_NAME_1.xml
<annotation>
    <path>/train/img/WRONG_NAME.xml.jpg</path>
</annotation>

==> RIGHT_NAME_2.xml <==
# this is the content of /train/xml/RIGHT_NAME_2.xml
<annotation>
    <path>/train/img/WRONG_NAME.xml.jpg</path>
</annotation>

$ awk -i inplace -f tst.awk *.xml

$ head *.xml
==> RIGHT_NAME_1.xml <==
# this is the content of /train/xml/RIGHT_NAME_1.xml
<annotation>
    <path>/train/img/RIGHT_NAME_1.jpg</path>
</annotation>

==> RIGHT_NAME_2.xml <==
# this is the content of /train/xml/RIGHT_NAME_2.xml
<annotation>
    <path>/train/img/RIGHT_NAME_2.jpg</path>
</annotation>

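Just call it as awk -i inplace -f tst.awk /train/xml/* on your system. Note that the above simply replaces the name in the contents of a <path> tag that appears on its own line, so it will work whether that is line 3 or any other line in any given file. If you really do want to do this only on line 3, just change match(... to FNR==3 && match(....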
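
For completeness, the literal plan from the question title (line n of filenames.txt becomes line 3 of the nth file) can also be sketched with nothing but POSIX sed and awk. This is a minimal sketch under stated assumptions: `replace_third_lines` is a hypothetical helper name, the sample files are made up, and the nth file is taken in shell glob order, which is assumed to match the order in which filenames.txt was generated.

```shell
#!/bin/sh
# replace_third_lines LIST FILE...
# Replaces line 3 of the nth FILE with line n of LIST (hypothetical helper).
replace_third_lines() {
    list=$1; shift
    n=0
    for xml in "$@"; do
        n=$((n + 1))
        # Fetch line n of the list; leading whitespace survives command substitution.
        line=$(sed -n "${n}p" "$list")
        # ENVIRON hands the text to awk untouched (awk -v would interpret backslashes).
        REPL=$line awk 'FNR == 3 { print ENVIRON["REPL"]; next } { print }' \
            "$xml" > "$xml.tmp" && mv "$xml.tmp" "$xml"
    done
}

# Demo on scratch data so no real files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
for name in RIGHT_NAME_0 RIGHT_NAME_1; do
    printf '%s\n' "# this is the content of /train/xml/$name.xml" '<annotation>' \
        '    <path>/train/img/WRONG_NAME.jpg</path>' '</annotation>' > "$name.xml"
done

# Steps 1 and 2 of the question in one go: one tab-indented replacement line per file.
for xml in *.xml; do
    printf '\t<path>/train/img/%s.jpg</path>\n' "${xml%.xml}"
done > filenames.txt

replace_third_lines filenames.txt *.xml

# Print line 3 of each file for inspection.
awk 'FNR == 3' RIGHT_NAME_0.xml RIGHT_NAME_1.xml
```

Because the pairing depends purely on ordering, this approach is more fragile than deriving the name from each file directly, as the answers above do.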
