Unix diff with custom line separator
I want to compare two CSV files. Suppose the field separator is $ and each record has two fields; the file format can look like this:
a$simple line$
b$run-on-
line$
c$simple line$
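Is there a Unix diff command that lets me run a comparison where the record separator (line separator) is a $ sign immediately followed by a newline?
Ideally, I want diff to output the whole record whenever it detects any change.
With the default behavior, I may get partial records in the diff output (when a record runs across multiple lines).
Is there a smarter approach that I have not considered?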
--
Edit to add: example of expected output
If I compare the CSV file above with:
a$simple line$
b$run-on-changed-
line$
c$simple line$
...I would like to see the whole of record b reported as a difference. Something like:
2c2
< b$run-on-\nline$
---
> b$run-on-changed-\nline$
Peter, GNU diff does not directly support a custom line separator: http://man7.org/linux/man-pages/man1/diff.1.html (GNU diffutils)
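You can try using sed twice: one sed pass converts your format to one record per line for comparison; diff compares the converted files; a second sed pass converts back to the multi-line record format.
The first sed pass converts every $\n into a real \n, and every \n that is not preceded by $ into some unique special sequence, such as #%#$%#$%#$#.
Then run the diff.
The second sed pass converts #%#$%#$%#$# back into \n (or into the literal two-character text \n, to make the diff output easier to read).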
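The flatten-then-diff idea above can also be sketched without sed. The following is a minimal Python sketch, not the sed commands themselves: it assumes every record ends with $ followed by a newline, that the file ends with a complete record, and that the literal text \n does not already occur in the data. The names to_records, diff_records, PLACEHOLDER and RECORD_END are made up for this illustration.

```python
import difflib

# Visible stand-in for a newline inside a record (an assumption of this
# sketch; pick something that cannot occur in the real data).
PLACEHOLDER = "\\n"
# Record separator from the question: a $ sign followed by a newline.
RECORD_END = "$\n"


def to_records(text):
    """Return one string per record, with embedded newlines made visible."""
    chunks = text.split(RECORD_END)
    if chunks[-1] == "":
        chunks.pop()  # the text ended with a complete record
    # Re-attach the trailing $ so each record looks like it does in the file.
    return [chunk.replace("\n", PLACEHOLDER) + "$" for chunk in chunks]


def diff_records(old_text, new_text):
    """Diff two texts record by record; n=0 leaves out unchanged records."""
    return list(difflib.unified_diff(
        to_records(old_text), to_records(new_text),
        fromfile="old", tofile="new", n=0, lineterm=""))


if __name__ == "__main__":
    old = "a$simple line$\nb$run-on-\nline$\nc$simple line$\n"
    new = "a$simple line$\nb$run-on-changed-\nline$\nc$simple line$\n"
    print("\n".join(diff_records(old, new)))
```

For the two sample files from the question, the changed record appears as one removed line and one added line, each holding the whole of record b with \n marking the embedded newline. The output uses unified-diff headers rather than the 2c2 style shown in the question.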
There are diff variants that support working with CSV. Some of them may handle CSV with newlines inside fields:
https://pypi.python.org/pypi/csvdiff (python)
csvdiff allows you to compare the semantic contents of two CSV files, ignoring things like row and column ordering in order to get to what’s actually changed. This is useful if you’re comparing the output of an automatic system from one day to the next, so that you can look at just what’s changed.
https://github.com/agardiner/csv-diff (ruby)
Unlike a standard diff that compares line by line, and is sensitive to the ordering of records, CSV-Diff identifies common lines by key field(s), and then compares the contents of the fields in each line.
http://csvdiff.sourceforge.net/ (perl)
csvdiff is a perl script to compare/diff two (comma) seperated files with each other. The part that is different to standard diff is, that you'll get the number of the record where the difference occours and the field/column which is different. The separator can be set to the value you want it to, not just comma. Also you can to provide a third file which contains the columnnames in one(!) line separated by your separator.
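To illustrate the key-based comparison these tools describe, here is a minimal Python sketch. It is not the API of any of the tools listed above; it only shows the general idea under the assumptions that the first field is a unique key, fields are separated by $, and records end with $ plus a newline. The names parse_records and compare_by_key are hypothetical.

```python
def parse_records(text, field_sep="$"):
    """Map each record's first field (assumed to be a unique key)
    to the list of its remaining fields."""
    table = {}
    for chunk in text.split(field_sep + "\n"):
        if not chunk:
            continue  # skip the empty piece after the final separator
        key, *rest = chunk.split(field_sep)
        table[key] = rest
    return table


def compare_by_key(old_text, new_text):
    """Report keys that were added, removed, or whose fields changed,
    regardless of the order of the records."""
    old = parse_records(old_text)
    new = parse_records(new_text)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys()
                          if old[k] != new[k]),
    }


if __name__ == "__main__":
    old = "a$simple line$\nb$run-on-\nline$\nc$simple line$\n"
    new = "a$simple line$\nb$run-on-changed-\nline$\nc$simple line$\n"
    print(compare_by_key(old, new))
```

Because records are matched by key rather than by position, reordering the records alone reports no differences, and a newline inside a field is simply part of that field's value.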