In the context of git (and diff), what is a "hunk"

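I was reading some git documentation and looking for a definition of "hunk".

I know it describes a difference between two files and that it has a well-defined format, but I could not recall a concise definition.

I tried searching on Google, but got a lot of spurious results.

Eventually I found this: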

When comparing two files, diff finds sequences of lines common to both files, interspersed with groups of differing lines called hunks.

Here: http://www.gnu.org/software/diffutils/manual/html_node/Hunks.html

This is exactly the kind of concise definition I was looking for. Hopefully this helps someone else!
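To make the definition concrete, here is a minimal sketch using Python's standard difflib module rather than git or GNU diff themselves. The two line lists and the file names old.txt and new.txt are made up for illustration. Each line starting with "@@" opens one hunk.

```python
import difflib

# Two made-up versions of a file; they differ in two separate places.
old = ["a", "b", "c", "d", "e", "f", "g", "h"]
new = ["a", "B", "c", "d", "e", "f", "G", "h"]

# n=1 keeps one line of context around each change, so the two changes
# are far enough apart to end up in separate hunks.
diff_lines = list(
    difflib.unified_diff(old, new, "old.txt", "new.txt", n=1, lineterm="")
)

# In unified format every hunk starts with a header of the form
# "@@ -old_start,old_count +new_start,new_count @@".
hunk_headers = [line for line in diff_lines if line.startswith("@@")]

for line in diff_lines:
    print(line)
```

With these inputs the output contains two hunk headers, one per area where the files differ. With a larger context value (for example n=3) the two changes would be merged into a single hunk, since their context lines would overlap.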

For reference, you can also read this simple explanation: https://mvtechjourney.wordpress.com/2014/08/01/git-stage-hunk-and-discard-hunk-sourcetree/

The term "hunk" is indeed not specific to Git; it comes from the GNU diffutils format. More succinctly:

Each hunk shows one area where the files differ.

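But the challenge for Git is determining the right boundaries for a hunk.

The rest of this answer illustrates what a hunk looks like in Git:

After trying various heuristics (including one that is gone in Git 2.12), the Git maintainers settled on the indent heuristic, which was introduced in Sept. 2016 with Git 2.11, commit 433860f.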

Some groups of added/deleted lines in diffs can be slid up or down, because lines at the edges of the group are not unique.
Picking good shifts for such groups is not a matter of correctness but definitely has a big effect on aesthetics.
For example, consider the following two diffs.
The first is what standard Git emits:

--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
 }

 if (!$smtp_server) {
+       $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
        foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                if (-x $_) {
                        $smtp_server = $_;

The following diff is equivalent, but is obviously preferable from an aesthetic point of view:

--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
        $initial_reply_to =~ s/(^\s+|\s+$)//g;
 }

+if (!$smtp_server) {
+       $smtp_server = $repo->config('sendemail.smtpserver');
+}
 if (!$smtp_server) {
        foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                if (-x $_) {

This patch teaches Git to pick better positions for such "diff sliders" using heuristics that take the positions of nearby blank lines and the indentation of nearby lines into account.
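The "slider" idea can be sketched in a few lines of Python. This is only an illustration of why the two diffs above are equivalent, not Git's actual implementation; the helper names can_slide_up and remove_block are hypothetical, and the line list is an abbreviated copy of the new file from the example.

```python
def can_slide_up(new_lines, start, end):
    """Return True if the added block new_lines[start:end] can move up one line.

    Moving up is possible when the line just above the block is identical
    to the last line of the block, so both placements describe the same change.
    """
    return start > 0 and new_lines[start - 1] == new_lines[end - 1]


def remove_block(new_lines, start, end):
    """Undo the addition: drop the added block to recover the old file."""
    return new_lines[:start] + new_lines[end:]


# Abbreviated new version of git-send-email.perl from the example above.
new_lines = [
    "}",
    "",
    "if (!$smtp_server) {",
    "\t$smtp_server = $repo->config('sendemail.smtpserver');",
    "}",
    "if (!$smtp_server) {",
    "\tforeach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {",
]

standard = (3, 6)   # the block marked "+" in the first diff
preferred = (2, 5)  # the block marked "+" in the second diff

print(can_slide_up(new_lines, *standard))
print(remove_block(new_lines, *standard) == remove_block(new_lines, *preferred))
```

Both placements reconstruct the same old file, which is why choosing between them is a matter of readability rather than correctness. The real heuristic additionally scores candidate positions by nearby blank lines and indentation, which this sketch does not attempt.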


With Git 2.14 (Q3 2017), the indent heuristic becomes the default!

See commit 1fa8a66 (08 May 2017) by Jeff King (peff).
See commit 33de716 (08 May 2017) by Stefan Beller (stefanbeller).
See commit 37590ce, commit cf5e772 (08 May 2017) by Marc Branchaud.
(Merged by Junio C Hamano -- gitster -- in commit 53083f8, 05 Jun 2017)

diff: enable indent heuristic by default

The feature was included in v2.11 (released 2016-11-29) and we got no negative feedback. Quite the opposite, all feedback we got was positive.

Turn it on by default. Users who dislike the feature can turn it off by setting diff.indentHeuristic.


With Git 2.24 (Q4 2019), the documentation of the "indent heuristics" that decide where to split diff hunks has been corrected.

See commit 64e5e1f (15 Aug 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit e115170, 09 Sep 2019)

diff: 'diff.indentHeuristic' is no longer experimental

The indent heuristic started out as experimental, but it's now our default diff heuristic since 33de716 (diff: enable indent heuristic by default, 2017-05-08, Git v2.14.0-rc0).
Alas, that commit didn't update the documentation, and the description of the 'diff.indentHeuristic' configuration variable still implies that it's experimental and not the default.

Update the description of 'diff.indentHeuristic' to make it clear that it's the default diff heuristic.

The description of the related '--indent-heuristic' option has already been updated.

The documentation will now read:

diff.indentHeuristic:

Set this option to false to disable the default heuristics that shift diff hunk boundaries to make patches easier to read.
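To compare the output with and without the heuristic, the configuration variable can be overridden for a single invocation with git's -c option. The following is a rough sketch, assuming git is installed and the script runs inside a repository with uncommitted changes; the function names build_diff_command and run_diff are made up for this example.

```python
import subprocess


def build_diff_command(indent_heuristic=True, extra_args=()):
    """Build a 'git diff' command that overrides diff.indentHeuristic once.

    'git -c name=value' sets a configuration value for this invocation only,
    so the user's stored configuration is left untouched.
    """
    value = "true" if indent_heuristic else "false"
    return ["git", "-c", f"diff.indentHeuristic={value}", "diff", *extra_args]


def run_diff(indent_heuristic=True, extra_args=()):
    """Run the command in the current directory and return the diff text."""
    result = subprocess.run(
        build_diff_command(indent_heuristic, extra_args),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Only print the commands here; call run_diff() inside a repository to
# compare the two outputs.
print(build_diff_command(indent_heuristic=True))
print(build_diff_command(indent_heuristic=False))
```

Hunks that contain no "slider" look identical in both runs; any difference should show up only where a group of added or deleted lines can be shifted.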


With Git 2.25 (Q1 2020), you don't even have to specify --indent-heuristic anymore (since it has been the default for some time now).

See commit 44ae131 (28 Oct 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 532d983, 01 Dec 2019)

builtin/blame.c: remove '--indent-heuristic' from usage string

Signed-off-by: SZEDER Gábor

The indent heuristic is our default diff heuristic since 33de716387 ("diff: enable indent heuristic by default", 2017-05-08, Git v2.14.0-rc0 -- merge listed in batch #7), but the usage string of 'git blame' still mentions it as "experimental heuristic".

We could simply update the short help associated with the option, but according to the comment above the option's declaration it was "only included here to get included in the "-h" output".

That made sense while the feature was still experimental and we wanted to give it more exposure, but nowadays it's unnecessary.

So let's rather remove the '--indent-heuristic' option from 'git blame's usage string.

Note that 'git blame' will still accept this option, as it is parsed in parse_revision_opt().

Astute readers may notice that this patch removes a comment mentioning "the following two options", but it only removes one option.

The reason is that the comment is outdated: that other options was '--compaction-heuristic', and it has already been removed in 3cde4e02ee (diff: retire "compaction" heuristics, 2016-12-23), but that commit forgot to update this comment.