使用 Text.Generic.Diff 生成包含 "inserted" / "deleted" 标记的输出

Using Text.Generic.Diff to generate output containing "inserted" / "deleted" markers

我正在使用 Haskell gdiff 包来计算树之间的差异。 diff 算法的输出是 "edit script",它描述了将 "before" 树转换为 "after" 树的一系列操作。 gdiff 提供了一个 "patch" 函数,它将编辑脚本应用于 "before" 树,从而生成 "after" 树。

我需要做的是修改这个补丁操作,以便输出是一个 "after" 树,其中突出显示了修改。

举个例子,假设这棵树是一个文档 AST。我想生成一个显示插入/删除的输出,内嵌在 "after" 文档中。

到目前为止,我已经编写了一个程序,它成功地使用 gdiff 来计算简单二叉树数据结构实例之间的差异。我想不通的是如何修改生成的编辑脚本,以便它在执行补丁操作时注入 "inserted" 和 "deleted" 标记。

有人可以帮忙吗?

区分两个二叉树

这是我的二叉树数据结构:

data Tree = Node String Tree Tree
          | Empty
          deriving Show

这是我的示例 "before" 和 "after" 树:

before :: Tree
before =
  Node "root"
    (Node "A"
      (Empty)
      (Empty)
    )
    (Empty)

after :: Tree
after =
  Node "root"
    (Node "A"
      (Node "B" Empty Empty)
      (Empty)
    )
    (Empty)

diff执行如下:

runDiff :: EditScript TreeFamily Tree Tree
runDiff = diff before after

main :: IO ()
main = do
  putStrLn ("before     = " ++ (show before))
  putStrLn ("after      = " ++ (show after))

  let edit = runDiff
  putStrLn ("edit       = " ++ (show edit))

  let compressed = compress edit
  putStrLn ("compressed = " ++ (show compressed))

  let result = patch edit before
  putStrLn ("result     = " ++ (show result))

(稍后我会回到 TreeFamily 的定义。)

输出为:

before     = Node "root" (Node "A" Empty Empty) Empty
after      = Node "root" (Node "A" (Node "B" Empty Empty) Empty) Empty
edit       = Cpy Node $ Cpy "root" $ Cpy Node $ Cpy "A" $ Ins Node $ Ins "B" $ Cpy Empty $ Cpy Empty $ Cpy Empty $ Ins Empty $ End
compressed = Cpy Node $ CpyTree $ Cpy Node $ CpyTree $ Ins Node $ Ins "B" $ CpyTree $ CpyTree $ CpyTree $ Ins Empty $ End
result     = Node "root" (Node "A" (Node "B" Empty Empty) Empty) Empty

建议策略:处理编辑脚本

我认为我可以通过处理编辑脚本来实现"generate marked-up after tree"操作,从而将... $ Ins Node $ ...替换为... $ Ins InsNode $ ...,其中InsNode是另一个[=26] =]构造函数:

data Tree = Node String Tree Tree
          | InsNode String Tree Tree
          | Empty
          deriving Show

(对于删除也类似,但是这个 post 只关注插入。)

处理后的编辑脚本将被提供给现有的 gdiff 补丁函数。

树族定义

gdiff 库需要用户定义 "family datatype"。这是我的定义。请注意,我已经包含了 InsNode 类型。虽然这没有出现在输入数据中,但我 认为 gdiff 需要了解它才能执行上述 NodeInsNode 的替换。

data TreeFamily :: * -> * -> * where
    Node'       ::           TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
    InsNode'    ::           TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
    String'     :: String -> TreeFamily String Nil
    Empty'      ::           TreeFamily Tree Nil


instance Family TreeFamily where
    decEq Node' Node'                  = Just(Refl, Refl)
    decEq InsNode' InsNode'            = Just(Refl, Refl)
    decEq (String' s1) (String' s2)
                | s1 == s2             = Just (Refl, Refl)
                | otherwise            = Nothing
    decEq Empty' Empty'                = Just(Refl, Refl)
    decEq _ _                          = Nothing

    fields Node' (Node s t1 t2)        = Just (CCons s (CCons t1 (CCons t2 CNil)))
    fields InsNode' (InsNode s t1 t2)  = Just (CCons s (CCons t1 (CCons t2 CNil)))
    fields (String' _) _               = Just CNil
    fields Empty' Empty                = Just CNil
    fields _ _                         = Nothing

    apply Node' (CCons s (CCons t1 (CCons t2 CNil)))    = Node s t1 t2
    apply InsNode' (CCons s (CCons t1 (CCons t2 CNil))) = InsNode s t1 t2
    apply (String' s) CNil                              = s
    apply Empty' CNil                                   = Empty

    string Node'       = "Node"
    string InsNode'    = "InsNode"
    string (String' s) = show s
    string Empty'      = "Empty"


instance Type TreeFamily Tree where
    constructors = [ Concr Node', Concr InsNode', Concr Empty' ]

instance Type TreeFamily String where
    constructors = [ Abstr String' ]

第一次尝试 processEdit 函数

处理 EditScript 以执行 NodeInsNode 替换的函数应具有与 compress 函数相同的签名,即:

processEdit :: (Family f) => EditScriptL f txs tys -> EditScriptL f txs tys

我可以写出以下恒等式...

processEdit End         = End
processEdit (Ins  c  d) = Ins  c   (processEdit d)
processEdit (Del  c  d) = Del  c   (processEdit d)
processEdit (CpyTree d) = CpyTree  (processEdit d)
processEdit (Cpy  c  d) = Cpy  c   (processEdit d)

...但我不知道如何修改 Ins 方程来执行替换。有人可以帮忙吗?

完整的测试程序供参考

{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE GADTs                 #-}
{-# LANGUAGE KindSignatures        #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE NoImplicitPrelude     #-}


module Main where

import Prelude
import Data.Generic.Diff


-- Data types --

data Tree = Node String Tree Tree
          | InsNode String Tree Tree
          | Empty
          deriving Show


-- GADT Family --

data TreeFamily :: * -> * -> * where
    Node'       ::           TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
    InsNode'    ::           TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
    String'     :: String -> TreeFamily String Nil
    Empty'      ::           TreeFamily Tree Nil


instance Family TreeFamily where
    decEq Node' Node'                  = Just(Refl, Refl)
    decEq InsNode' InsNode'            = Just(Refl, Refl)
    decEq (String' s1) (String' s2)
                | s1 == s2             = Just (Refl, Refl)
                | otherwise            = Nothing
    decEq Empty' Empty'                = Just(Refl, Refl)
    decEq _ _                          = Nothing

    fields Node' (Node s t1 t2)        = Just (CCons s (CCons t1 (CCons t2 CNil)))
    fields InsNode' (InsNode s t1 t2)  = Just (CCons s (CCons t1 (CCons t2 CNil)))
    fields (String' _) _               = Just CNil
    fields Empty' Empty                = Just CNil
    fields _ _                         = Nothing

    apply Node' (CCons s (CCons t1 (CCons t2 CNil)))    = Node s t1 t2
    apply InsNode' (CCons s (CCons t1 (CCons t2 CNil))) = InsNode s t1 t2
    apply (String' s) CNil                              = s
    apply Empty' CNil                                   = Empty

    string Node'       = "Node"
    string InsNode'    = "InsNode"
    string (String' s) = show s
    string Empty'      = "Empty"


instance Type TreeFamily Tree where
    constructors = [ Concr Node', Concr InsNode', Concr Empty' ]

instance Type TreeFamily String where
    constructors = [ Abstr String' ]


-- Input trees --

before :: Tree
before =
  Node "root"
    (Node "A"
      (Empty)
      (Empty)
    )
    (Empty)

after :: Tree
after =
  Node "root"
    (Node "A"
      (Node "B" Empty Empty)
      (Empty)
    )
    (Empty)


{-
Function for modifying the edit script

The objective is to transform edit script fragments of the form
    ... $ Ins Node $ ...
to
    ... $ Ins InsNode $ ...
-}

processEdit :: (Family f) => EditScriptL f txs tys -> EditScriptL f txs tys
processEdit End         = End
processEdit (Ins  c  d) = Ins  c   (processEdit d)
processEdit (Del  c  d) = Del  c   (processEdit d)
processEdit (CpyTree d) = CpyTree  (processEdit d)
processEdit (Cpy  c  d) = Cpy  c   (processEdit d)


-- Test --

-- For some reason, this signature is required for type inference to work --
runDiff :: EditScript TreeFamily Tree Tree
runDiff = diff before after

main :: IO ()
main = do
  putStrLn ("before     = " ++ (show before))
  putStrLn ("after      = " ++ (show after))

  let edit = runDiff
  putStrLn ("edit       = " ++ (show edit))

  let compressed = compress edit
  putStrLn ("compressed = " ++ (show compressed))

  let processed = processEdit compressed
  putStrLn ("processed  = " ++ (show processed))

  let result = patch edit before
  putStrLn ("result     = " ++ (show result))

只需将 processEdit 专门化为 TreeFamily(因为显然您要完成的工作特定于 TreeFamily)并在 processEdit 的(第一个)参数上进行模式匹配=14=].

processEdit :: EditScriptL TreeFamily txs tys -> EditScriptL TreeFamily txs tys
processEdit End         = End
processEdit (Ins  Node' d) = Ins  InsNode'   (processEdit d)
processEdit (Ins  c  d) = Ins  c (processEdit d)
processEdit (Del  c  d) = Del  c   (processEdit d)
processEdit (CpyTree d) = CpyTree  (processEdit d)
processEdit (Cpy  c  d) = Cpy  c   (processEdit d)

不过,我不喜欢这种做法。它需要修改您的原始数据类型,并且您失去了 "original" 树和 "patched" 树之间的类型级区别。更好的解决方案是创建另一种数据类型(例如 ChangedTree)并重新实现 patch' :: EditScriptL TreeFamily Tree Tree -> Tree -> ChangedTree。如果您同时跟踪插入和删除,是否还需要进行 "replace" 类型的更改?

哦,runDiff 需要类型签名,否则它不知道要使用什么 Type _ Tree 实例。例如。 diff @TreeFamily before after(TypeApplications 扩展名)会修复它。 Haskell 的类型类是开放的,所以它不会自动推断你想要 instance Type TreeFamily Tree 而不是其他 instance Type XXX Tree,只是因为它看不到任何其他合适的 XXX 现在在范围内并不意味着它会猜测这就是您打算使用的内容。