使用 Text.Generic.Diff 生成包含 "inserted" / "deleted" 标记的输出
Using Text.Generic.Diff to generate output containing "inserted" / "deleted" markers
我正在使用 Haskell gdiff 包来计算树之间的差异。 diff 算法的输出是 "edit script",它描述了将 "before" 树转换为 "after" 树的一系列操作。 gdiff 提供了一个 "patch" 函数,它将编辑脚本应用于 "before" 树,从而生成 "after" 树。
我需要做的是修改这个补丁操作,以便输出是一个 "after" 树,其中突出显示了修改。
举个例子,假设这棵树是一个文档 AST。我想生成一个显示插入/删除的输出,内嵌在 "after" 文档中。
到目前为止,我已经编写了一个程序,它成功地使用 gdiff 来计算简单二叉树数据结构实例之间的差异。我想不通的是如何修改生成的编辑脚本,以便它在执行补丁操作时注入 "inserted" 和 "deleted" 标记。
有人可以帮忙吗?
区分两个二叉树
这是我的二叉树数据结构:
data Tree = Node String Tree Tree
| Empty
deriving Show
这是我的示例 "before" 和 "after" 树:
before :: Tree
before =
Node "root"
(Node "A"
(Empty)
(Empty)
)
(Empty)
after :: Tree
after =
Node "root"
(Node "A"
(Node "B" Empty Empty)
(Empty)
)
(Empty)
diff执行如下:
runDiff :: EditScript TreeFamily Tree Tree
runDiff = diff before after
main :: IO ()
main = do
putStrLn ("before = " ++ (show before))
putStrLn ("after = " ++ (show after))
let edit = runDiff
putStrLn ("edit = " ++ (show edit))
let compressed = compress edit
putStrLn ("compressed = " ++ (show compressed))
let result = patch edit before
putStrLn ("result = " ++ (show result))
(稍后我会回到 TreeFamily
的定义。)
输出为:
before = Node "root" (Node "A" Empty Empty) Empty
after = Node "root" (Node "A" (Node "B" Empty Empty) Empty) Empty
edit = Cpy Node $ Cpy "root" $ Cpy Node $ Cpy "A" $ Ins Node $ Ins "B" $ Cpy Empty $ Cpy Empty $ Cpy Empty $ Ins Empty $ End
compressed = Cpy Node $ CpyTree $ Cpy Node $ CpyTree $ Ins Node $ Ins "B" $ CpyTree $ CpyTree $ CpyTree $ Ins Empty $ End
result = Node "root" (Node "A" (Node "B" Empty Empty) Empty) Empty
建议策略:处理编辑脚本
我认为我可以通过处理编辑脚本来实现"generate marked-up after tree"操作,从而将... $ Ins Node $ ...
替换为... $ Ins InsNode $ ...
,其中InsNode
是另一个[=26] =]构造函数:
data Tree = Node String Tree Tree
| InsNode String Tree Tree
| Empty
deriving Show
(对于删除也类似,但是这个 post 只关注插入。)
处理后的编辑脚本将被提供给现有的 gdiff 补丁函数。
树族定义
gdiff 库需要用户定义 "family datatype"。这是我的定义。请注意,我已经包含了 InsNode
类型。虽然这没有出现在输入数据中,但我 认为 gdiff 需要了解它才能执行上述 Node
到 InsNode
的替换。
data TreeFamily :: * -> * -> * where
Node' :: TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
InsNode' :: TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
String' :: String -> TreeFamily String Nil
Empty' :: TreeFamily Tree Nil
instance Family TreeFamily where
decEq Node' Node' = Just(Refl, Refl)
decEq InsNode' InsNode' = Just(Refl, Refl)
decEq (String' s1) (String' s2)
| s1 == s2 = Just (Refl, Refl)
| otherwise = Nothing
decEq Empty' Empty' = Just(Refl, Refl)
decEq _ _ = Nothing
fields Node' (Node s t1 t2) = Just (CCons s (CCons t1 (CCons t2 CNil)))
fields InsNode' (InsNode s t1 t2) = Just (CCons s (CCons t1 (CCons t2 CNil)))
fields (String' _) _ = Just CNil
fields Empty' Empty = Just CNil
fields _ _ = Nothing
apply Node' (CCons s (CCons t1 (CCons t2 CNil))) = Node s t1 t2
apply InsNode' (CCons s (CCons t1 (CCons t2 CNil))) = InsNode s t1 t2
apply (String' s) CNil = s
apply Empty' CNil = Empty
string Node' = "Node"
string InsNode' = "InsNode"
string (String' s) = show s
string Empty' = "Empty"
instance Type TreeFamily Tree where
constructors = [ Concr Node', Concr InsNode', Concr Empty' ]
instance Type TreeFamily String where
constructors = [ Abstr String' ]
第一次尝试 processEdit 函数
处理 EditScript
以执行 Node
到 InsNode
替换的函数应具有与 compress
函数相同的签名,即:
processEdit :: (Family f) => EditScriptL f txs tys -> EditScriptL f txs tys
我可以写出以下恒等式...
processEdit End = End
processEdit (Ins c d) = Ins c (processEdit d)
processEdit (Del c d) = Del c (processEdit d)
processEdit (CpyTree d) = CpyTree (processEdit d)
processEdit (Cpy c d) = Cpy c (processEdit d)
...但我不知道如何修改 Ins
方程来执行替换。有人可以帮忙吗?
完整的测试程序供参考
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE NoImplicitPrelude #-}
module Main where
import Prelude
import Data.Generic.Diff
-- Data types --
data Tree = Node String Tree Tree
| InsNode String Tree Tree
| Empty
deriving Show
-- GADT Family --
data TreeFamily :: * -> * -> * where
Node' :: TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
InsNode' :: TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
String' :: String -> TreeFamily String Nil
Empty' :: TreeFamily Tree Nil
instance Family TreeFamily where
decEq Node' Node' = Just(Refl, Refl)
decEq InsNode' InsNode' = Just(Refl, Refl)
decEq (String' s1) (String' s2)
| s1 == s2 = Just (Refl, Refl)
| otherwise = Nothing
decEq Empty' Empty' = Just(Refl, Refl)
decEq _ _ = Nothing
fields Node' (Node s t1 t2) = Just (CCons s (CCons t1 (CCons t2 CNil)))
fields InsNode' (InsNode s t1 t2) = Just (CCons s (CCons t1 (CCons t2 CNil)))
fields (String' _) _ = Just CNil
fields Empty' Empty = Just CNil
fields _ _ = Nothing
apply Node' (CCons s (CCons t1 (CCons t2 CNil))) = Node s t1 t2
apply InsNode' (CCons s (CCons t1 (CCons t2 CNil))) = InsNode s t1 t2
apply (String' s) CNil = s
apply Empty' CNil = Empty
string Node' = "Node"
string InsNode' = "InsNode"
string (String' s) = show s
string Empty' = "Empty"
instance Type TreeFamily Tree where
constructors = [ Concr Node', Concr InsNode', Concr Empty' ]
instance Type TreeFamily String where
constructors = [ Abstr String' ]
-- Input trees --
before :: Tree
before =
Node "root"
(Node "A"
(Empty)
(Empty)
)
(Empty)
after :: Tree
after =
Node "root"
(Node "A"
(Node "B" Empty Empty)
(Empty)
)
(Empty)
{-
Function for modifying the edit script
The objective is to transform edit script fragments of the form
... $ Ins Node $ ...
to
... $ Ins InsNode $ ...
-}
processEdit :: (Family f) => EditScriptL f txs tys -> EditScriptL f txs tys
processEdit End = End
processEdit (Ins c d) = Ins c (processEdit d)
processEdit (Del c d) = Del c (processEdit d)
processEdit (CpyTree d) = CpyTree (processEdit d)
processEdit (Cpy c d) = Cpy c (processEdit d)
-- Test --
-- For some reason, this signature is required for type inference to work --
runDiff :: EditScript TreeFamily Tree Tree
runDiff = diff before after
main :: IO ()
main = do
putStrLn ("before = " ++ (show before))
putStrLn ("after = " ++ (show after))
let edit = runDiff
putStrLn ("edit = " ++ (show edit))
let compressed = compress edit
putStrLn ("compressed = " ++ (show compressed))
let processed = processEdit compressed
putStrLn ("processed = " ++ (show processed))
let result = patch edit before
putStrLn ("result = " ++ (show result))
只需将 processEdit
专门化为 TreeFamily
(因为显然您要完成的工作特定于 TreeFamily
)并在 processEdit
的(第一个)参数上进行模式匹配=14=].
processEdit :: EditScriptL TreeFamily txs tys -> EditScriptL TreeFamily txs tys
processEdit End = End
processEdit (Ins Node' d) = Ins InsNode' (processEdit d)
processEdit (Ins c d) = Ins c (processEdit d)
processEdit (Del c d) = Del c (processEdit d)
processEdit (CpyTree d) = CpyTree (processEdit d)
processEdit (Cpy c d) = Cpy c (processEdit d)
不过,我不喜欢这种做法。它需要修改您的原始数据类型,并且您失去了 "original" 树和 "patched" 树之间的类型级区别。更好的解决方案是创建另一种数据类型(例如 ChangedTree
)并重新实现 patch' :: EditScriptL TreeFamily Tree Tree -> Tree -> ChangedTree
。如果您同时跟踪插入和删除,是否还需要进行 "replace" 类型的更改?
哦,runDiff
需要类型签名,否则它不知道要使用什么 Type _ Tree
实例。例如。 diff @TreeFamily before after
(TypeApplications 扩展名)会修复它。 Haskell 的类型类是开放的,所以它不会自动推断你想要 instance Type TreeFamily Tree
而不是其他 instance Type XXX Tree
,只是因为它看不到任何其他合适的 XXX
现在在范围内并不意味着它会猜测这就是您打算使用的内容。
我正在使用 Haskell gdiff 包来计算树之间的差异。 diff 算法的输出是 "edit script",它描述了将 "before" 树转换为 "after" 树的一系列操作。 gdiff 提供了一个 "patch" 函数,它将编辑脚本应用于 "before" 树,从而生成 "after" 树。
我需要做的是修改这个补丁操作,以便输出是一个 "after" 树,其中突出显示了修改。
举个例子,假设这棵树是一个文档 AST。我想生成一个显示插入/删除的输出,内嵌在 "after" 文档中。
到目前为止,我已经编写了一个程序,它成功地使用 gdiff 来计算简单二叉树数据结构实例之间的差异。我想不通的是如何修改生成的编辑脚本,以便它在执行补丁操作时注入 "inserted" 和 "deleted" 标记。
有人可以帮忙吗?
区分两个二叉树
这是我的二叉树数据结构:
data Tree = Node String Tree Tree
| Empty
deriving Show
这是我的示例 "before" 和 "after" 树:
before :: Tree
before =
Node "root"
(Node "A"
(Empty)
(Empty)
)
(Empty)
after :: Tree
after =
Node "root"
(Node "A"
(Node "B" Empty Empty)
(Empty)
)
(Empty)
diff执行如下:
runDiff :: EditScript TreeFamily Tree Tree
runDiff = diff before after
main :: IO ()
main = do
putStrLn ("before = " ++ (show before))
putStrLn ("after = " ++ (show after))
let edit = runDiff
putStrLn ("edit = " ++ (show edit))
let compressed = compress edit
putStrLn ("compressed = " ++ (show compressed))
let result = patch edit before
putStrLn ("result = " ++ (show result))
(稍后我会回到 TreeFamily
的定义。)
输出为:
before = Node "root" (Node "A" Empty Empty) Empty
after = Node "root" (Node "A" (Node "B" Empty Empty) Empty) Empty
edit = Cpy Node $ Cpy "root" $ Cpy Node $ Cpy "A" $ Ins Node $ Ins "B" $ Cpy Empty $ Cpy Empty $ Cpy Empty $ Ins Empty $ End
compressed = Cpy Node $ CpyTree $ Cpy Node $ CpyTree $ Ins Node $ Ins "B" $ CpyTree $ CpyTree $ CpyTree $ Ins Empty $ End
result = Node "root" (Node "A" (Node "B" Empty Empty) Empty) Empty
建议策略:处理编辑脚本
我认为我可以通过处理编辑脚本来实现"generate marked-up after tree"操作,从而将... $ Ins Node $ ...
替换为... $ Ins InsNode $ ...
,其中InsNode
是另一个[=26] =]构造函数:
data Tree = Node String Tree Tree
| InsNode String Tree Tree
| Empty
deriving Show
(对于删除也类似,但是这个 post 只关注插入。)
处理后的编辑脚本将被提供给现有的 gdiff 补丁函数。
树族定义
gdiff 库需要用户定义 "family datatype"。这是我的定义。请注意,我已经包含了 InsNode
类型。虽然这没有出现在输入数据中,但我 认为 gdiff 需要了解它才能执行上述 Node
到 InsNode
的替换。
data TreeFamily :: * -> * -> * where
Node' :: TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
InsNode' :: TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
String' :: String -> TreeFamily String Nil
Empty' :: TreeFamily Tree Nil
instance Family TreeFamily where
decEq Node' Node' = Just(Refl, Refl)
decEq InsNode' InsNode' = Just(Refl, Refl)
decEq (String' s1) (String' s2)
| s1 == s2 = Just (Refl, Refl)
| otherwise = Nothing
decEq Empty' Empty' = Just(Refl, Refl)
decEq _ _ = Nothing
fields Node' (Node s t1 t2) = Just (CCons s (CCons t1 (CCons t2 CNil)))
fields InsNode' (InsNode s t1 t2) = Just (CCons s (CCons t1 (CCons t2 CNil)))
fields (String' _) _ = Just CNil
fields Empty' Empty = Just CNil
fields _ _ = Nothing
apply Node' (CCons s (CCons t1 (CCons t2 CNil))) = Node s t1 t2
apply InsNode' (CCons s (CCons t1 (CCons t2 CNil))) = InsNode s t1 t2
apply (String' s) CNil = s
apply Empty' CNil = Empty
string Node' = "Node"
string InsNode' = "InsNode"
string (String' s) = show s
string Empty' = "Empty"
instance Type TreeFamily Tree where
constructors = [ Concr Node', Concr InsNode', Concr Empty' ]
instance Type TreeFamily String where
constructors = [ Abstr String' ]
第一次尝试 processEdit 函数
处理 EditScript
以执行 Node
到 InsNode
替换的函数应具有与 compress
函数相同的签名,即:
processEdit :: (Family f) => EditScriptL f txs tys -> EditScriptL f txs tys
我可以写出以下恒等式...
processEdit End = End
processEdit (Ins c d) = Ins c (processEdit d)
processEdit (Del c d) = Del c (processEdit d)
processEdit (CpyTree d) = CpyTree (processEdit d)
processEdit (Cpy c d) = Cpy c (processEdit d)
...但我不知道如何修改 Ins
方程来执行替换。有人可以帮忙吗?
完整的测试程序供参考
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE NoImplicitPrelude #-}
module Main where
import Prelude
import Data.Generic.Diff
-- Data types --
data Tree = Node String Tree Tree
| InsNode String Tree Tree
| Empty
deriving Show
-- GADT Family --
data TreeFamily :: * -> * -> * where
Node' :: TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
InsNode' :: TreeFamily Tree (Cons String (Cons Tree (Cons Tree Nil)))
String' :: String -> TreeFamily String Nil
Empty' :: TreeFamily Tree Nil
instance Family TreeFamily where
decEq Node' Node' = Just(Refl, Refl)
decEq InsNode' InsNode' = Just(Refl, Refl)
decEq (String' s1) (String' s2)
| s1 == s2 = Just (Refl, Refl)
| otherwise = Nothing
decEq Empty' Empty' = Just(Refl, Refl)
decEq _ _ = Nothing
fields Node' (Node s t1 t2) = Just (CCons s (CCons t1 (CCons t2 CNil)))
fields InsNode' (InsNode s t1 t2) = Just (CCons s (CCons t1 (CCons t2 CNil)))
fields (String' _) _ = Just CNil
fields Empty' Empty = Just CNil
fields _ _ = Nothing
apply Node' (CCons s (CCons t1 (CCons t2 CNil))) = Node s t1 t2
apply InsNode' (CCons s (CCons t1 (CCons t2 CNil))) = InsNode s t1 t2
apply (String' s) CNil = s
apply Empty' CNil = Empty
string Node' = "Node"
string InsNode' = "InsNode"
string (String' s) = show s
string Empty' = "Empty"
instance Type TreeFamily Tree where
constructors = [ Concr Node', Concr InsNode', Concr Empty' ]
instance Type TreeFamily String where
constructors = [ Abstr String' ]
-- Input trees --
before :: Tree
before =
Node "root"
(Node "A"
(Empty)
(Empty)
)
(Empty)
after :: Tree
after =
Node "root"
(Node "A"
(Node "B" Empty Empty)
(Empty)
)
(Empty)
{-
Function for modifying the edit script
The objective is to transform edit script fragments of the form
... $ Ins Node $ ...
to
... $ Ins InsNode $ ...
-}
processEdit :: (Family f) => EditScriptL f txs tys -> EditScriptL f txs tys
processEdit End = End
processEdit (Ins c d) = Ins c (processEdit d)
processEdit (Del c d) = Del c (processEdit d)
processEdit (CpyTree d) = CpyTree (processEdit d)
processEdit (Cpy c d) = Cpy c (processEdit d)
-- Test --
-- For some reason, this signature is required for type inference to work --
runDiff :: EditScript TreeFamily Tree Tree
runDiff = diff before after
main :: IO ()
main = do
putStrLn ("before = " ++ (show before))
putStrLn ("after = " ++ (show after))
let edit = runDiff
putStrLn ("edit = " ++ (show edit))
let compressed = compress edit
putStrLn ("compressed = " ++ (show compressed))
let processed = processEdit compressed
putStrLn ("processed = " ++ (show processed))
let result = patch edit before
putStrLn ("result = " ++ (show result))
只需将 processEdit
专门化为 TreeFamily
(因为显然您要完成的工作特定于 TreeFamily
)并在 processEdit
的(第一个)参数上进行模式匹配=14=].
processEdit :: EditScriptL TreeFamily txs tys -> EditScriptL TreeFamily txs tys
processEdit End = End
processEdit (Ins Node' d) = Ins InsNode' (processEdit d)
processEdit (Ins c d) = Ins c (processEdit d)
processEdit (Del c d) = Del c (processEdit d)
processEdit (CpyTree d) = CpyTree (processEdit d)
processEdit (Cpy c d) = Cpy c (processEdit d)
不过,我不喜欢这种做法。它需要修改您的原始数据类型,并且您失去了 "original" 树和 "patched" 树之间的类型级区别。更好的解决方案是创建另一种数据类型(例如 ChangedTree
)并重新实现 patch' :: EditScriptL TreeFamily Tree Tree -> Tree -> ChangedTree
。如果您同时跟踪插入和删除,是否还需要进行 "replace" 类型的更改?
哦,runDiff
需要类型签名,否则它不知道要使用什么 Type _ Tree
实例。例如。 diff @TreeFamily before after
(TypeApplications 扩展名)会修复它。 Haskell 的类型类是开放的,所以它不会自动推断你想要 instance Type TreeFamily Tree
而不是其他 instance Type XXX Tree
,只是因为它看不到任何其他合适的 XXX
现在在范围内并不意味着它会猜测这就是您打算使用的内容。