Haskell: Will GHC optimize this?

Question

Can GHC simplify id = (\(a, b) -> (a, b)).(\(a, b) -> (a, b)) to id = \(a, b) -> (a, b)?

What about a more complicated case:

id (Just x) = Just x
id Nothing = Nothing

map f (Just x) = Just (f x)
map _ Nothing  = Nothing

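Will GHC simplify id . map to map?

I tried plain beta reduction, but it looks like these terms are irreducible because of the nasty pattern matching.

So I'm curious how GHC's optimization techniques handle this.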

Answer 1

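You can ask GHC these questions yourself by running it with -ddump-simpl. This makes GHC dump the "Core" code it compiles your program to. Core is the intermediate language between the part of the compiler that reasons about Haskell code and the part that turns code into machine code.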
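As a minimal sketch of one way to request the dump, the flags can be set per module with an OPTIONS_GHC pragma instead of on the command line. The file name Main.hs is an assumption, and -dsuppress-uniques is an optional extra that strips the numeric suffixes from names, so its output will look slightly different from the dumps shown in this answer.

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -dsuppress-uniques #-}
-- Compiling this file (for example as Main.hs, an assumed name) prints the
-- simplified Core for each top-level binding while it builds.
-- "module Main where" has no export list, so every binding is exported and
-- therefore kept in the dump.
module Main where

tupid1 :: (a, b) -> (a, b)
tupid1 = \(a, b) -> (a, b)

tupid2 :: (a, b) -> (a, b)
tupid2 = (\(a, b) -> (a, b)) . (\(a, b) -> (a, b))

main :: IO ()
main = print (tupid1 (1 :: Int, 'a'), tupid2 (2 :: Int, 'b'))
```

The dump appears at compile time, not when the program runs.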

When I compiled the following with -O2 -ddump-simpl, the results surprised me.

tupid1 :: (a, b) -> (a, b)
tupid1 = (\(a, b) -> (a, b))

tupid2 :: (a, b) -> (a, b)
tupid2 = (\(a, b) -> (a, b)) . (\(a, b) -> (a, b))

The resulting Core for tupid1 is a new specialized identity function.

-- RHS size: {terms: 4, types: 7, coercions: 0}
tupid1 :: forall a_aqo b_aqp. (a_aqo, b_aqp) -> (a_aqo, b_aqp)
[GblId,
 Arity=1,
 Caf=NoCafRefs,
 Str=DmdType <S,1*U(U,U)>m,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
         Tmpl= \ (@ a_ayd)
                 (@ b_aye)
                 (ds_dIl [Occ=Once] :: (a_ayd, b_aye)) ->
                 ds_dIl}]
tupid1 = \ (@ a_ayd) (@ b_aye) (ds_dIl :: (a_ayd, b_aye)) -> ds_dIl

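In Core, the polymorphic type arguments of functions are represented as explicit arguments. tupid1 takes two of these type arguments, named a_ayd and b_aye, for the two type variables a and b in its signature. It also takes a term, ds_dIl, whose type is the tuple of those two types (ds_dIl :: (a_ayd, b_aye)), and returns it unmodified.

The surprising result is tupid2 ...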

-- RHS size: {terms: 1, types: 0, coercions: 0}
tupid2 :: forall a_aqm b_aqn. (a_aqm, b_aqn) -> (a_aqm, b_aqn)
[GblId,
 Arity=1,
 Caf=NoCafRefs,
 Str=DmdType <S,1*U(U,U)>m,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
         Tmpl= \ (@ a_axZ) (@ b_ay0) (x_aIw [Occ=Once] :: (a_axZ, b_ay0)) ->
                 x_aIw}]
tupid2 = tupid1

... which GHC simplifies to tupid1! How it deduces that is beyond my immediate knowledge or ability to discover.
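The dump does not say which passes fired, so the following is only a plausible reconstruction by hand. It assumes the simplifier's usual rewrites: inlining (.), case-of-case, case-of-known-constructor, and dropping a case whose only alternative rebuilds its scrutinee. The names step0 to step3 are illustrative and are not GHC output.

```haskell
module Main where

-- Step 0: the original definition of tupid2.
step0 :: (a, b) -> (a, b)
step0 = (\(a, b) -> (a, b)) . (\(a, b) -> (a, b))

-- Step 1: inline (.) and beta-reduce; each lambda pattern becomes a case.
step1 :: (a, b) -> (a, b)
step1 p =
  case (case p of (a, b) -> (a, b)) of
    (c, d) -> (c, d)

-- Step 2: push the outer case into the inner alternative (case-of-case);
-- the outer scrutinee is then a known tuple, so the outer case disappears.
step2 :: (a, b) -> (a, b)
step2 p = case p of (a, b) -> (a, b)

-- Step 3: the alternative rebuilds exactly what was scrutinised, so the
-- whole case can be replaced by the scrutinee itself.
step3 :: (a, b) -> (a, b)
step3 p = p

main :: IO ()
main = do
  let p = (1 :: Int, "x")
  -- Every step should return the input pair unchanged.
  print (all (== p) (map ($ p) [step0, step1, step2, step3]))
```

Under this reading, step2 is the same expression as tupid1, which would explain why the final binding can simply be tupid2 = tupid1. The runtime check only confirms the steps agree on one sample input; it is not evidence of what GHC did internally.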

The identity example for Maybe

maybeid :: Maybe a -> Maybe a
maybeid (Just x) = Just x
maybeid Nothing = Nothing

is also simplified to the identity function, with no pattern matching

-- RHS size: {terms: 3, types: 4, coercions: 0}
maybeid :: forall a_aqn. Maybe a_aqn -> Maybe a_aqn
[GblId,
 Arity=1,
 Caf=NoCafRefs,
 Str=DmdType <S,1*U>,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=True)
         Tmpl= \ (@ a_aqI) (ds_dIq [Occ=Once] :: Maybe a_aqI) -> ds_dIq}]
maybeid = \ (@ a_aqI) (ds_dIq :: Maybe a_aqI) -> ds_dIq

The Core for Maybe's map isn't interesting for this question

maybemap :: (a -> b) -> Maybe a -> Maybe b
maybemap f (Just x) = Just (f x)
maybemap _ Nothing = Nothing

but if it is composed with maybeid

maybeidmap :: (a -> b) -> Maybe a -> Maybe b
maybeidmap f = maybeid . maybemap f

GHC simplifies it to maybemap

-- RHS size: {terms: 1, types: 0, coercions: 0}
maybeidmap
  :: forall a_aqp b_aqq.
     (a_aqp -> b_aqq) -> Maybe a_aqp -> Maybe b_aqq
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=DmdType <L,1*C1(U)><S,1*U>,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=True)
         Tmpl= maybemap}]
maybeidmap = maybemap

GHC does the same thing if id is composed with f.

maybemapid :: (a -> b) -> Maybe a -> Maybe b
maybemapid f = maybemap (id . f)

It removes the composition with the identity function and simplifies the entire function to maybemap

-- RHS size: {terms: 1, types: 0, coercions: 0}
maybemapid
  :: forall a_aqq b_aqr.
     (a_aqq -> b_aqr) -> Maybe a_aqq -> Maybe b_aqr
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=DmdType <L,1*C1(U)><S,1*U>,
 Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True,
         Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False)
         Tmpl= \ (@ a_ar2)
                 (@ b_ar3)
                 (f_aqL [Occ=Once!] :: a_ar2 -> b_ar3)
                 (eta_B1 [Occ=Once!] :: Maybe a_ar2) ->
                 case eta_B1 of _ [Occ=Dead] {
                   Nothing -> GHC.Base.Nothing @ b_ar3;
                   Just x_aqJ [Occ=Once] -> GHC.Base.Just @ b_ar3 (f_aqL x_aqJ)
                 }}]
maybemapid = maybemap
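The Core above shows all three definitions collapsing to maybemap. As a quick sanity check that the rewrite preserves behaviour, a small test program can compare them at runtime. This is a sketch on a few sample inputs, not a proof; the helper agree and the choice of (+ 1) are assumptions made for illustration.

```haskell
module Main where

maybeid :: Maybe a -> Maybe a
maybeid (Just x) = Just x
maybeid Nothing = Nothing

maybemap :: (a -> b) -> Maybe a -> Maybe b
maybemap f (Just x) = Just (f x)
maybemap _ Nothing = Nothing

maybeidmap :: (a -> b) -> Maybe a -> Maybe b
maybeidmap f = maybeid . maybemap f

maybemapid :: (a -> b) -> Maybe a -> Maybe b
maybemapid f = maybemap (id . f)

-- True when both composed variants agree with plain maybemap on the input.
agree :: Maybe Int -> Bool
agree m =
  maybeidmap (+ 1) m == maybemap (+ 1) m
    && maybemapid (+ 1) m == maybemap (+ 1) m

main :: IO ()
main = print (all agree [Nothing, Just 0, Just 41])
```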

