Haskell: Removing duplicate tuples from a list?

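I'm trying to get from the Before state to the After state shown below. Is there a convenient Haskell function for removing duplicate tuples from a list? Or does it take something more involved, such as iterating over the whole list by hand?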

Before: the list of tuples, sorted by word, as in
   [(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]
After: the list of sorted tuples with exact duplicates removed, as in
   [(2,"a"), (1,"a"), (1,"b"), (1,"c"), (2,"dd")]

Searching Hoogle for the type Eq a => [a] -> [a] returns the nub function:

The nub function removes duplicate elements from a list. In particular, it keeps only the first occurrence of each element. (The name nub means `essence'.)

As the documentation notes, the more general case is nubBy, which takes a user-supplied equality predicate.
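As a quick illustration, here is nub applied to the list from the question, followed by a nubBy call. The names before, after, and byWord are made up for this sketch, and comparing only on the word is an assumption chosen to show what nubBy adds, not something the question asks for.

```haskell
import Data.Function (on)
import Data.List (nub, nubBy)

-- The "Before" list from the question.
before :: [(Int, String)]
before = [(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]

-- nub keeps the first occurrence of each exact tuple.
after :: [(Int, String)]
after = nub before
-- [(2,"a"),(1,"a"),(1,"b"),(1,"c"),(2,"dd")]

-- nubBy takes a custom equality test. Here (hypothetically) two tuples
-- count as equal when their words match, so only the first tuple for
-- each word survives.
byWord :: [(Int, String)]
byWord = nubBy ((==) `on` snd) before
-- [(2,"a"),(1,"b"),(1,"c"),(2,"dd")]
```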

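That said, nub is an O(n^2) algorithm and may not be efficient enough for long lists. If the values are instances of the Ord type class, an alternative is to use Data.Set.fromList, as in: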

import qualified Data.Set as Set

nub' :: Ord a => [a] -> [a]
nub' = Set.toList . Set.fromList

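Note, however, that this does not preserve the order of the original list: the result comes back in ascending order.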
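To make the ordering caveat concrete, here is a small sketch that runs the Set-based version on the list from the question. The names nubSet and before are made up for this example.

```haskell
import qualified Data.Set as Set

-- Same definition as the Set-based nub' above, under a different name.
nubSet :: Ord a => [a] -> [a]
nubSet = Set.toList . Set.fromList

-- The "Before" list from the question.
before :: [(Int, String)]
before = [(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]

-- Set.toList returns elements in ascending order, so the result is
-- sorted by the tuple's Ord instance (number first, then word).
reordered :: [(Int, String)]
reordered = nubSet before
-- [(1,"a"),(1,"b"),(1,"c"),(2,"a"),(2,"dd")]
```

The duplicates are gone, but (2,"a") has moved behind the tuples that start with 1, so this is not the After list the question asks for.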

A simple set-based solution that does preserve the order of the original list could be:

import Data.Set (Set, member, insert, empty)

nub' :: Ord a => [a] -> [a]
nub' = reverse . fst . foldl loop ([], empty)
    where
    loop :: Ord a => ([a], Set a) -> a -> ([a], Set a)
    loop acc@(xs, obs) x
        | x `member` obs = acc
        | otherwise = (x:xs, x `insert` obs)

If you want to define a version of nub for Ord, I recommend using

import Data.Set (empty, insert, member)

nub' :: Ord a => [a] -> [a]
nub' xs = foldr go (`seq` []) xs empty
  where
    go x r obs
      | x `member` obs = r obs
      | otherwise = obs' `seq` x : r obs'
      where obs' = x `insert` obs

To see what it does, you can eliminate the foldr and write the recursion out by hand:

nub' :: Ord a => [a] -> [a]
nub' xs = nub'' xs empty
  where
    nub'' [] obs = obs `seq` []
    nub'' (y : ys) obs
      | y `member` obs = nub'' ys obs
      | otherwise = obs' `seq` y : nub'' ys obs'
      where obs' = y `insert` obs

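A key point about this implementation, compared with behzad.nouri's, is that it produces elements lazily as it consumes them. This is generally much better for cache utilization and garbage collection, and it uses a constant factor less memory than the reversing algorithm.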
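One way to observe the laziness is to feed the function an infinite list. In the sketch below, nubLazy is the foldr version from above under a different name, and firstFive is a name made up for this example.

```haskell
import Data.Set (empty, insert, member)

-- The foldr-based version from above, renamed for this example.
nubLazy :: Ord a => [a] -> [a]
nubLazy xs = foldr go (`seq` []) xs empty
  where
    go x r obs
      | x `member` obs = r obs
      | otherwise = obs' `seq` x : r obs'
      where obs' = x `insert` obs

-- Each new element is emitted as soon as it is seen, so a finite
-- prefix of the result can be taken even though the input never ends.
firstFive :: [Int]
firstFive = take 5 (nubLazy ([1, 2, 1, 3, 2] ++ [4 ..]))
-- [1,2,3,4,5]
```

The foldl-based version has to traverse the entire input before reverse can return anything, so the same call would not terminate with it.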