Haskell: Removing duplicate tuples from a list?
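I am trying to get from the Before state to the After state. Is there a convenient Haskell function for removing duplicate tuples from a list? Or does it need something more involved, such as iterating over the whole list?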
Before: the list of tuples, sorted by word, as in
[(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]
After: the list of sorted tuples with exact duplicates removed, as in
[(2,"a"), (1,"a"), (1,"b"), (1,"c"), (2,"dd")]
Searching hoogle for the type Eq a => [a] -> [a] returns nub:
The nub function removes duplicate elements from a list. In particular, it keeps only the first occurrence of each element. (The name nub means `essence'.)
As the documentation notes, the more general case is nubBy.
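As a small illustration of the two functions, here is a sketch that applies them to the list from the question. The names before, after, and onePerWord are made up for this example, and the nubBy call (comparing only the word) is just one possible use, not something the question asked for.

```haskell
import Data.Function (on)
import Data.List (nub, nubBy)

-- The list from the question (the name is illustrative).
before :: [(Int, String)]
before = [(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]

-- Exact duplicates removed; the first occurrence of each tuple is kept.
after :: [(Int, String)]
after = nub before
-- [(2,"a"),(1,"a"),(1,"b"),(1,"c"),(2,"dd")]

-- nubBy takes the equality test as an argument. Comparing only the
-- second component keeps the first tuple seen for each word.
onePerWord :: [(Int, String)]
onePerWord = nubBy ((==) `on` snd) before
-- [(2,"a"),(1,"b"),(1,"c"),(2,"dd")]
```

Both functions only need an equality test, which is why they work on any Eq type but cannot do better than comparing each element against those already kept.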
That said, nub is an O(n^2) algorithm and may not be efficient enough. If the values are instances of the Ord type class, an alternative is to use Data.Set.fromList, as in:
import qualified Data.Set as Set
nub' :: Ord a => [a] -> [a]
nub' = Set.toList . Set.fromList
This will not preserve the order of the original list, though.
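To make the ordering caveat concrete, here is a sketch that runs the set round trip on the question's list. The names viaSet and reordered are hypothetical. The result comes back in ascending Ord order (tuples compare component by component), not in the original order.

```haskell
import qualified Data.Set as Set

-- Same definition as above, under an illustrative name.
viaSet :: Ord a => [a] -> [a]
viaSet = Set.toList . Set.fromList

-- Duplicates are gone, but the elements are now sorted by the
-- number first and the word second.
reordered :: [(Int, String)]
reordered = viaSet [(2,"a"), (1,"a"), (1,"b"), (1,"b"), (1,"c"), (2,"dd")]
-- [(1,"a"),(1,"b"),(1,"c"),(2,"a"),(2,"dd")]
```

Here (2,"a") moves from the front to fourth place, so this does not produce the After list from the question.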
A simple set-based solution that does preserve the order of the original list could be:
import Data.Set (Set, member, insert, empty)
nub' :: Ord a => [a] -> [a]
nub' = reverse . fst . foldl loop ([], empty)
  where
    -- The accumulator holds the kept elements (in reverse order)
    -- and the set of elements observed so far.
    loop :: Ord a => ([a], Set a) -> a -> ([a], Set a)
    loop acc@(xs, obs) x
      | x `member` obs = acc
      | otherwise = (x:xs, x `insert` obs)
If you want to define a version of nub for Ord, I recommend using
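import Data.Set (member, insert, empty)
nub' :: Ord a => [a] -> [a]
-- foldr builds a function from the set of observed elements to the
-- result list; it is started with the empty set.
nub' xs = foldr go (`seq` []) xs empty
  where
    go x r obs
      | x `member` obs = r obs
      | otherwise = obs' `seq` x : r obs'
      where obs' = x `insert` obs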
To see what it does, you can eliminate the foldr:
nub' :: Ord a => [a] -> [a]
nub' xs = nub'' xs empty
  where
    nub'' [] obs = obs `seq` []
    nub'' (y : ys) obs
      | y `member` obs = nub'' ys obs
      | otherwise = obs' `seq` y : nub'' ys obs'
      where obs' = y `insert` obs
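A key point about this implementation, compared with behzad.nouri's, is that it produces elements lazily as it consumes them. This is usually much better for cache utilization and garbage collection, and it uses a constant factor less memory than the reversing algorithm.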
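One way to observe the laziness is to run the function on an infinite list, as in the sketch below. It restates the explicit-recursion version under the hypothetical name nubLazy (with the set renamed to seen) so that the block stands on its own. The example values are illustrative.

```haskell
import Data.Set (empty, insert, member)

-- Order-preserving nub for Ord types that emits each new element as
-- soon as it is encountered (illustrative name).
nubLazy :: Ord a => [a] -> [a]
nubLazy xs = go xs empty
  where
    go [] seen = seen `seq` []
    go (y : ys) seen
      | y `member` seen = go ys seen
      | otherwise = seen' `seq` y : go ys seen'
      where seen' = y `insert` seen

-- Only as much of the infinite input is consumed as is needed to
-- produce the three requested elements.
firstThree :: [Int]
firstThree = take 3 (nubLazy (cycle [1, 2, 3]))
-- [1,2,3]
```

Asking for a fourth element here would never return, because cycle [1, 2, 3] contains only three distinct values. A version built on foldl and reverse has to traverse the whole input before returning anything, so it would not terminate on this input even for take 3.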