Splitting a collection into n parts does not give the desired resulting sequences
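I'm trying to split a collection into a specific number of parts, and I looked for help in some solutions on Stack Overflow: Split a collection into `n` parts with LINQ?
This is my VB.Net translation of @Hasan Khan's solution: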
''' <summary>
''' Splits an <see cref="IEnumerable(Of T)"/> into the specified number of sequences.
''' </summary>
Public Shared Function SplitIntoParts(Of T)(ByVal col As IEnumerable(Of T),
                                            ByVal amount As Integer) As IEnumerable(Of IEnumerable(Of T))
    Dim i As Integer = 0
    Dim splits As IEnumerable(Of IEnumerable(Of T)) =
        From item As T In col
        Group item By item = Threading.Interlocked.Increment(i) Mod amount
        Into Group
        Select Group.AsEnumerable()
    Return splits
End Function
And this is my VB.Net translation of @manu08's solution:
''' <summary>
''' Splits an <see cref="IEnumerable(Of T)"/> into the specified number of sequences.
''' </summary>
Public Shared Function SplitIntoParts(Of T)(ByVal col As IEnumerable(Of T),
                                            ByVal amount As Integer) As IEnumerable(Of IEnumerable(Of T))
    Return col.Select(Function(item, index) New With {index, item}).
               GroupBy(Function(x) x.index Mod amount).
               Select(Function(x) x.Select(Function(y) y.item))
End Function
The problem is that both functions return a wrong result.
If I split a collection like this:
Dim mainCol As IEnumerable(Of Integer) = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Dim splittedCols As IEnumerable(Of IEnumerable(Of Integer)) =
    SplitIntoParts(col:=mainCol, amount:=2)
both functions give this result:
1: { 1, 3, 5, 7, 9 }
2: { 2, 4, 6, 8, 10 }
instead of these sequences:
1: { 1, 2, 3, 4, 5 }
2: { 6, 7, 8, 9, 10 }
What am I doing wrong?
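You are not doing anything wrong; the methods you are using simply don't preserve the ordering the way you want. Think about how Mod and GroupBy work and you will see why: grouping by index Mod amount sends consecutive items to different groups.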
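To make that concrete, here is a small C# sketch contrasting the two grouping keys. The class and method names (`GroupingDemo`, `ByModulo`, `ByDivision`) are made up for illustration and are not part of any of the linked answers; it assumes `parts` is greater than zero.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    // Key = index % parts: consecutive items get different keys, so the
    // groups come out interleaved (this is what both translations do).
    public static List<List<int>> ByModulo(IEnumerable<int> source, int parts) =>
        source.Select((item, index) => new { item, index })
              .GroupBy(x => x.index % parts)
              .Select(g => g.Select(x => x.item).ToList())
              .ToList();

    // Key = index / chunkSize: consecutive items share a key, so the
    // groups are contiguous runs of the original sequence.
    public static List<List<int>> ByDivision(IEnumerable<int> source, int parts)
    {
        var list = source.ToList();
        // Integer ceiling of Count / parts; at least 1 to avoid dividing by zero.
        var chunkSize = Math.Max(1, (list.Count + parts - 1) / parts);
        return list.Select((item, index) => new { item, index })
                   .GroupBy(x => x.index / chunkSize)
                   .Select(g => g.Select(x => x.item).ToList())
                   .ToList();
    }
}
```

With `Enumerable.Range(1, 10)` and `parts = 2`, `ByModulo` yields `{1, 3, 5, 7, 9}` and `{2, 4, 6, 8, 10}`, while `ByDivision` yields `{1, 2, 3, 4, 5}` and `{6, 7, 8, 9, 10}`.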
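I suggest you use Jon Skeet's answer, because it preserves the order of your collection (I took the liberty of translating it to VB.Net for you).
You just need to compute the size of each partition beforehand, because it does not split the collection into n chunks, but into chunks of length n: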
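Imports System.Collections.ObjectModel
Imports System.Runtime.CompilerServices

' VB.NET extension methods must be declared in a Module (and are not marked Shared).
Public Module PartitionExtensions
    <Extension>
    Public Iterator Function Partition(Of T)(source As IEnumerable(Of T), size As Integer) As IEnumerable(Of IEnumerable(Of T))
        Dim buffer As T() = Nothing
        Dim count As Integer = 0
        For Each item As T In source
            ' Allocate a fresh buffer at the start of every chunk.
            If buffer Is Nothing Then
                buffer = New T(size - 1) {}
            End If
            buffer(count) = item
            count += 1
            ' The chunk is full: emit it and start over.
            If count = size Then
                Yield New ReadOnlyCollection(Of T)(buffer)
                buffer = Nothing
                count = 0
            End If
        Next
        ' Emit the last, shorter chunk, if any.
        If buffer IsNot Nothing Then
            Array.Resize(buffer, count)
            Yield New ReadOnlyCollection(Of T)(buffer)
        End If
    End Function
End Module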
And to use it:
mainCol.Partition(CInt(Math.Ceiling(mainCol.Count() / 2)))
Feel free to hide the Partition(CInt(Math.Ceiling(...))) part inside a new method.
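As a rough idea of what such a wrapper could look like, here is a self-contained C# sketch. The name `SplitIntoParts` and the class `PartitionWrapper` are hypothetical, and instead of calling `Partition` it slices a materialized list, which is a swapped-in technique chosen only to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionWrapper
{
    // Hypothetical helper: splits `source` into contiguous chunks whose length is
    // ceil(count / parts), hiding that calculation from the caller.
    // Being an iterator, the argument checks run when enumeration starts.
    public static IEnumerable<IReadOnlyList<T>> SplitIntoParts<T>(this IEnumerable<T> source, int parts)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (parts <= 0) throw new ArgumentOutOfRangeException(nameof(parts));

        var list = source.ToList();          // enumerate the source exactly once
        if (list.Count == 0) yield break;

        // Integer ceiling of list.Count / parts.
        var size = (list.Count + parts - 1) / parts;

        for (var start = 0; start < list.Count; start += size)
            yield return list.GetRange(start, Math.Min(size, list.Count - start));
    }
}
```

Note that a fixed chunk length can produce fewer chunks than requested: 10 items with `parts = 6` gives a length of 2 and therefore only five chunks.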
An inefficient solution (it iterates over the data too many times):
class Program
{
    static void Main(string[] args)
    {
        var data = Enumerable.Range(1, 10);
        var result = data.Split(2);
    }
}

static class Extensions
{
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> col, int amount)
    {
        var chunkSize = (int)Math.Ceiling((double)col.Count() / (double)amount);
        for (var i = 0; i < amount; ++i)
            yield return col.Skip(chunkSize * i).Take(chunkSize);
    }
}
Edit:
In VB.Net:
Public Shared Iterator Function SplitIntoParts(Of T)(ByVal col As IEnumerable(Of T),
                                                     ByVal amount As Integer) As IEnumerable(Of IEnumerable(Of T))
    Dim chunkSize As Integer = CInt(Math.Ceiling(CDbl(col.Count()) / CDbl(amount)))
    For i As Integer = 0 To amount - 1
        Yield col.Skip(chunkSize * i).Take(chunkSize)
    Next
End Function
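To illustrate why this approach is called inefficient, the following C# sketch counts how often a lazily generated source is restarted when the Count() plus Skip/Take split is fully consumed. `EnumerationCountDemo` and `CountPasses` are hypothetical names, and the sketch assumes `itemCount` and `amount` are both positive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCountDemo
{
    // Returns how many times the source sequence is enumerated from the start
    // when it is split into `amount` parts with Count() + Skip/Take and every
    // part is consumed.
    public static int CountPasses(int itemCount, int amount)
    {
        var passes = 0;

        // Local iterator: its body runs again (bumping the counter) each time
        // the sequence is enumerated.
        IEnumerable<int> Source()
        {
            passes++;
            for (var i = 1; i <= itemCount; i++)
                yield return i;
        }

        var col = Source();
        var chunkSize = (int)Math.Ceiling((double)col.Count() / amount); // first pass
        for (var i = 0; i < amount; ++i)
            col.Skip(chunkSize * i).Take(chunkSize).ToList();            // one more pass per part

        return passes;
    }
}
```

For 10 items and 2 parts this reports 3 passes (one for `Count()` and one per part), and each later part also has to skip over everything that precedes it.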
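The MyExtensions class has two public Split methods:
- For ICollection - iterates through the collection only once, to split it.
- For IEnumerable - iterates through the enumerable twice: once to count the items and once to split them. Avoid it whenever possible (the first one is safe, and twice as fast).
Additionally: this algorithm tries to return exactly the specified number of collections.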
public static class MyExtensions
{
    // Works with ICollection - iterates through collection only once.
    public static IEnumerable<IEnumerable<T>> Split<T>(this ICollection<T> items, int count)
    {
        return Split(items, items.Count, count);
    }

    // Works with IEnumerable and iterates items TWICE: first to count items, second to split them.
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items, int count)
    {
        // ReSharper disable PossibleMultipleEnumeration
        var itemsCount = items.Count();
        return Split(items, itemsCount, count);
        // ReSharper restore PossibleMultipleEnumeration
    }

    private static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items, int itemsCount, int partsCount)
    {
        if (items == null)
            throw new ArgumentNullException("items");
        if (partsCount <= 0)
            throw new ArgumentOutOfRangeException("partsCount");

        var rem = itemsCount % partsCount;
        var min = itemsCount / partsCount;
        var max = rem != 0 ? min + 1 : min;
        var index = 0;
        var enumerator = items.GetEnumerator();
        while (index < itemsCount)
        {
            // The first `rem` parts get one extra item.
            var size = 0 < rem-- ? max : min;
            yield return SplitPart(enumerator, size);
            index += size;
        }
    }

    private static IEnumerable<T> SplitPart<T>(IEnumerator<T> enumerator, int count)
    {
        for (var i = 0; i < count; i++)
        {
            if (!enumerator.MoveNext())
                break;
            yield return enumerator.Current;
        }
    }
}
Sample program:
var items = new [] {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'};
for (var i = 1; i <= items.Length + 3; i++)
{
    Console.WriteLine("{0} part(s)", i);
    foreach (var part in items.Split(i))
        Console.WriteLine(string.Join(", ", part));
    Console.WriteLine();
}
...and the output of this program:
1 part(s)
a, b, c, d, e, f, g, h, i, j
2 part(s)
a, b, c, d, e
f, g, h, i, j
3 part(s)
a, b, c, d
e, f, g
h, i, j
4 part(s)
a, b, c
d, e, f
g, h
i, j
5 part(s)
a, b
c, d
e, f
g, h
i, j
6 part(s)
a, b
c, d
e, f
g, h
i
j
7 part(s)
a, b
c, d
e, f
g
h
i
j
8 part(s)
a, b
c, d
e
f
g
h
i
j
9 part(s)
a, b
c
d
e
f
g
h
i
j
10 part(s)
a
b
c
d
e
f
g
h
i
j
11 part(s) // Only 10 items in collection.
a
b
c
d
e
f
g
h
i
j
12 part(s) // Only 10 items in collection.
a
b
c
d
e
f
g
h
i
j
13 part(s) // Only 10 items in collection.
a
b
c
d
e
f
g
h
i
j
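One thing worth keeping in mind about the Split methods above: the parts are lazy and all pull from a single shared enumerator, so they appear to rely on being consumed in order and only once each, as the sample program does. If the parts need to be stored or read more than once, an eager variant may be easier to reason about. The following is only a sketch using the same size rule; `EagerSplit` and `SplitEager` are hypothetical names, and it takes an `IReadOnlyList<T>` as an assumption so that items can be read by index.

```csharp
using System;
using System.Collections.Generic;

public static class EagerSplit
{
    // Same size rule as above (the first `rem` parts get one extra item),
    // but every part is copied into its own list, so parts can be read in
    // any order and any number of times.
    public static List<List<T>> SplitEager<T>(this IReadOnlyList<T> items, int partsCount)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (partsCount <= 0) throw new ArgumentOutOfRangeException(nameof(partsCount));

        var min = items.Count / partsCount;
        var rem = items.Count % partsCount;
        var result = new List<List<T>>();
        var index = 0;
        while (index < items.Count)
        {
            var size = rem-- > 0 ? min + 1 : min;
            var part = new List<T>(size);
            for (var i = 0; i < size; i++)
                part.Add(items[index + i]);
            result.Add(part);
            index += size;
        }
        return result;
    }
}
```

For the ten letters used in the sample program and `partsCount = 3`, this returns lists of 4, 3 and 3 items, matching the output shown above.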