找到最小标准偏差的最佳方法
Best way to find least standard deviation
我有一个电子表格,我在其中放置了代表一本书每个段落的经文数量的数字。
我按节数手动分配顺序段落,因此在电子表格中我会有这样的内容:
Verses Day
5 1
6 1
3 1
10 2
8 3
4 3
2 3
6 4
3 4
10 5
3 5
2 6
5 6
10 7
= 2,7080128015
通过对每天的经文总数求和 - 在本例中为 7 天 - 我得到了标准偏差并尝试减少它以更好地分配段落。
问题是:找到最小标准偏差的最佳方法是什么?
我考虑过使用蛮力生成所有可能的组合,但如果数字增加,这不是一个好主意。
编辑:标准偏差基于每天的经文总数,按顺序确定。 Day 1 共有 14 节经文,Day 2、10 依此类推。
1 14
2 10
3 14
4 9
5 13
6 7
7 10
= 2,7080128015
由于经文总数和天数是常数,所以要最小化
sum (avg verse count - verse count of day i)^2
i
avg verse count
是一个常数,简单地说就是总经文数除以天数。
这几天用动态程序就可以解决这个问题。让我们构建部分解决方案函数 f(days, paragraph)
,它为我们提供了在 days
天内分发段落 0
到 paragraph
的最小平方和。我们对这个函数的最后一个值感兴趣。
我们可以逐步构建函数。为任何 p
计算 f(1, p)
是直接的,因为我们只需要计算平均值和平方的差异。然后,对于所有其他天数,我们可以计算
f(d, p) = min f(d - 1, i) + (avg verse count - sum verse count of paragraph j)^2
i<p j:i+1..p
也就是说,我们少查了一天的解法,用前一天结束段落到p
之间的段落补上当天。当我们计算这个函数时,我们保留一个指向所选最小元素的指针(对于动态程序来说通常如此)。当我们完成整个函数的计算后,我们只需跟随指针回到起点,这将为我们提供分区。
该算法的 运行 时间为 O(d * p^2)
,其中 d
是天数,p
是段落数。
示例代码
下面是一些实现上述算法的示例 C# 代码:
struct Entry
{
public double minCost;
public int predecessor;
}
public static void Main()
{
//input data
int[] versesPerParagraph = { 5, 6, 3, 10, 8, 4, 2, 6, 3, 10, 3, 2, 5, 10 };
int days = 7;
//calculate constants
double avgVerses = (double)versesPerParagraph.Sum() / days;
//set up DP table (f(d,p))
int paragraphs = versesPerParagraph.Length;
Entry[,] dp = new Entry[days, paragraphs];
//initialize table
int verseCount = 0;
for(int p = 0; p < paragraphs; ++p)
{
verseCount += versesPerParagraph[p];
double diff = avgVerses - verseCount;
dp[0, p].minCost = diff * diff;
dp[0, p].predecessor = -1;
}
//run dynamic program
for(int d = 1; d < days; ++d)
{
for(int p = d; p < paragraphs; ++p)
{
verseCount = 0;
dp[d, p].minCost = double.MaxValue;
for(int i = p; i >= d; --i)
{
verseCount += versesPerParagraph[i];
double diff = avgVerses - verseCount;
double cost = dp[d - 1, i - 1].minCost + diff * diff;
if(cost < dp[d, p].minCost)
{
dp[d, p].minCost = cost;
dp[d, p].predecessor = i - 1;
}
}
}
}
//reconstruct the partitioning
{
int p = paragraphs - 1;
for (int d = days - 1; d >= 0; --d)
{
int predecessor = dp[d, p].predecessor;
//calculate number of verses, just to show them
verseCount = 0;
for (int i = predecessor + 1; i <= p; ++i)
verseCount += versesPerParagraph[i];
Console.WriteLine($"Day {d} ranges from paragraph {predecessor + 1} to {p} and has {verseCount} verses.");
p = predecessor;
}
}
}
输出为:
Day 6 ranges from paragraph 13 to 13 and has 10 verses.
Day 5 ranges from paragraph 10 to 12 and has 10 verses.
Day 4 ranges from paragraph 9 to 9 and has 10 verses.
Day 3 ranges from paragraph 6 to 8 and has 11 verses.
Day 2 ranges from paragraph 4 to 5 and has 12 verses.
Day 1 ranges from paragraph 2 to 3 and has 13 verses.
Day 0 ranges from paragraph 0 to 1 and has 11 verses.
此分区给出的标准差为 1.15
。
我有一个电子表格,我在其中放置了代表一本书每个段落的经文数量的数字。
我按节数手动分配顺序段落,因此在电子表格中我会有这样的内容:
Verses Day
5 1
6 1
3 1
10 2
8 3
4 3
2 3
6 4
3 4
10 5
3 5
2 6
5 6
10 7
= 2,7080128015
通过对每天的经文总数求和 - 在本例中为 7 天 - 我得到了标准偏差并尝试减少它以更好地分配段落。
问题是:找到最小标准偏差的最佳方法是什么?
我考虑过使用蛮力生成所有可能的组合,但如果数字增加,这不是一个好主意。
编辑:标准偏差基于每天的经文总数,按顺序确定。 Day 1 共有 14 节经文,Day 2、10 依此类推。
1 14
2 10
3 14
4 9
5 13
6 7
7 10
= 2,7080128015
由于经文总数和天数是常数,所以要最小化
sum (avg verse count - verse count of day i)^2
i
avg verse count
是一个常数,简单地说就是总经文数除以天数。
这几天用动态程序就可以解决这个问题。让我们构建部分解决方案函数 f(days, paragraph)
,它为我们提供了在 days
天内分发段落 0
到 paragraph
的最小平方和。我们对这个函数的最后一个值感兴趣。
我们可以逐步构建函数。为任何 p
计算 f(1, p)
是直接的,因为我们只需要计算平均值和平方的差异。然后,对于所有其他天数,我们可以计算
f(d, p) = min f(d - 1, i) + (avg verse count - sum verse count of paragraph j)^2
i<p j:i+1..p
也就是说,我们少查了一天的解法,用前一天结束段落到p
之间的段落补上当天。当我们计算这个函数时,我们保留一个指向所选最小元素的指针(对于动态程序来说通常如此)。当我们完成整个函数的计算后,我们只需跟随指针回到起点,这将为我们提供分区。
该算法的 运行 时间为 O(d * p^2)
,其中 d
是天数,p
是段落数。
示例代码
下面是一些实现上述算法的示例 C# 代码:
struct Entry
{
public double minCost;
public int predecessor;
}
public static void Main()
{
//input data
int[] versesPerParagraph = { 5, 6, 3, 10, 8, 4, 2, 6, 3, 10, 3, 2, 5, 10 };
int days = 7;
//calculate constants
double avgVerses = (double)versesPerParagraph.Sum() / days;
//set up DP table (f(d,p))
int paragraphs = versesPerParagraph.Length;
Entry[,] dp = new Entry[days, paragraphs];
//initialize table
int verseCount = 0;
for(int p = 0; p < paragraphs; ++p)
{
verseCount += versesPerParagraph[p];
double diff = avgVerses - verseCount;
dp[0, p].minCost = diff * diff;
dp[0, p].predecessor = -1;
}
//run dynamic program
for(int d = 1; d < days; ++d)
{
for(int p = d; p < paragraphs; ++p)
{
verseCount = 0;
dp[d, p].minCost = double.MaxValue;
for(int i = p; i >= d; --i)
{
verseCount += versesPerParagraph[i];
double diff = avgVerses - verseCount;
double cost = dp[d - 1, i - 1].minCost + diff * diff;
if(cost < dp[d, p].minCost)
{
dp[d, p].minCost = cost;
dp[d, p].predecessor = i - 1;
}
}
}
}
//reconstruct the partitioning
{
int p = paragraphs - 1;
for (int d = days - 1; d >= 0; --d)
{
int predecessor = dp[d, p].predecessor;
//calculate number of verses, just to show them
verseCount = 0;
for (int i = predecessor + 1; i <= p; ++i)
verseCount += versesPerParagraph[i];
Console.WriteLine($"Day {d} ranges from paragraph {predecessor + 1} to {p} and has {verseCount} verses.");
p = predecessor;
}
}
}
输出为:
Day 6 ranges from paragraph 13 to 13 and has 10 verses.
Day 5 ranges from paragraph 10 to 12 and has 10 verses.
Day 4 ranges from paragraph 9 to 9 and has 10 verses.
Day 3 ranges from paragraph 6 to 8 and has 11 verses.
Day 2 ranges from paragraph 4 to 5 and has 12 verses.
Day 1 ranges from paragraph 2 to 3 and has 13 verses.
Day 0 ranges from paragraph 0 to 1 and has 11 verses.
此分区给出的标准差为 1.15
。