将间隔集拆分为最小的不相交间隔集
Split set of intervals into minimal set of disjoint intervals
如何有效地将一组区间(input 集合)拆分为最小的不相交区间集合(output 集合), 这样输入集中的所有区间都可以表示为输出集中区间的并集 ?
示例:
Input: [0,9] [2,12]
Output: [0,1] [2,9] [10,12]
Test :
[0,9] = [0,1] ∪ [2,9]
[2,12] = [2,9] ∪ [10,12]
Input: [0,Infinity] [1,5] [4,6]
Output: [0,0] [1,3] [4,5] [6,6] [7,Infinity]
Test :
[0,Infinity] = [0,0] ∪ [1,3] ∪ [4,5] ∪ [6,6] ∪ [7,Infinity]
[1,5] = [1,3] ∪ [4,5]
[4,6] = [4,5] ∪ [6,6]
我需要在 Javascript 中执行此操作。这是我尝试过的想法:
// The input is an array of intervals, like [[0,9], [2,12]], same for the output
// This function converts a set of overlapping
// intervals into a set of disjoint intervals...
const disjoin = intervals => {
if(intervals.length < 2)
return intervals
const [first, ...rest] = intervals
// ...by recursively injecting each interval into
// an ordered set of disjoint intervals
return insert(first, disjoin(rest))
}
// This function inserts an interval [a,b] into
// an ordered set of disjoint intervals
const insert = ([a, b], intervals) => {
// First we "locate" a and b relative to the interval
// set (before, after, or index of the interval within the set
const pa = pos(a, intervals)
const pb = pos(b, intervals)
// Then we bruteforce all possibilities
if(pa === 'before' && pb === 'before')
return [[a, b], ...intervals]
if(pa === 'before' && pb === 'after')
// ...
if(pa === 'before' && typeof pb === 'number')
// ...
// ... all 6 possibilities
}
const first = intervals => intervals[0][0]
const last = intervals => intervals[intervals.length-1][1]
const pos = (n, intervals) => {
if(n < first(intervals))
return 'before'
if(n > last(intervals))
return 'after'
return intervals.findIndex(([a, b]) => a <= n && n <= b)
}
但是效率很低。在 pos
函数中,我可以进行二进制搜索来加快速度,但我主要想知道是否 :
- 这是一个已知问题,在算法界享有盛誉
- 有一个最优解,与我试过的无关
输入集中的每个边界点也需要在输出集中。每对相邻边界点之间的间隔如果在至少一个输入内,则在输出中。
splitIntervals = (input) => {
const starts = input.map(x => [x[0],1]);
const ends = input.map(x => [x[1]+1, -1]);
let count=0;
let prev=null;
return [...starts, ...ends]
.sort((a,b) => (a[0] - b[0])) //sort boundary points
.map(x => {
//make an interval for every section that is inside any input interval
const ret= (x[0] > prev && count !== 0 ? [prev, x[0]-1] : null);
prev=x[0];
count+=x[1];
return ret;
})
.filter(x => !!x);
}
测试:
> splitIntervals([ [0,9], [2,12] ])
[ [ 0, 1 ], [ 2, 9 ], [ 10, 12 ] ]
> splitIntervals([[0,9], [3,9], [4,13]])
[ [ 0, 2 ], [ 3, 3 ], [ 4, 9 ], [ 10, 13 ] ]
与 Matt 的回答类似,这会收集所有点,并从中构建输出,保持计数以检测间隔之间的差距。
不同的是,这里第一阶段排除重复项(基于Map
)而不是稍后过滤掉它们,最后阶段是用函数式编程风格:
const disjoint = intervals =>
[...intervals.reduce((acc, [first, last]) =>
acc.set(first, (acc.get(first )||0) + 1).set(last+1,(acc.get(last+1)||0) - 1)
, new Map)]
.sort((a,b) => a[0]-b[0])
.reduce(([prev, inside, res], [curr, change]) =>
[curr, inside+change, (!inside || res.push([prev, curr-1]), res)]
, [0, 0, []] )
.pop();
console.log(disjoint([[0,9], [2,12]]));
console.log(disjoint([[0,Infinity],[1,5],[4,6]]));
console.log(disjoint([[0,6],[1,2],[3,6],[6,6],[7,9],[7,8]]));
这个问题可以映射为图着色问题,其中每个区间是一个顶点,边连接有重叠的顶点(区间),颜色代表区间所属的集合。根据图着色问题的定义,两个相连的顶点(两个重叠区间)不应该有相同的颜色(应该属于不同的集合)
见https://en.wikipedia.org/wiki/Graph_coloring
Welsh Powell 算法可用于获得好的解决方案(不保证是最优的)。
如何有效地将一组区间(input 集合)拆分为最小的不相交区间集合(output 集合), 这样输入集中的所有区间都可以表示为输出集中区间的并集 ?
示例:
Input: [0,9] [2,12]
Output: [0,1] [2,9] [10,12]
Test :
[0,9] = [0,1] ∪ [2,9]
[2,12] = [2,9] ∪ [10,12]
Input: [0,Infinity] [1,5] [4,6]
Output: [0,0] [1,3] [4,5] [6,6] [7,Infinity]
Test :
[0,Infinity] = [0,0] ∪ [1,3] ∪ [4,5] ∪ [6,6] ∪ [7,Infinity]
[1,5] = [1,3] ∪ [4,5]
[4,6] = [4,5] ∪ [6,6]
我需要在 Javascript 中执行此操作。这是我尝试过的想法:
// The input is an array of intervals, like [[0,9], [2,12]], same for the output
// This function converts a set of overlapping
// intervals into a set of disjoint intervals...
const disjoin = intervals => {
if(intervals.length < 2)
return intervals
const [first, ...rest] = intervals
// ...by recursively injecting each interval into
// an ordered set of disjoint intervals
return insert(first, disjoin(rest))
}
// This function inserts an interval [a,b] into
// an ordered set of disjoint intervals
const insert = ([a, b], intervals) => {
// First we "locate" a and b relative to the interval
// set (before, after, or index of the interval within the set
const pa = pos(a, intervals)
const pb = pos(b, intervals)
// Then we bruteforce all possibilities
if(pa === 'before' && pb === 'before')
return [[a, b], ...intervals]
if(pa === 'before' && pb === 'after')
// ...
if(pa === 'before' && typeof pb === 'number')
// ...
// ... all 6 possibilities
}
const first = intervals => intervals[0][0]
const last = intervals => intervals[intervals.length-1][1]
const pos = (n, intervals) => {
if(n < first(intervals))
return 'before'
if(n > last(intervals))
return 'after'
return intervals.findIndex(([a, b]) => a <= n && n <= b)
}
但是效率很低。在 pos
函数中,我可以进行二进制搜索来加快速度,但我主要想知道是否 :
- 这是一个已知问题,在算法界享有盛誉
- 有一个最优解,与我试过的无关
输入集中的每个边界点也需要在输出集中。每对相邻边界点之间的间隔如果在至少一个输入内,则在输出中。
splitIntervals = (input) => {
const starts = input.map(x => [x[0],1]);
const ends = input.map(x => [x[1]+1, -1]);
let count=0;
let prev=null;
return [...starts, ...ends]
.sort((a,b) => (a[0] - b[0])) //sort boundary points
.map(x => {
//make an interval for every section that is inside any input interval
const ret= (x[0] > prev && count !== 0 ? [prev, x[0]-1] : null);
prev=x[0];
count+=x[1];
return ret;
})
.filter(x => !!x);
}
测试:
> splitIntervals([ [0,9], [2,12] ])
[ [ 0, 1 ], [ 2, 9 ], [ 10, 12 ] ]
> splitIntervals([[0,9], [3,9], [4,13]])
[ [ 0, 2 ], [ 3, 3 ], [ 4, 9 ], [ 10, 13 ] ]
与 Matt 的回答类似,这会收集所有点,并从中构建输出,保持计数以检测间隔之间的差距。
不同的是,这里第一阶段排除重复项(基于Map
)而不是稍后过滤掉它们,最后阶段是用函数式编程风格:
const disjoint = intervals =>
[...intervals.reduce((acc, [first, last]) =>
acc.set(first, (acc.get(first )||0) + 1).set(last+1,(acc.get(last+1)||0) - 1)
, new Map)]
.sort((a,b) => a[0]-b[0])
.reduce(([prev, inside, res], [curr, change]) =>
[curr, inside+change, (!inside || res.push([prev, curr-1]), res)]
, [0, 0, []] )
.pop();
console.log(disjoint([[0,9], [2,12]]));
console.log(disjoint([[0,Infinity],[1,5],[4,6]]));
console.log(disjoint([[0,6],[1,2],[3,6],[6,6],[7,9],[7,8]]));
这个问题可以映射为图着色问题,其中每个区间是一个顶点,边连接有重叠的顶点(区间),颜色代表区间所属的集合。根据图着色问题的定义,两个相连的顶点(两个重叠区间)不应该有相同的颜色(应该属于不同的集合)
见https://en.wikipedia.org/wiki/Graph_coloring
Welsh Powell 算法可用于获得好的解决方案(不保证是最优的)。