Groovy - Aggregating and Modelling Data from Complex Nested Maps
My data in Groovy is shown in the code snippet below:
def productAvailability = [
[id: 1, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 1, categoryId: 1],
[id: 4, startDate: "2014-12-24", endDate: "2015-01-08", storeId: 2, productId: 1, categoryId: 1],
[id: 8, startDate: "2014-12-25", endDate: "2015-01-01", storeId: 2, productId: 3, categoryId: 1],
[id: 9, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 3, categoryId: 1],
[id: 10, startDate: "2015-01-10", endDate: "2015-01-21", storeId: 1, productId: 1, categoryId: 1]
];
The objective is to get a result like this:
Product statistics
Product Id: 1 | Availability Index: 15 + 11 + 11 = 37.
Longest Available Products (Sort By Past Start Date *first* then, Store Id):
1. "2014-12-24" to "2015-01-08" in store id 2. (15 days)
2. "2014-12-22" to "2015-01-02" in store id 1. (11 days)
3. "2015-01-10" to "2015-01-21" in store id 1. (11 days)
Product Id: 3 | Availability Index: 7 + 11 = 18.
Longest Available Products (Sort By Past Start Date *first* then, Store Id):
1. "2014-12-22" to "2015-01-02" in store id 1. (11 days)
2. "2014-12-25" to "2015-01-01" in store id 2. (7 days)
Store statistics
Store Id: 1 | Availability Index: 11 + 11 + 11 = 33.
Most Available Product (sort by most available product, then sort by product id):
1. Product Id: 3 on ["2014-12-22" to "2015-01-02"] (11 days)
2. Product Id: 1 on ["2014-12-22" to "2015-01-02", "2015-01-10" to "2015-01-21"] (11 days)
Store Id: 2 | Availability Index: 15 + 7 = 22.
Most Available Product (sort by most available product, then sort by product id):
1. Product Id: 1 on ["2014-12-24" to "2015-01-08"] (15 days)
2. Product Id: 3 on ["2014-12-25" to "2015-01-01"] (7 days)
Total Availability Index: 37 + 18 or 33 + 22 = 55.
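The printed results above are the product statistics and the store statistics.
I am looking for an optimized, efficient, and easy-to-understand solution that prints the result above.
Here is what I have tried with the data above: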
// productAvailability => see the declaration variable above in the beginning of question!
List aggregateDates = productAvailability.collect({[
    storeId: it.storeId,
    productId: it.productId,
    // Lowercase "yyyy" is the calendar year; uppercase "YYYY" is the week-based year
    availabilityIndex: Date.parse("yyyy-MM-dd", it.endDate) - Date.parse("yyyy-MM-dd", it.startDate)
]});
println "Total Availability Index: " + aggregateDates.clone().sum({ it.availabilityIndex });
println "Total Products: " + aggregateDates.clone().unique({ it.productId }).count({ it.productId });
println "Total Stores: " + aggregateDates.clone().unique({ it.storeId }).count({ it.storeId });
println "Average Availability Index: " + aggregateDates.clone().sum({ it.availabilityIndex }) / aggregateDates.size();
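As you can see in the snippet above, I can quite easily get the aggregate SUM, AVG, and COUNT of how many PRODUCTs and STOREs there are in the productAvailability data. However, it is hard for me to derive the availability per PRODUCT and per STORE using date ranges, which is what the objective above requires.
See the code below, which uses date ranges.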
def dailyDatesAvailability = [:] as Map<Date, Integer>;
def dailyStoresAvailability = [:].withDefault {0} as Map<Integer, Integer>;
def dailyProductsAvailability = [:].withDefault {0} as Map<Integer, Integer>;
// Lowercase "yyyy" is the calendar year; uppercase "YYYY" is the week-based year
(Date.parse("yyyy-MM-dd", "2014-12-01")).upto((Date.parse("yyyy-MM-dd", "2015-01-30"))) { Date runningDate ->
    dailyDatesAvailability[runningDate] = 0;
    productAvailability.each({ _availability ->
        def _startDate = Date.parse("yyyy-MM-dd", _availability.startDate);
        def _endDate = Date.parse("yyyy-MM-dd", _availability.endDate);
        if (_startDate <= runningDate && _endDate >= runningDate) {
            dailyDatesAvailability[runningDate]++;
            dailyProductsAvailability[_availability.productId]++;
            dailyStoresAvailability[_availability.storeId]++;
        }
        // Do something here to get the MOST available PRODUCT in a STORE with date ranges
    });
    /// or do something here....?
}
What is the best way to print the objective above using Groovy? Please share a code snippet so that I can test it.
This caught my interest, and I came up with:
List<Range> simplify( List<Range> ranges ) {
    ranges.drop( 1 ).inject( ranges.take( 1 ) ) { r, curr ->
        // Find an overlapping range
        def ov = r.find { curr.from <= it.to && curr.to >= it.from }
        if( ov ) {
            ov.from = [ curr.from, ov.from ].min()
            ov.to = [ curr.to, ov.to ].max()
            simplify( r )
        }
        else {
            r << curr
        }
    }
}
def manipulate(data, primary, secondary) {
    data.groupBy { it."$primary" }
        .collect { id, vals ->
            def joined = vals.collect { it ->
                [ id: it.id,
                  range: Date.parse('yyyy-MM-dd', it.startDate)..Date.parse('yyyy-MM-dd', it.endDate),
                  key: secondary,
                  value: it."$secondary" ]
            }.groupBy { it.value }
             .collectMany { sid, ran -> simplify(ran.range).collect { [key: secondary, value: sid, range:it, days:(it.to - it.from)] } }
             .sort { a, b -> b.days <=> a.days ?: a.value - b.value }
            [name:primary, id:id, data:joined]
        }
}
def dump(data) {
    data.collect { a ->
        def sum = a.data.days.sum()
        println "$a.name: $a.id | availability index ${a.data.days.join(' + ')} = ${sum}"
        a.data.eachWithIndex { row, idx ->
            println " ${idx+1}. ${row.range.from.format('yyyy-MM-dd')} to ${row.range.to.format('yyyy-MM-dd')} in $row.key $row.value ($row.days days)"
        }
        sum
    }
}
def productAvailability = [
[id: 1, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 1, categoryId: 1],
[id: 4, startDate: "2014-12-24", endDate: "2015-01-08", storeId: 2, productId: 1, categoryId: 1],
[id: 8, startDate: "2014-12-25", endDate: "2015-01-01", storeId: 2, productId: 3, categoryId: 1],
[id: 9, startDate: "2014-12-22", endDate: "2015-01-02", storeId: 1, productId: 3, categoryId: 1],
[id: 10, startDate: "2015-01-10", endDate: "2015-01-21", storeId: 1, productId: 1, categoryId: 1]
];
def p = dump(manipulate(productAvailability, 'productId', 'storeId'))
println ''
def s = dump(manipulate(productAvailability, 'storeId', 'productId'))
println ''
println "Total Availability Index: ${p.join(' + ')} or ${s.join(' + ')} = ${[p.sum(), s.sum()].max()}"
This prints:
productId: 1 | availability index 15 + 11 + 11 = 37
1. 2014-12-24 to 2015-01-08 in storeId 2 (15 days)
2. 2014-12-22 to 2015-01-02 in storeId 1 (11 days)
3. 2015-01-10 to 2015-01-21 in storeId 1 (11 days)
productId: 3 | availability index 11 + 7 = 18
1. 2014-12-22 to 2015-01-02 in storeId 1 (11 days)
2. 2014-12-25 to 2015-01-01 in storeId 2 (7 days)
storeId: 1 | availability index 11 + 11 + 11 = 33
1. 2014-12-22 to 2015-01-02 in productId 1 (11 days)
2. 2015-01-10 to 2015-01-21 in productId 1 (11 days)
3. 2014-12-22 to 2015-01-02 in productId 3 (11 days)
storeId: 2 | availability index 15 + 7 = 22
1. 2014-12-24 to 2015-01-08 in productId 1 (15 days)
2. 2014-12-25 to 2015-01-01 in productId 3 (7 days)
Total Availability Index: 37 + 18 or 33 + 22 = 55
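The simplify method above merges overlapping ranges by assigning to the from and to of an existing range. In case that assignment is not permitted on your Groovy version, the following is a minimal sketch of the same merge step that builds new intervals instead of mutating them. It is an illustration under assumptions, not the code that produced the output above: mergeIntervals is a hypothetical helper name, intervals are plain maps with from and to keys holding java.time.LocalDate values, and the third input row is invented to show an overlap (the question's data contains none for the same store and product).

```groovy
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical helper: merge overlapping [from, to] intervals without mutating the inputs.
List<Map> mergeIntervals(List<Map> intervals) {
    // Sort a copy by start date so each interval can only overlap the last merged one
    def sorted = intervals.sort(false) { it.from }
    sorted.inject([]) { List<Map> merged, Map curr ->
        def last = merged ? merged[merged.size() - 1] : null
        if (last && !curr.from.isAfter(last.to)) {
            // Overlap (touching end points count, as in simplify): widen the last interval
            merged[merged.size() - 1] = [from: last.from, to: [last.to, curr.to].max()]
        } else {
            merged << [from: curr.from, to: curr.to]
        }
        merged
    }
}

def d = { String s -> LocalDate.parse(s) }

def merged = mergeIntervals([
    [from: d('2015-01-10'), to: d('2015-01-21')],
    [from: d('2014-12-22'), to: d('2015-01-02')],
    [from: d('2014-12-30'), to: d('2015-01-05')]   // invented overlapping row, not in the question's data
])

// Same day count as (range.to - range.from) in the answer: whole days between the end points
def days = merged.collect { ChronoUnit.DAYS.between(it.from, it.to) }

merged.eachWithIndex { row, idx ->
    println "${idx + 1}. ${row.from} to ${row.to} (${days[idx]} days)"
}
```

Sorting first means each incoming interval needs to be compared only with the most recently merged one, so the recursive re-check in simplify is not needed. The day counts here are 14 and 11: the first two rows collapse into 2014-12-22 to 2015-01-05, and the January row stays separate.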