Go SDK Apache Beam:单例侧输入 Singleton for int ill-defined
Go SDK Apache Beam : singleton side input Singleton for int ill-defined
使用适用于 Apache Beam 的 Go SDK,我正在尝试使用辅助输入创建 PCollection 的视图。
但是我遇到了这个奇怪的错误:
Failed to execute job: on ctx= making side input 0:
singleton side input Singleton for int ill-defined
exit status 1
这里是我使用的代码:
// A PCollection of key/value pairs
pairedWithOne := beam.ParDo(s, func(r models.Review) (string, int) {
return r.DoRecommend, 1
}, col)
// A PCollection of ints (demo)
pcollInts := beam.CreateList(s, [3]int{
1, 2, 3,
})
// A PCollection of key/values pairs
summed := stats.SumPerKey(s, pairedWithOne)
// Here is where I'd like to use my side input.
mapped := beam.ParDo(s, func(k string, v int, side int, emit func(ratio
models.RecommendRatio)) {
var ratio = models.RecommendRatio{
DoRecommend: k,
NumVotes: v,
}
emit(ratio)
}, summed, beam.SideInput{Input: pcollInts})
我在 git 上找到了这个例子:
// Side Inputs
//
// While a ParDo processes elements from a single "main input" PCollection, it
// can take additional "side input" PCollections. These SideInput along with
// the DoFn parameter form express styles of accessing PCollection computed by
// earlier pipeline operations, passed in to the ParDo transform using SideInput
// options, and their contents accessible to each of the DoFn operations. For
// example:
//
// words := ...
// cufoff := ... // Singleton PCollection<int>
// smallWords := beam.ParDo(s, func (word string, cutoff int, emit func(string)) {
// if len(word) < cutoff {
// emit(word)
// }
// }, words, beam.SideInput{Input: cutoff})
更新: 似乎 Impulse(scope)
函数在这里起作用,但我不知道是什么。来自 GoDoc:
Impulse emits a single empty []byte into the global window. The resulting PCollection is a singleton of type []byte.
The purpose of Impulse is to trigger another transform, such as ones that take all information as side inputs.
如果这可以帮助到我的结构:
type Review struct {
Date time.Time `csv:"date" json:"date"`
DoRecommend string `csv:"doRecommend" json:"doRecommend"`
NumHelpful int `csv:"numHelpful" json:"numHelpful"`
Rating int `csv:"rating" json:"rating"`
Text string `csv:"text" json:"text"`
Title string `csv:"title" json:"title"`
Username string `csv:"username" json:"username"`
}
type RecommendRatio struct {
DoRecommend string `json:"doRecommend"`
NumVotes int `json:"numVotes"`
}
有什么解决办法吗?
谢谢
更新:
这可以通过删除 beam.Impulse()
函数来简化(我认为错误的类型导致了这里的麻烦):
mapped := beam.ParDo(s,
func(k string, v int,
sideCounted int,
emit func(ratio models.RecommendRatio)) {
p := percent.PercentOf(v, sideCounted)
emit(models.RecommendRatio{
DoRecommend: k,
NumVotes: v,
Percent: p,
})
}, summed,
beam.SideInput{Input: counted})
旧:
似乎我找到了解决方案,也许只是一个解决方法,正在寻求快速审查并开放改进空间。 (我认为该函数不是幂等的,因为如果它可能在多个节点工作程序上执行多次,则 append() 函数将重复条目...)
但这里的全局想法是使用 beam.Impulse(scope)
函数创建 []uint8 byte
的单例 PCollection,并将所有 "real" 数据作为侧输入传递。
// Pair each recommendation value with one -> PColl<KV<string, int>>
pairedWithOne := beam.ParDo(s, func(r models.Review) (string, int) {
return r.DoRecommend, 1
}, col)
// Sum num occurrences of a recommendation k/v pair
summed := stats.SumPerKey(s, pairedWithOne)
// Drop keys for latter global count
droppedKey := beam.DropKey(s, pairedWithOne)
// Count globally the number of recommendation values -> PColl<int>
counted := stats.Sum(s, droppedKey)
// Map to a struct with percentage per ratio
mapped := beam.ParDo(s,
func(_ []uint8,
sideSummed func(k *string, v *int) bool,
sideCounted int,
emit func(ratio []models.RecommendRatio)) {
var k string
var v int
var ratios []models.RecommendRatio
for sideSummed(&k, &v) {
p := percent.PercentOf(v, sideCounted)
ratio := models.RecommendRatio{
DoRecommend: k,
NumVotes: v,
Percent: p,
}
ratios = append(ratios, ratio)
}
emit(ratios)
}, beam.Impulse(s),
beam.SideInput{Input: summed},
beam.SideInput{Input: counted})
使用适用于 Apache Beam 的 Go SDK,我正在尝试使用辅助输入创建 PCollection 的视图。
但是我遇到了这个奇怪的错误:
Failed to execute job: on ctx= making side input 0:
singleton side input Singleton for int ill-defined
exit status 1
这里是我使用的代码:
// A PCollection of key/value pairs
pairedWithOne := beam.ParDo(s, func(r models.Review) (string, int) {
return r.DoRecommend, 1
}, col)
// A PCollection of ints (demo)
pcollInts := beam.CreateList(s, [3]int{
1, 2, 3,
})
// A PCollection of key/values pairs
summed := stats.SumPerKey(s, pairedWithOne)
// Here is where I'd like to use my side input.
mapped := beam.ParDo(s, func(k string, v int, side int, emit func(ratio
models.RecommendRatio)) {
var ratio = models.RecommendRatio{
DoRecommend: k,
NumVotes: v,
}
emit(ratio)
}, summed, beam.SideInput{Input: pcollInts})
我在 git 上找到了这个例子:
// Side Inputs
//
// While a ParDo processes elements from a single "main input" PCollection, it
// can take additional "side input" PCollections. These SideInput along with
// the DoFn parameter form express styles of accessing PCollection computed by
// earlier pipeline operations, passed in to the ParDo transform using SideInput
// options, and their contents accessible to each of the DoFn operations. For
// example:
//
// words := ...
// cufoff := ... // Singleton PCollection<int>
// smallWords := beam.ParDo(s, func (word string, cutoff int, emit func(string)) {
// if len(word) < cutoff {
// emit(word)
// }
// }, words, beam.SideInput{Input: cutoff})
更新: 似乎 Impulse(scope)
函数在这里起作用,但我不知道是什么。来自 GoDoc:
Impulse emits a single empty []byte into the global window. The resulting PCollection is a singleton of type []byte.
The purpose of Impulse is to trigger another transform, such as ones that take all information as side inputs.
如果这可以帮助到我的结构:
type Review struct {
Date time.Time `csv:"date" json:"date"`
DoRecommend string `csv:"doRecommend" json:"doRecommend"`
NumHelpful int `csv:"numHelpful" json:"numHelpful"`
Rating int `csv:"rating" json:"rating"`
Text string `csv:"text" json:"text"`
Title string `csv:"title" json:"title"`
Username string `csv:"username" json:"username"`
}
type RecommendRatio struct {
DoRecommend string `json:"doRecommend"`
NumVotes int `json:"numVotes"`
}
有什么解决办法吗?
谢谢
更新:
这可以通过删除 beam.Impulse()
函数来简化(我认为错误的类型导致了这里的麻烦):
mapped := beam.ParDo(s,
func(k string, v int,
sideCounted int,
emit func(ratio models.RecommendRatio)) {
p := percent.PercentOf(v, sideCounted)
emit(models.RecommendRatio{
DoRecommend: k,
NumVotes: v,
Percent: p,
})
}, summed,
beam.SideInput{Input: counted})
旧: 似乎我找到了解决方案,也许只是一个解决方法,正在寻求快速审查并开放改进空间。 (我认为该函数不是幂等的,因为如果它可能在多个节点工作程序上执行多次,则 append() 函数将重复条目...)
但这里的全局想法是使用 beam.Impulse(scope)
函数创建 []uint8 byte
的单例 PCollection,并将所有 "real" 数据作为侧输入传递。
// Pair each recommendation value with one -> PColl<KV<string, int>>
pairedWithOne := beam.ParDo(s, func(r models.Review) (string, int) {
return r.DoRecommend, 1
}, col)
// Sum num occurrences of a recommendation k/v pair
summed := stats.SumPerKey(s, pairedWithOne)
// Drop keys for latter global count
droppedKey := beam.DropKey(s, pairedWithOne)
// Count globally the number of recommendation values -> PColl<int>
counted := stats.Sum(s, droppedKey)
// Map to a struct with percentage per ratio
mapped := beam.ParDo(s,
func(_ []uint8,
sideSummed func(k *string, v *int) bool,
sideCounted int,
emit func(ratio []models.RecommendRatio)) {
var k string
var v int
var ratios []models.RecommendRatio
for sideSummed(&k, &v) {
p := percent.PercentOf(v, sideCounted)
ratio := models.RecommendRatio{
DoRecommend: k,
NumVotes: v,
Percent: p,
}
ratios = append(ratios, ratio)
}
emit(ratios)
}, beam.Impulse(s),
beam.SideInput{Input: summed},
beam.SideInput{Input: counted})