Filtering two py2store stores with the same set of keys

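In the code below, which is based on an example of py2store usage I found, I use with_key_filt to make two daccs (one with training data, the other with test data). I do get filtered annots stores, but the wfs store is not filtered. What am I doing wrong?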

from py2store import cached_keys

class Dacc:
    """Waveform and annotation data access"""
    def __init__(self, wfs, annots, annot_to_tag=lambda x: x['tag']):
        self.wfs = wfs  # waveform store  (keys: filepaths, values: numpy arrays)
        self.annots = annots  # annotation store (keys: filepaths, values: dicts or pandas series)
        self.annot_to_tag = annot_to_tag  # function to compute a tag from an annotation item

    @classmethod
    def with_key_filt(cls, key_filt, wfs, annots, annot_to_tag, chunker):
        """
        Make an instance of the dacc class where the data is filtered out.
        You could also filter externally, but this can be convenient.
        """
        filtered_annots = cached_keys(annots, keys_cache=key_filt)
        return cls(wfs, filtered_annots, annot_to_tag)

    def wf_tag_gen(self):
        """Generator of (wf, tag) tuples"""
        for k in self.annots:
            try:
                wf = self.wfs[k]
                annot = self.annots[k]
                yield wf, self.annot_to_tag(annot)
            except KeyError:
                pass

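The purpose of with_key_filt seems to be to filter annots, which is itself used as the seed of the wf_tag_gen generator (and probably of other generators you didn't post): wf_tag_gen iterates over the keys of self.annots and only then looks each key up in self.wfs. So, in effect, it does filter everything that the generators produce.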
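To make this concrete, here is a minimal sketch that replaces py2store with plain dicts so it runs without any dependency. The helper filter_keys and the sample keys are hypothetical: filter_keys simply keeps the items whose key passes a predicate, which is assumed to play the role that cached_keys(annots, keys_cache=key_filt) plays in the question. It shows that the generator only visits filtered keys while the wfs store itself keeps all of its keys.

```python
# Hypothetical stand-in for the py2store filtering: a plain dict comprehension.
def filter_keys(store, keep):
    """Return a new dict holding only the items whose key satisfies keep(key)."""
    return {k: v for k, v in store.items() if keep(k)}


class Dacc:
    """Same iteration logic as the question's Dacc, trimmed to what matters here."""

    def __init__(self, wfs, annots, annot_to_tag=lambda x: x["tag"]):
        self.wfs = wfs
        self.annots = annots
        self.annot_to_tag = annot_to_tag

    def wf_tag_gen(self):
        # Iteration is driven by the keys of annots, not by the keys of wfs.
        for k in self.annots:
            try:
                wf, annot = self.wfs[k], self.annots[k]
            except KeyError:
                continue  # skip keys missing from either store
            yield wf, self.annot_to_tag(annot)


# Hypothetical sample data: keys stand in for file paths.
wfs = {"train/a.wav": [0.1, 0.2], "train/b.wav": [0.3], "test/c.wav": [0.4]}
annots = {
    "train/a.wav": {"tag": "dog"},
    "train/b.wav": {"tag": "cat"},
    "test/c.wav": {"tag": "dog"},
}

is_train = lambda k: k.startswith("train/")
dacc = Dacc(wfs, filter_keys(annots, is_train))  # only annots is filtered

pairs = list(dacc.wf_tag_gen())
print(len(pairs))     # 2: only the training keys are visited
print(len(dacc.wfs))  # 3: the wfs store itself still holds every key
```

So anything driven by wf_tag_gen behaves as if both stores were filtered, but direct access to dacc.wfs still sees the test data.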

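But I agree with your expectation that wfs should be filtered too. To get that, you only need to add one line that filters wfs with the same key filter: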

class TheDaccYouWant(Dacc):
    @classmethod
    def with_key_filt(cls, key_filt, wfs, annots, annot_to_tag, chunker):
        filtered_annots = cached_keys(annots, keys_cache=key_filt)
        filtered_wfs = cached_keys(wfs, keys_cache=key_filt)  # here's what was added
        # Dacc.__init__ has no chunker parameter, so it must not be passed on
        return cls(filtered_wfs, filtered_annots, annot_to_tag)
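The same idea, applying one key filter to both stores, can be sketched without py2store. The KeyFilteredStore class below is hypothetical and is not py2store's API: it is a read-only collections.abc.Mapping view that filters keys lazily on each iteration, whereas cached_keys caches the keys. The sample data and the train/test predicate are also assumptions for illustration; the unused chunker argument is dropped.

```python
from collections.abc import Mapping


class KeyFilteredStore(Mapping):
    """Hypothetical read-only view of store restricted to keys where keep(k) is True."""

    def __init__(self, store, keep):
        self._store = store
        self._keep = keep

    def __getitem__(self, k):
        # Keys rejected by the filter behave as if they did not exist.
        if not self._keep(k):
            raise KeyError(k)
        return self._store[k]

    def __iter__(self):
        # Keys are filtered lazily each time the view is iterated.
        return (k for k in self._store if self._keep(k))

    def __len__(self):
        return sum(1 for _ in self)


class Dacc:
    def __init__(self, wfs, annots, annot_to_tag=lambda x: x["tag"]):
        self.wfs = wfs
        self.annots = annots
        self.annot_to_tag = annot_to_tag

    @classmethod
    def with_key_filt(cls, keep, wfs, annots, annot_to_tag=lambda x: x["tag"]):
        # Apply the SAME key filter to both stores so they stay in sync.
        return cls(KeyFilteredStore(wfs, keep), KeyFilteredStore(annots, keep), annot_to_tag)


# Hypothetical sample data: keys stand in for file paths.
wfs = {"train/a.wav": [0.1, 0.2], "train/b.wav": [0.3], "test/c.wav": [0.4]}
annots = {
    "train/a.wav": {"tag": "dog"},
    "train/b.wav": {"tag": "cat"},
    "test/c.wav": {"tag": "dog"},
}

is_train = lambda k: k.startswith("train/")
train = Dacc.with_key_filt(is_train, wfs, annots)
test = Dacc.with_key_filt(lambda k: not is_train(k), wfs, annots)

print(list(train.wfs))  # ['train/a.wav', 'train/b.wav']
print(list(test.wfs))   # ['test/c.wav']
```

A lazy view like this re-scans the underlying store on every iteration; when listing keys is expensive (for example, a directory of audio files), caching the filtered keys, as cached_keys is meant to do, avoids repeating that work.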