How to use wildcards in the wildcard_constraints keyword

Question

For example, I have the following wildcards:

dataset = ['A1', 'A2', 'A3', 'B1', 'B2', 'B3']
group = ['A', 'B']

I am trying to restrict each dataset to its group. For example, I want to create

A1/file.A.txt A2/file.A.txt A3/file.A.txt B1/file.B.txt ...

I wrote the following rule, hoping it would do this:

rule complex_conversion:
    input:
        "{dataset}/inputfile"
    output:
        "{dataset}/file.{group}.txt"
    wildcard_constraints:
        dataset = {group} + '\d+'
        #dataset = {wildcards.group} + '\d+'
    shell:
        "somecommand --group {wildcards.group}  < {input}  > {output}"

Unfortunately, I get the following errors (the second one for the commented-out variant):

TypeError: unhashable type: 'list'
#NameError: name 'wildcards' is not defined

It looks as if {group} is being passed to the wildcard_constraints keyword as a list.
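Both errors come from plain Python, before Snakemake applies any wildcard logic: the value given in wildcard_constraints is an ordinary Python expression evaluated when the Snakefile is parsed. The minimal illustration below runs outside any Snakefile and reuses only the names from the question. In `{group}`, the braces build a set literal whose single element is the list `group`; set elements must be hashable and lists are not. In the commented-out variant, `wildcards` simply does not exist at parse time.

```python
group = ['A', 'B']

# `{group}` here is a Python set literal, not a Snakemake placeholder.
# Building the set needs a hashable element; a list is not hashable,
# so the TypeError is raised before `+` is ever evaluated.
try:
    constraint = {group} + r'\d+'
except TypeError as err:
    type_error = str(err)

# The commented-out variant fails even earlier: no variable named
# `wildcards` exists while the Snakefile is being parsed.
try:
    constraint = {wildcards.group} + r'\d+'  # noqa: F821
except NameError as err:
    name_error = str(err)

print(type_error)  # mentions: unhashable type: 'list'
print(name_error)  # mentions: name 'wildcards' is not defined
```

As far as the documented behaviour goes, a constraint is therefore a fixed regular-expression string computed once, so it cannot be written in terms of the value another wildcard takes in a particular job.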

Is there any way to use a wildcard inside wildcard_constraints, or an alternative way to map each dataset to its group?

Answer 1

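This doesn't answer your question, but maybe it helps... If your output files are combinations of dataset and group, I would build that list first and then pass it to rule all as the list of target files: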

dataset = ['A1', 'A2', 'A3', 'B1', 'B2', 'B3']
group = ['A', 'B']

# Use a for-loop or whatever to create this list:
datagrp = ['A1/file.A.txt', 'A2/file.A.txt', 'A3/file.A.txt',
           'B1/file.B.txt', 'B2/file.B.txt', 'B3/file.B.txt']

wildcard_constraints:
    # Restrict each wildcard to the listed values instead of the default regex '.+'
    dataset = '|'.join(dataset),
    group = '|'.join(group)

rule all:
    input:
        datagrp,

rule complex_conversion:
    input:
        "{dataset}/inputfile"
    output:
        "{dataset}/file.{group}.txt"
    shell:
        "somecommand --group {wildcards.group}  < {input}  > {output}"
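The answer leaves the construction of `datagrp` open ("use a for-loop or whatever"). One possible sketch, assuming each dataset name is its group name followed by digits (the rule the question tried to express as `group + '\d+'`), keeps only the matching (dataset, group) pairs with `re.fullmatch`:

```python
import re

dataset = ['A1', 'A2', 'A3', 'B1', 'B2', 'B3']
group = ['A', 'B']

# Pair every dataset with every group, but keep a pair only when the
# dataset name is exactly the group name followed by one or more digits.
# re.escape guards against group names containing regex metacharacters.
datagrp = [
    f"{ds}/file.{grp}.txt"
    for ds in dataset
    for grp in group
    if re.fullmatch(re.escape(grp) + r'\d+', ds)
]

print(datagrp)
# ['A1/file.A.txt', 'A2/file.A.txt', 'A3/file.A.txt',
#  'B1/file.B.txt', 'B2/file.B.txt', 'B3/file.B.txt']
```

Used in place of the hard-coded list, this gives rule all one target per dataset, and Snakemake then fills in the `{group}` wildcard of complex_conversion from the file name each job has to produce, so no cross-wildcard constraint is needed.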

