Snakemake module from GitHub changes targets?

I hope you can help me resolve this problem, or tell me whether I should file a bug report.

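I am 'importing' a snakemake module from github into another, local snakefile. This seems to mess up the targets of the local snakefile. Once the second snakefile is imported, the target is no longer the one specified by rule 'all' but some arbitrary(?) rule from the imported snakefile, even though the imported snakefile does not contain any rules relevant to the local workflow.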

I put together an example on github consisting of two repositories that reproduce the problem (lpagie/repo1 and lpagie/repo2). From repo1/readme.md:

==============

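This repo illustrates a problem(?) with using snakemake modules from github.

Clone this repo locally and run snakemake from the directory above the cloned repo, using the wrapper run.sh.

This snakefile will 'import' lpagie/repo2, which in its current form contains only commented-out rules and one (supposedly) nonsensical rule that is irrelevant to repo1.
Running snakemake for repo1 does not generate the output specified by rule 'all' (output/final) but instead the output generated by rule 'non-sense' ....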

When the import of the repo2 module is commented out in repo1/snakefile_1.smk, running snakemake produces the expected result.

=============

Am I overlooking something obvious?

I am using snakemake v6.9.1, installed with conda.
This is the output when I make a fresh clone of repo1 and repo2 and run 'repo1/run.sh':

git clone git@github.com:lpagie/repo1.git
git clone git@github.com:lpagie/repo2.git

bash repo1/run.sh 
repo_dir = /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
wf2_nonsense        1              1              1
total               1              1              1

Select jobs to execute...

[Wed Oct  6 17:04:44 2021]
rule wf2_nonsense:
    input: /tmp/tmph_6w4l9asnakemake-runtime-source-cache/bfcfa05f3052febb0b88b59991e4aac562b3465cfdb8f8d288a357884ae7572b
    output: output/nonsense.out
    jobid: 0
    reason: Missing output files: output/nonsense.out
    resources: tmpdir=/tmp

/data/home/ludo/miniconda3/bin/python3.8 /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/scripts/tmpu8huybi8.touch.py
repo2
/data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos
[Wed Oct  6 17:04:45 2021]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/log/2021-10-06T170442.797027.snakemake.log

The same run after commenting out the line that imports the repo2 module:

vi repo1/snakefile_1.smk

bash repo1/run.sh 
repo_dir = /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1
Building DAG of jobs...                                                                                                                 
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.                                                                                        
Job stats:                
job      count    min threads    max threads
-----  -------  -------------  -------------
A            1              1              1
B            1              1              1
all          1              1              1
total        3              1              1
                                  
Select jobs to execute...
                                                                    
[Wed Oct  6 17:08:18 2021]
rule B:
    output: output/fB     
    jobid: 2   
    reason: Missing output files: output/fB
    resources: tmpdir=/tmp                                                                                                                                                
bash /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1/scripts/touch.sh output/fB
[Wed Oct  6 17:08:18 2021]
Finished job 2.
1 of 3 steps (33%) done
Select jobs to execute...

[Wed Oct  6 17:08:18 2021]
rule A:
    input: output/fB
    output: output/final
    jobid: 1
    reason: Missing output files: output/final; Input files updated by another job: output/fB
    resources: tmpdir=/tmp

bash /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1/scripts/touch.sh output/final
[Wed Oct  6 17:08:18 2021]
Finished job 1.
2 of 3 steps (67%) done
Select jobs to execute...

[Wed Oct  6 17:08:18 2021]
localrule all:
    input: output/final
    jobid: 0
    reason: Input files updated by another job: output/final
    resources: tmpdir=/tmp

[Wed Oct  6 17:08:18 2021]
Finished job 0.
3 of 3 steps (100%) done
Complete log: /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/log/2021-10-06T170818.178572.snakemake.log

I also created lpagie/repo3, which is a copy of repo1 with the line that imports the repo2 module commented out.

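Your code that imports the rules from the remote module comes before rule all. When no target is given on the command line, Snakemake uses the first rule in the Snakefile as the default target. The first imported rule therefore determines the final output of the pipeline.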
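The ordering effect can be mimicked with a small toy model. This is only a sketch of the "first rule wins" behaviour, not Snakemake's actual implementation. The function `default_target` and the two rule lists are hypothetical, and the relative order of rules A and B in repo1 is an assumption.

```python
# Toy model of default-target selection; NOT Snakemake's real code.
def default_target(rule_names):
    """Return the first rule in definition order, or None if there are none."""
    return rule_names[0] if rule_names else None


# Assumed rule order when `use rule * from other_workflow` precedes `rule all`.
import_first = ["wf2_nonsense", "all", "A", "B"]

# Assumed rule order when `rule all` is defined before the `use rule` statement.
all_first = ["all", "wf2_nonsense", "A", "B"]

print(default_target(import_first))  # wf2_nonsense
print(default_target(all_first))     # all
```

In the first ordering the imported rule is picked as the target, which matches the job stats in the first log of the question. In the second, rule all is picked.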

So just move the `use rule` statement after rule all. Instead of this:

module other_workflow:
  snakefile: github("lpagie/repo2", path="snakefile_2.smk", commit="61f60f7")
  config: config

use rule * from other_workflow as wf2_*

rule all:
  input:
    "output/final"

try:

module other_workflow:
  snakefile: github("lpagie/repo2", path="snakefile_2.smk", commit="61f60f7")
  config: config

rule all:
  input:
    "output/final"

use rule * from other_workflow as wf2_*

(By the way, the github function seems to have been added only recently; this will work for snakemake >=6.9.)