snakemake module from github changes targets?

Question

Hoping you can either help me fix this or point me to where I should file a report.

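I'm 'importing' a snakemake module from github into another snakefile, which is local. This appears to mess up the targets of the local snakefile. When the second snakefile is imported, the target is no longer the one specified by rule 'all' but some arbitrary(?) rule from the imported snakefile, even though the imported snakefile doesn't contain any related rules.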

I put together an example on github consisting of two repositories that reproduce the problem (lpagie/repo1 and lpagie/repo2). From repo1/readme.md:

==============

This repo is meant to illustrate an issue(?) with using snakemake modules from github

Clone this repo locally and run snakemake from the directory above the cloned repo, using the wrapper run.sh

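This snakefile will 'import' lpagie/repo2, which in its current form contains only commented-out rules and one rule that (supposedly) has no meaning for repo1.
Running snakemake for repo1 does not generate the output specified by rule 'all' (output/final) but instead the output generated by rule 'non-sense' ....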

When the import of the repo2 module is commented out in repo1/snakefile_1.smk, the run makes snakemake produce the expected result.

=============

Am I overlooking something obvious?

I'm using snakemake 6.9.1, installed via conda.
This is the output of a fresh clone of the repos followed by running 'repo1/run.sh':

git clone git@github.com:lpagie/repo1.git
git clone git@github.com:lpagie/repo2.git

bash repo1/run.sh 
repo_dir = /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
wf2_nonsense        1              1              1
total               1              1              1

Select jobs to execute...

[Wed Oct  6 17:04:44 2021]
rule wf2_nonsense:
    input: /tmp/tmph_6w4l9asnakemake-runtime-source-cache/bfcfa05f3052febb0b88b59991e4aac562b3465cfdb8f8d288a357884ae7572b
    output: output/nonsense.out
    jobid: 0
    reason: Missing output files: output/nonsense.out
    resources: tmpdir=/tmp

/data/home/ludo/miniconda3/bin/python3.8 /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/scripts/tmpu8huybi8.touch.py
repo2
/data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos
[Wed Oct  6 17:04:45 2021]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/log/2021-10-06T170442.797027.snakemake.log

And the same after commenting out the lines that import the repo2 module:

vi repo1/snakefile_1.smk

bash repo1/run.sh 
repo_dir = /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1
Building DAG of jobs...                                                                                                                 
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.                                                                                        
Job stats:                
job      count    min threads    max threads
-----  -------  -------------  -------------
A            1              1              1
B            1              1              1
all          1              1              1
total        3              1              1
                                  
Select jobs to execute...
                                                                    
[Wed Oct  6 17:08:18 2021]
rule B:
    output: output/fB     
    jobid: 2   
    reason: Missing output files: output/fB
    resources: tmpdir=/tmp                                                                                                                                                
bash /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1/scripts/touch.sh output/fB
[Wed Oct  6 17:08:18 2021]
Finished job 2.
1 of 3 steps (33%) done
Select jobs to execute...

[Wed Oct  6 17:08:18 2021]
rule A:
    input: output/fB
    output: output/final
    jobid: 1
    reason: Missing output files: output/final; Input files updated by another job: output/fB
    resources: tmpdir=/tmp

bash /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1/scripts/touch.sh output/final
[Wed Oct  6 17:08:18 2021]
Finished job 1.
2 of 3 steps (67%) done
Select jobs to execute...

[Wed Oct  6 17:08:18 2021]
localrule all:
    input: output/final
    jobid: 0
    reason: Input files updated by another job: output/final
    resources: tmpdir=/tmp

[Wed Oct  6 17:08:18 2021]
Finished job 0.
3 of 3 steps (100%) done
Complete log: /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/log/2021-10-06T170818.178572.snakemake.log

I created lpagie/repo3, which is a copy of repo1 but with the lines that import the repo2 module commented out.

Answer 1

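The code that imports the rules from the remote module comes before rule all. When no target is given on the command line, snakemake uses the first rule it encounters as the default target, so the first imported rule, not rule all, determines the final output of the pipeline.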

So just put the import (the use rule statement) after rule all. Instead of this:

module other_workflow:
  snakefile: github("lpagie/repo2", path="snakefile_2.smk", commit="61f60f7")
  config: config

use rule * from other_workflow as wf2_*

rule all:
  input:
    "output/final"

try:

module other_workflow:
  snakefile: github("lpagie/repo2", path="snakefile_2.smk", commit="61f60f7")
  config: config

rule all:
  input:
    "output/final"

use rule * from other_workflow as wf2_*

(By the way, the github function appears to be a recent addition, so this should work with snakemake >=6.9)
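
To make the ordering effect concrete, here is a small toy model in plain Python. It is not snakemake's actual code; the class ToyWorkflow and its methods are hypothetical names made up for illustration, and the order in which rules A and B are defined in repo1 is an assumption. It only mimics the one behaviour that matters here: when no target is requested, the first registered rule becomes the default target.

```python
# Toy model only: NOT snakemake's real implementation. It mimics a single
# behaviour: with no explicit target, the first rule defined in the workflow
# is used as the default target.


class ToyWorkflow:
    def __init__(self):
        self.rules = []  # rule names, in definition order

    def rule(self, name):
        """Register a locally defined rule."""
        self.rules.append(name)

    def use_rules(self, module_rules, prefix):
        """Mimic `use rule * from <module> as <prefix>*`."""
        for name in module_rules:
            self.rules.append(prefix + name)

    def default_target(self):
        """The first registered rule wins; None if nothing is defined."""
        return self.rules[0] if self.rules else None


# The single active rule in repo2, as described in the question.
REPO2_RULES = ["nonsense"]

# Layout from the question: import first, then `rule all`.
before = ToyWorkflow()
before.use_rules(REPO2_RULES, prefix="wf2_")
for name in ("all", "A", "B"):  # order of A and B is assumed
    before.rule(name)

# Suggested fix: `rule all` first, then the import.
after = ToyWorkflow()
after.rule("all")
after.use_rules(REPO2_RULES, prefix="wf2_")
for name in ("A", "B"):
    after.rule(name)

print(before.default_target())  # wf2_nonsense
print(after.default_target())   # all
```

The first result matches the wf2_nonsense job in the log from the question. Independently of rule order, naming the target explicitly on the command line (for example snakemake --cores 1 all) should also make snakemake build rule all.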
