snakemake module from github changes targets?
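Hopefully you can help me resolve this, or tell me to file a bug report.
I am 'importing' a snakemake module from github into another snakefile, which is local. This seems to mess up the targets of the local snakefile. Once the second snakefile is imported, the target is no longer the one specified by rule 'all' but some arbitrary(?) rule from the imported snakefile, even though the imported snakefile does not contain any relevant rules.
I put together an example on github consisting of two repositories that together reproduce the issue (lpagie/repo1 and lpagie/repo2). From repo1/readme.md: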
==============
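This repo is to illustrate an issue(?) with using snakemake modules
from github.
Clone this repo locally and run snakemake from the directory above the
cloned repo, using the wrapper run.sh.
This snakefile will 'import' lpagie/repo2, which in its current form only contains
commented-out rules and one rule that is (supposedly) meaningless to repo1.
Running snakemake for repo1 does not generate the output specified by rule
'all' (output/final) but instead the output generated by rule 'non-sense' ....
When the import of the repo2 module is commented out in repo1/snakefile_1.smk,
running snakemake yields the expected results.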
=============
Am I overlooking something obvious?
I am using snakemake v6.9.1, installed in conda.
This is the output of a fresh clone of repo1 and repo2, followed by running 'repo1/run.sh':
git clone git@github.com:lpagie/repo1.git
git clone git@github.com:lpagie/repo2.git
bash repo1/run.sh
repo_dir = /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
wf2_nonsense        1              1              1
total               1              1              1
Select jobs to execute...
[Wed Oct 6 17:04:44 2021]
rule wf2_nonsense:
input: /tmp/tmph_6w4l9asnakemake-runtime-source-cache/bfcfa05f3052febb0b88b59991e4aac562b3465cfdb8f8d288a357884ae7572b
output: output/nonsense.out
jobid: 0
reason: Missing output files: output/nonsense.out
resources: tmpdir=/tmp
/data/home/ludo/miniconda3/bin/python3.8 /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/scripts/tmpu8huybi8.touch.py
repo2
/data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos
[Wed Oct 6 17:04:45 2021]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/log/2021-10-06T170442.797027.snakemake.log
The same after commenting out the line that imports the repo2 module:
vi repo1/snakefile_1.smk
bash repo1/run.sh
repo_dir = /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job      count    min threads    max threads
-----  -------  -------------  -------------
A            1              1              1
B            1              1              1
all          1              1              1
total        3              1              1
Select jobs to execute...
[Wed Oct 6 17:08:18 2021]
rule B:
output: output/fB
jobid: 2
reason: Missing output files: output/fB
resources: tmpdir=/tmp
bash /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1/scripts/touch.sh output/fB
[Wed Oct 6 17:08:18 2021]
Finished job 2.
1 of 3 steps (33%) done
Select jobs to execute...
[Wed Oct 6 17:08:18 2021]
rule A:
input: output/fB
output: output/final
jobid: 1
reason: Missing output files: output/final; Input files updated by another job: output/fB
resources: tmpdir=/tmp
bash /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1/scripts/touch.sh output/final
[Wed Oct 6 17:08:18 2021]
Finished job 1.
2 of 3 steps (67%) done
Select jobs to execute...
[Wed Oct 6 17:08:18 2021]
localrule all:
input: output/final
jobid: 0
reason: Input files updated by another job: output/final
resources: tmpdir=/tmp
[Wed Oct 6 17:08:18 2021]
Finished job 0.
3 of 3 steps (100%) done
Complete log: /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/log/2021-10-06T170818.178572.snakemake.log
I also created lpagie/repo3, which is a copy of repo1 but with the line importing the repo2 module commented out.
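Your code that imports the rules from the remote module sits before rule all. When no target is given on the command line, Snakemake takes the first rule in the workflow as the default target, so the first imported rule determines the final output of the pipeline.
So just put the import after rule all. Instead of this: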
module other_workflow:
    snakefile: github("lpagie/repo2", path="snakefile_2.smk", commit="61f60f7")
    config: config

use rule * from other_workflow as wf2_*

rule all:
    input:
        "output/final"
Try:
module other_workflow:
    snakefile: github("lpagie/repo2", path="snakefile_2.smk", commit="61f60f7")
    config: config

rule all:
    input:
        "output/final"

use rule * from other_workflow as wf2_*
(By the way, the github function seems to have been added only recently, so this will work with snakemake >=6.9.)
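To make the ordering effect concrete, here is a small Python toy model of default-target selection. It is only a sketch, not Snakemake's actual implementation: the function default_target and the two rule lists are hypothetical names made up for illustration, and the only behaviour assumed is that the first registered rule becomes the target unless one is requested explicitly.

```python
# Toy model of default-target selection; NOT Snakemake's real code.
from typing import List, Optional


def default_target(rule_names: List[str], requested: Optional[str] = None) -> str:
    """Return the name of the rule that would be built.

    Assumption: without an explicit target, the first registered rule wins.
    """
    if requested is not None:
        return requested
    if not rule_names:
        raise ValueError("no rules defined")
    return rule_names[0]


# Registration order in the question: `use rule * ... as wf2_*` precedes `rule all`
# (other rules omitted for brevity).
import_first = ["wf2_nonsense", "all"]
# Registration order after moving the `use rule` statement below `rule all`.
all_first = ["all", "wf2_nonsense"]

print(default_target(import_first))                   # wf2_nonsense
print(default_target(all_first))                      # all
print(default_target(import_first, requested="all"))  # all
```

The last call models naming the target explicitly on the command line, which sidesteps the ordering question altogether.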