Bash: Download a .zip from a dynamic HTML page

Question

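I wrote an ugly one-liner, but I would like to make it simpler and easier for others to read. It is used in a Dockerfile, as a script to build an image that runs NxFilter in Docker.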

curl -s -L http://www.nxfilter.org/|grep Download|sed -e 's/<a /\n<a /g'|\
sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'|\
xargs -n1 curl -s -L|grep zip|sed -e 's/<a /\n<a /g'|\
sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'|\
grep -v dropbox|grep -v logon|grep -v cloud|grep zip

Or, without the manual line breaks:

curl -s -L http://www.nxfilter.org/|grep Download|sed -e 's/<a /\n<a /g'|sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'|xargs -n1 curl -s -L|grep zip|sed -e 's/<a /\n<a /g'|sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'|grep -v dropbox|grep -v logon|grep -v cloud|grep zip

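Step 1: Visit nxfilter.org and follow the redirects to www.nxfilter.org/p2/index.html
Step 2: Parse the home page HTML for the URL of the download page, www.nxfilter.org/p2/?page_id=93 (this is a blog-type site, and the page may change in the future)
Step 3: Parse the download page HTML for the URL of the current nxfilter*.zip, currently http://nxfilter.org/download/nxfilter-3.0.5.zip
Step 4: Download it as nxfilter.zip
Step 5: The Dockerfile continues with commands that set up the environment in which NxFilter will run in the final Docker container.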
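The sed expression that appears twice in the pipeline (pull the href value out of each anchor tag) could be factored into a small helper so the steps read more clearly. This is only a minimal sketch: the function name extract_hrefs is hypothetical, it assumes a grep that supports -o (GNU and BSD grep both do), and it assumes href values are quoted with " or '.

```shell
#!/usr/bin/env bash

# extract_hrefs (hypothetical helper): read HTML on stdin and print the
# value of every quoted href attribute, one per line.
extract_hrefs() {
  # grep -o prints each match on its own line, so several links on the
  # same input line are handled without inserting newlines first.
  grep -oE "href=[\"'][^\"']*[\"']" \
    | sed -E "s/^href=[\"']//; s/[\"']\$//"   # strip href= and the quotes
}
```

With that in place, step 2 could be written as `curl -s -L http://www.nxfilter.org/ | grep Download | extract_hrefs`, and step 3 could reuse the same function on the download page.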

Surely there is a simpler way to get the URL of the .zip?

Easiest way to extract the urls from an html page using sed or awk only

RegEx match open tags except XHTML self-contained tags

http://www.unix.com/unix-for-dummies-questions-and-answers/142627-cut-field-line-having-quotes-delimiter.html

wget or curl from stdin

Answer 1

It looks like the answer is to parse the download.php page for the URL:

curl -sL nxfilter.org/download.php | grep nxfilter |\
tail -n1|sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'|tr -d '[:blank:]'

It is still ugly, but much shorter than my original command string.
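A somewhat more readable variant of the same idea might wrap the parsing in a function and match only links that end in .zip. This is a sketch under assumptions rather than a tested drop-in: the function name pick_zip_url is hypothetical, and it assumes the page lists the zip as a quoted, absolute href and that the last matching link is the one wanted (mirroring the tail -n1 above).

```shell
#!/usr/bin/env bash

# pick_zip_url (hypothetical helper): read HTML on stdin and print the last
# href whose value contains "nxfilter" and ends in ".zip".
pick_zip_url() {
  grep -oE "href=[\"'][^\"']*nxfilter[^\"']*\.zip[\"']" \
    | sed -E "s/^href=[\"']//; s/[\"']\$//" \
    | tail -n 1
}

# Usage (needs network access, so it is left commented out here):
# url=$(curl -fsSL http://nxfilter.org/download.php | pick_zip_url)
# curl -fsSL -o nxfilter.zip "$url"
```

Using curl -f in the commented usage makes the command fail on an HTTP error, which in a Dockerfile RUN step would stop the build instead of saving an error page as nxfilter.zip.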

bash

dockerfile