xargs wget: extract the filename from a URL with parameters
I want to download files in parallel, but wget saves them under the wrong filenames.
url.txt
http://example.com/file1.zip?arg=tereef&arg2=okook
http://example.com/file2.zip?arg=tereef&arg2=okook
Command
xargs -P 4 -n 1 wget <url.txt
Output filenames
file1.zip?arg=tereef&arg2=okook
file2.zip?arg=tereef&arg2=okook
Expected output
file1.zip
file2.zip
I'm new to bash. Please suggest how I can get the correct filenames, and please don't suggest a for loop or &, because those block.
Thanks
Process your input to generate the desired command, then run it through xargs.
perl -ne
- Loop over the lines of the input file and run the inline program
-e : Execute perl one-liner
-n : Loop over all input lines, assigning each to $_ in turn.
xargs -P 4 -n 1 -i -t wget "{}"
-P 4 : Max of 4 Processes at a time
-n 1 : Consume one input line at a time
-i : Use the replace string "{}"
-t : Print the command before executing it
perl -ne '
chomp(my ($url) = $_); # Remove trailing newline
my ($name) = $url =~ m|example.com/(.+)\?|; # Grab the filename
print "$url -O $name\n"; # Print all of the wget params
' url.txt | xargs -P 4 -n 1 -i -t wget "{}"
Output (note the `%20-O%20...` appended to each requested URL)
wget http://example.com/file1.zip?arg=tereef&arg2=okook -O file1.zip
wget http://example.com/file2.zip?arg=tereef&arg2=okook -O file2.zip
--2016-07-21 22:24:44-- http://example.com/file2.zip?arg=tereef&arg2=okook%20-O%20file2.zip
--2016-07-21 22:24:44-- http://example.com/file1.zip?arg=tereef&arg2=okook%20-O%20file1.zip
Resolving example.com (example.com)... Resolving example.com (example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
93.184.216.34, Connecting to example.com (example.com)|93.184.216.34|:80... 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com (example.com)|93.184.216.34|:80... connected.
connected.
HTTP request sent, awaiting response... HTTP request sent, awaiting response... 404 Not Found
2016-07-21 22:24:44 ERROR 404: Not Found.
404 Not Found
2016-07-21 22:24:44 ERROR 404: Not Found.
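The `%20-O%20file1.zip` in the log above suggests that, with the replace string, the whole generated line reached wget as one argument instead of three. Here is a minimal sketch to check that reading without touching the network. It counts the arguments a child command receives, and the variable names `line`, `with_replace` and `with_split` are only illustrative.

```shell
#!/bin/sh
# One generated line, in the same shape the perl one-liner prints.
line='http://example.com/file1.zip?arg=tereef&arg2=okook -O file1.zip'

# With a replace string, the whole line is substituted as ONE argument.
with_replace=$(printf '%s\n' "$line" | xargs -I {} sh -c 'echo "$#"' _ {})

# With -n 3, xargs splits the line on whitespace and passes THREE arguments.
with_split=$(printf '%s\n' "$line" | xargs -n 3 sh -c 'echo "$#"' _)

echo "replace string: $with_replace argument(s)"
echo "-n 3:           $with_split argument(s)"
```

If that reading is right, ending the pipeline with `xargs -P 4 -n 3 -t wget` instead should hand wget the URL, `-O` and the filename as separate arguments. This assumes the URLs and filenames contain no whitespace, quotes or backslashes, which xargs would otherwise interpret.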
You can use a bash function, which must be exported so that it is visible outside the current shell
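function mywget()
{
    local name=${1%%\?*}   # strip the query string ("?" and everything after it)
    name=${name##*/}       # keep only the last path component
    wget -O "$name" "$1"
}
export -f mywget
xargs -P 4 -n 1 -I {} bash -c "mywget '{}'" < url.txt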
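The function relies on shell parameter expansion. As a small sketch of how the pieces behave: `%%\?*` on its own removes the query string but leaves the scheme, host and path in place, so a second expansion (`##*/`) is assumed here to get the bare filename. The variable names are illustrative.

```shell
#!/bin/sh
url='http://example.com/file1.zip?arg=tereef&arg2=okook'

no_query=${url%%\?*}   # remove the longest suffix matching "?*"
name=${no_query##*/}   # remove the longest prefix matching "*/"

echo "$no_query"   # http://example.com/file1.zip
echo "$name"       # file1.zip
```

Both expansions are handled by the shell itself, so no extra process is started per URL.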
With GNU Parallel it looks like this:
parallel -P 4 wget -O '{= s/\?.*//;s:.*/:: =}' {} <url.txt