Is there a way I can download specific parts of a file from a link?

Question

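I have a file of about 8GB that I am trying to download, located at: www.cs.jhu.edu/~anni/ALNC/030314corpus.splittoklc.tgz

However, the server closes my connection every few seconds, allowing me to download only 50-90MB of the file at my connection speeds. I've swapped IP addresses too, but get the same behavior. Does this also happen for everyone else?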

This is the output I get from wget:

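I'm wondering if I can reset my connection like wget did automatically the first few times? Right now it just freezes after a while.

Alternatively, is there a way I can collect different parts of the file with wget or with python's requests package or some other language?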

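Update:

I tried this on my phone and it seemed to work, slowly. Any idea why this is happening and how to get around it?

Update:

The phone connection eventually resets as well, and since the file is so large, I can't get anywhere near finishing.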

Answer 1

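Preliminary

For any of this to work, the server needs to support range requests, to which it responds with 206 Partial Content. From your terminal output, the server in question appears to support them.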
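One way to check this yourself is to request a single byte and see whether the server answers with 206 and a Content-Range header. The following is only a minimal sketch using the standard library; the function names `looks_like_partial_content` and `probe_range_support` are made up for illustration and are not part of any library.

```python
import urllib.request


def looks_like_partial_content(status, headers):
    """Return True if a response to a Range request was honoured."""
    # 206 plus a Content-Range header means the server sent only the slice.
    return status == 206 and "Content-Range" in headers


def probe_range_support(url, timeout=10):
    """Ask for the first byte only and report whether the server complied."""
    req = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # A server that ignores the Range header answers 200 with the full body;
        # leaving the with-block closes the connection without reading it.
        return looks_like_partial_content(resp.status, resp.headers)
```

Calling `probe_range_support("http://www.cs.jhu.edu/~anni/ALNC/030314corpus.splittoklc.tgz")` should return True if the server still behaves as the terminal output suggests.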

Your questions

However, the server closes my connection every few seconds allowing me to only download 50-90MB of the file at my connection speeds. I've swapped ip addresses too, but get the same behavior. Does this also happen for everyone else?

No, the download works without major problems for me. I tested with

curl www.cs.jhu.edu/~anni/ALNC/030314corpus.splittoklc.tgz > /dev/null

I'm wondering if I can reset my connection like wget did automatically the first few times?

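wget already appears to retry the download automatically. From the terminal output you included, it looks like wget will eventually "get there". You can use wget --continue [URL] to have wget resume an unfinished download.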

Alternatively, is there a way I can collect different parts of the file with wget or with python's requests package or some other language?

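As of wget 1.16, you can use wget --start-pos 500 [URL] to start the download from a given position.

You can also use curl -r 500-1000 [URL] to download the bytes in a given range.

For Python's requests module, according to this SO answer: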
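Note that HTTP byte ranges include both endpoints, so 500-1000 covers 501 bytes. If you want to fetch the file piece by piece, the ranges can be computed up front. This is a rough sketch that assumes you already know the total size; `byte_ranges` and `range_header` are hypothetical helper names, not part of wget, curl, or any library.

```python
def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    for start in range(0, total_size, chunk_size):
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size, total_size) - 1
        yield start, end


def range_header(start, end):
    """Format one inclusive byte range as an HTTP Range header."""
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    # Tiny illustrative sizes; a real download would use chunks of many megabytes.
    for start, end in byte_ranges(10, 4):
        print(range_header(start, end))
```

Each header can be passed to whichever client you prefer, and the pieces concatenated in order afterwards.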

import requests

# Ask for the first 101 bytes only; byte ranges are inclusive at both ends.
headers = {"Range": "bytes=0-100"}
r = requests.get("https://example.com/link", headers=headers)
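Building on the Range header, a download that keeps getting cut off can be resumed from however many bytes are already on disk, roughly what wget --continue does. The sketch below is one possible approach, not a tested tool: `resume_headers`, `total_size`, and `download_with_resume` are names invented here, and the retry count, pause, and chunk size are arbitrary assumptions. It also assumes the server does not apply a Content-Encoding to the response.

```python
import os
import time


def resume_headers(path):
    """Build a Range header that continues after the bytes already on disk."""
    have = os.path.getsize(path) if os.path.exists(path) else 0
    # An open-ended range ("bytes=N-") asks for everything from offset N onward.
    return {"Range": f"bytes={have}-"} if have else {}


def total_size(response):
    """Full file size from a 206's Content-Range or a 200's Content-Length, else None."""
    if response.status_code == 206:
        total = response.headers.get("Content-Range", "").rpartition("/")[2]
    else:
        total = response.headers.get("Content-Length", "")
    return int(total) if total.isdigit() else None


def download_with_resume(url, path, max_attempts=50, pause=2.0):
    """Keep appending to path until it is complete or attempts run out."""
    import requests  # third-party; imported here so the helpers above work without it

    for _ in range(max_attempts):
        try:
            with requests.get(url, headers=resume_headers(path),
                              stream=True, timeout=30) as r:
                if r.status_code == 416:
                    return True  # offset is at or past the end: nothing left to fetch
                r.raise_for_status()
                expected = total_size(r)
                # A 200 means the Range header was ignored, so start over.
                mode = "ab" if r.status_code == 206 else "wb"
                with open(path, mode) as f:
                    for chunk in r.iter_content(chunk_size=1 << 20):
                        f.write(chunk)
            if expected is None or os.path.getsize(path) >= expected:
                return True
        except requests.exceptions.RequestException:
            time.sleep(pause)  # connection dropped: wait briefly, then resume
    return False
```

A call such as `download_with_resume(url, "corpus.tgz")` returns True once the file on disk is as large as the size the server reported, and False if the attempts run out first.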


network-programming

wget

python-requests
