Read info from hyperlink located remotely (know that by hyperlink it will be csv file, but can't find general approach)

I'm using Python 3.4. I've tried to find a solution online, but so far without success.

I have a link to a csv file (a dataset).

Is there a way to get the information from this link without copying the file to a local directory? (For example, I don't have enough space on my disk.)

I want to keep working with the data in RAM. (For example, I plan to find out how many rows of data there are, and I'll have to do some data mining and filtering, but that's not important right now.)

I tried the following:

import requests

r = requests.get('http://127.0.0.1/some_path/small.csv')
# r.text is the decoded response body (r.content is bytes in Python 3);
# subtract 1 to discount the trailing newline
print(len(r.text.split('\n')) - 1)

Result: 10

for the following small.csv file:

1lpcfgokakmgnkcojhhkbfbldkacnbeo,6B5108
pjkljhe2ncpnkpknbcohdijeoejaedia,678425
apdfllc5aahabafndbhieahigkjlhalf,651374
aohghmighlieiainnegkcijnfilokake,591116
coobgpohoikkiipiblmjeljniedjpjpf,587200
dmgjnkhnkblpmfjpdakehnaikgdjllic,540979
felcaaldnbdncclmgdcncolpebgiejap,480535
aapocclcgogkmnckokdopfmhonfmgoek,480441
pdehmppfilefbolgganhfihpbmjlgebh,273609
nafaimnnclfjfedmmabolbppcngeolgf,105979

Edit (following MHawke's suggestion):

import requests

line_cnt = 0
r = requests.get('http://127.0.0.1/some_path/small.csv', stream=True)
for i in r.iter_lines():
    if i.strip():
        line_cnt += 1
print(line_cnt)

This version does not count blank lines, and it should be more efficient for large files because it uses iter_lines:

iter_lines(chunk_size=512, decode_unicode=None, delimiter=None)

Iterates over the response data, one line at a time. When stream=True is set on the request, this avoids reading the content at once into memory for large responses.

(Note: this method is not reentrant safe.)
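Beyond counting lines, the streamed lines can be fed straight into the standard `csv` module, so each row is parsed without ever materializing the whole file in memory. A minimal sketch (the `StringIO` sample here stands in for `r.iter_lines(decode_unicode=True)` on a live response; the helper name is mine, not from any library):

```python
import csv
import io


def count_csv_rows(lines):
    """Count non-empty CSV rows from any iterable of text lines."""
    return sum(1 for row in csv.reader(lines) if row)


# Two sample rows copied from small.csv above; with a real download you
# would pass r.iter_lines(decode_unicode=True) instead of this StringIO.
sample = io.StringIO(
    "1lpcfgokakmgnkcojhhkbfbldkacnbeo,6B5108\n"
    "pjkljhe2ncpnkpknbcohdijeoejaedia,678425\n"
)
print(count_csv_rows(sample))  # prints 2
```

Because `csv.reader` accepts any iterable of strings, the same function works on a local file object, a `StringIO`, or a streamed response, which keeps the row-counting and later filtering logic independent of where the data comes from.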