Read info from hyperlink located remotely (know that by hyperlink it will be csv file, but can't find general approach)

I'm using Python 3.4. I've tried to find a solution online, but so far without success.

I have a link to a csv file (a dataset).

Is there a way to get the information from this link without copying the file to a local directory? (For example, I don't have enough space on my disk.)

I want to keep working with the data in RAM. (For example, I plan to find out how many rows of data there are, and I'll have to do some data mining and filtering, but that's not important right now.)

I tried the following:

import requests

r = requests.get('http://127.0.0.1/some_path/small.csv')
# r.text is the decoded response body (r.content is bytes in Python 3);
# subtract 1 to discount the trailing newline
print(len(r.text.split('\n')) - 1)

Result: 10

for the following small.csv file:

1lpcfgokakmgnkcojhhkbfbldkacnbeo,6B5108
pjkljhe2ncpnkpknbcohdijeoejaedia,678425
apdfllc5aahabafndbhieahigkjlhalf,651374
aohghmighlieiainnegkcijnfilokake,591116
coobgpohoikkiipiblmjeljniedjpjpf,587200
dmgjnkhnkblpmfjpdakehnaikgdjllic,540979
felcaaldnbdncclmgdcncolpebgiejap,480535
aapocclcgogkmnckokdopfmhonfmgoek,480441
pdehmppfilefbolgganhfihpbmjlgebh,273609
nafaimnnclfjfedmmabolbppcngeolgf,105979

Edit (following MHawke's suggestion):

import requests

line_cnt = 0
r = requests.get('http://127.0.0.1/some_path/small.csv', stream=True)
for i in r.iter_lines():
    if i.strip():
        line_cnt += 1
print(line_cnt)

This version does not count blank lines, and it should be more efficient for large files because it uses iter_lines:

iter_lines(chunk_size=512, decode_unicode=None, delimiter=None)

Iterates over the response data, one line at a time. When stream=True is set on the request, this avoids reading the content at once into memory for large responses.

(Note: this method is not reentrant safe.)
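Beyond counting lines, the streamed lines can be fed straight into the standard `csv` module, so each row is parsed without ever materializing the whole file in memory. A minimal sketch (the `StringIO` sample here stands in for `r.iter_lines(decode_unicode=True)` on a live response; the helper name is mine, not from any library):

```python
import csv
import io


def count_csv_rows(lines):
    """Count non-empty CSV rows from any iterable of text lines."""
    return sum(1 for row in csv.reader(lines) if row)


# Two sample rows copied from small.csv above; with a real download you
# would pass r.iter_lines(decode_unicode=True) instead of this StringIO.
sample = io.StringIO(
    "1lpcfgokakmgnkcojhhkbfbldkacnbeo,6B5108\n"
    "pjkljhe2ncpnkpknbcohdijeoejaedia,678425\n"
)
print(count_csv_rows(sample))  # prints 2
```

Because `csv.reader` accepts any iterable of strings, the same function works on a local file object, a `StringIO`, or a streamed response, which keeps the row-counting and later filtering logic independent of where the data comes from.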