从 JSON 获取关键字列表
Getting list of keywords from JSON
我遇到了一个问题,我不明白为什么会这样打印出来。
下面是我的代码,由于我是编程新手,所以格式不好请见谅,这是打开一个有一堆关键字的文本文件
import urllib2
import json
f1 = open('CatList.text')
lines = f1.readlines()
for line in lines:
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line+'&cmlimit=100'
print(url)
json_obj = urllib2.urlopen(url)
data = json.load(json_obj)
#to write the result
f2 = open('SubList.text', 'w')
f2.write(url)
for item in data['query']:
for i in data['query']['categorymembers']:
f2.write((i['title']).encode('utf8')+"\n")
我收到错误:
Traceback (most recent call last):
File "Test2.py", line 16, in <module>
json_obj = urllib2.urlopen(url)
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 402, in open
req = meth(req)
File "/usr/lib/python2.7/urllib2.py", line 1113, in do_request_
raise URLError('no host given')
urllib2.URLError: <urlopen error no host given>
我不确定这个错误是什么意思,但我试过这个来打印 url。
import urllib2
import json
f1 = open('CatList.text')
f2 = open('SubList.text', 'w')
lines = f1.readlines()
for line in lines:
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line+'&cmlimit=100'
print(url)
f2.write(url+'\n')
我得到的结果很奇怪(以下是部分结果):
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Branches of geography
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography by place
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography awards and competitions
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography conferences
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography education
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Environmental studies
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Exploration
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geocodes
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geographers
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geographical zones
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geopolitical corridors
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:History of geography
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Land systems
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Landscape
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography-related lists
&cmlimit=100
请注意 URL 分为两部分
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography-related lists
&cmlimit=100
而不是
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography-related lists&cmlimit=100
我的第一个问题是如何解决这个问题?
其次,这是给我的错误吗?
我的CatList.text如下:
Category:Branches of geography
Category:Geography by place
Category:Geography awards and competitions
Category:Geography conferences
Category:Geography education
Category:Environmental studies
Category:Exploration
Category:Geocodes
Category:Geographers
Category:Geographical zones
Category:Geopolitical corridors
Category:History of geography
Category:Land systems
Category:Landscape
Category:Geography-related lists
Category:Lists of countries by geography
Category:Navigation
Category:Geography organizations
Category:Places
Category:Geographical regions
Category:Surveying
Category:Geographical technology
Category:Geography terminology
Category:Works about geography
Category:Geographic images
Category:Geography stubs
抱歉这么久 post。非常感谢你的帮助。谢谢。
朋友,换行一般用'\n'。同样的道理,在一个文件中,每一行之间都有隐藏的 '\n' 字符。
所以在 lines = f1.readlines() 它在所有行的末尾包含'\n'。就是这个问题。
为避免这种情况,您应该阅读 f1.read.splitlines() 。
更新以下行
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line+'&cmlimit=100'
到
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line.strip()+'&cmlimit=100'
您的 line
包含换行符 (\n
),将使用 .strip()
删除字符串两端的空格。
我遇到了一个问题,我不明白为什么会这样打印出来。
下面是我的代码,由于我是编程新手,所以格式不好请见谅,这是打开一个有一堆关键字的文本文件
import urllib2
import json
f1 = open('CatList.text')
lines = f1.readlines()
for line in lines:
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line+'&cmlimit=100'
print(url)
json_obj = urllib2.urlopen(url)
data = json.load(json_obj)
#to write the result
f2 = open('SubList.text', 'w')
f2.write(url)
for item in data['query']:
for i in data['query']['categorymembers']:
f2.write((i['title']).encode('utf8')+"\n")
我收到错误:
Traceback (most recent call last):
File "Test2.py", line 16, in <module>
json_obj = urllib2.urlopen(url)
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 402, in open
req = meth(req)
File "/usr/lib/python2.7/urllib2.py", line 1113, in do_request_
raise URLError('no host given')
urllib2.URLError: <urlopen error no host given>
我不确定这个错误是什么意思,但我试过这个来打印 url。
import urllib2
import json
f1 = open('CatList.text')
f2 = open('SubList.text', 'w')
lines = f1.readlines()
for line in lines:
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line+'&cmlimit=100'
print(url)
f2.write(url+'\n')
我得到的结果很奇怪(以下是部分结果):
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Branches of geography
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography by place
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography awards and competitions
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography conferences
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography education
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Environmental studies
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Exploration
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geocodes
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geographers
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geographical zones
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geopolitical corridors
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:History of geography
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Land systems
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Landscape
&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography-related lists
&cmlimit=100
请注意 URL 分为两部分
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography-related lists
&cmlimit=100
而不是
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Geography-related lists&cmlimit=100
我的第一个问题是如何解决这个问题?
其次,这是给我的错误吗?
我的CatList.text如下:
Category:Branches of geography
Category:Geography by place
Category:Geography awards and competitions
Category:Geography conferences
Category:Geography education
Category:Environmental studies
Category:Exploration
Category:Geocodes
Category:Geographers
Category:Geographical zones
Category:Geopolitical corridors
Category:History of geography
Category:Land systems
Category:Landscape
Category:Geography-related lists
Category:Lists of countries by geography
Category:Navigation
Category:Geography organizations
Category:Places
Category:Geographical regions
Category:Surveying
Category:Geographical technology
Category:Geography terminology
Category:Works about geography
Category:Geographic images
Category:Geography stubs
抱歉这么久 post。非常感谢你的帮助。谢谢。
朋友,换行一般用'\n'。同样的道理,在一个文件中,每一行之间都有隐藏的 '\n' 字符。
所以在 lines = f1.readlines() 它在所有行的末尾包含'\n'。就是这个问题。
为避免这种情况,您应该阅读 f1.read.splitlines() 。
更新以下行
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line+'&cmlimit=100'
到
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+line.strip()+'&cmlimit=100'
您的 line
包含换行符 (\n
),将使用 .strip()
删除字符串两端的空格。