Crawled data prints completely on the terminal, but is not completely written to a text file
I crawled some data from a URL, and it displays perfectly on the terminal.
Here is my code (it scrapes movie names from the URL):
import requests
from lxml import etree
import json
url= 'https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&page_limit=50&page_start=0'
headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
'Referer':'https://www.baidu.com/link?url=WRKWuSkjVngibf0xz8JoiYzW3OIsrTP0-cj26aVvH8q7ZhP6qkOGY-Zwc8-HEGFw&wd=&eqid=9926e1c400630e7c000000035d90c4d7',
'Cookie': '_vwo_uuid_v2=D0DF3760AB53755DB79564FCF3EFA6601|fce72c403f87ac2ca7ae102837b10fec; __guid=223695111.4319207795202076000.1549549791840.3198; douban-fav-remind=1; viewed="27056409"; gr_user_id=8da42253-de1e-4f12-95ee-70a3cb0fda27; ll="118243"; bid=0SXwG9CmIhU; ap_v=0,6.0; _pk_ref.100001.4cf6=%5B%22%22%2C%22%22%2C1569768908%2C%22https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3DWRKWuSkjVngibf0xz8JoiYzW3OIsrTP0-cj26aVvH8q7ZhP6qkOGY-Zwc8-HEGFw%26wd%3D%26eqid%3D9926e1c400630e7c000000035d90c4d7%22%5D; _pk_ses.100001.4cf6=*; __utma=30149280.914565082.1523449990.1524056947.1569768908.3; __utmc=30149280; __utmz=30149280.1569768908.3.1.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; __utma=223695111.2082725742.1523450059.1524056947.1569768908.3; __utmb=223695111.0.10.1569768908; __utmc=223695111; __utmz=223695111.1569768908.3.1.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; UM_distinctid=16d7d84a6791f-00508116b3cea7-454c092b-100200-16d7d84a67a4c; CNZZDATA1272964020=1129763675-1569768675-https%253A%252F%252Fwww.baidu.com%252F%7C1569768675; __yadk_uid=RKbptYrIEyYDpE7KUTuHZW99Kmw5fxxz; __utmt_t1=1; _pk_id.100001.4cf6=bfdbd891f3d04fe0.1523450058.4.1569770601.1549549798.; __utmb=30149280.27.8.1569770601126; monitor_count=11; RT=s=1569770721069&r=https%3A%2F%2Fmovie.douban.com%2F'}
response=requests.get(url, headers=headers)
html_str=response.content.decode()
html_dict=json.loads(response.content.decode())
movie_dict=html_dict['subjects']
for i in movie_dict:
    movie_name=i['title']
    movie_rate=i['rate']
    print(movie_name)
Here is the output (all movie names are scraped and fully displayed on the terminal):
银河补习班
保持沉默
我的天使
心理测量者SS2:第一卫士
心理测量者SS3
安娜
寄生虫
极限逃生
沉默的证人
仲夏夜惊魂
柳烈的音乐专辑
龙牌之谜
大侦探皮卡丘
友情以上
深夜食堂
送我上青云
蜘蛛侠:英雄远征
玩具总动员4
速度与激情:特别行动
流浪地球
疯狂的外星人
使徒行者2:谍影行动
烈火英雄
恶人传
无名之辈
飞驰人生
高草丛中
疾速备战
铤而走险
海王
阿丽塔:战斗天使
X战警:黑凤凰
蜘蛛侠:平行宇宙
我身体里的那个家伙
阿拉丁
巨鳄风暴
波西米亚狂想曲
白蛇:缘起
极限职业
亲密旅行
地久天长
复仇者联盟4:终局之战
鼠胆英雄
神奇动物:格林德沃之罪
骡子
小委托人
江南
爱宠大机密2
风中有朵雨做的云
无敌破坏王2:大闹互联网
But when I try to write the data to a text file, the file contains only one item.
Here is the code that writes to the file:
with open('douban_movie.txt','w',encoding='utf-8') as file:
    file.write(movie_name+'\n')
Here is the resulting file content (only one item):
无敌破坏王2:大闹互联网
My question:
Why is only one item written to the text file, when I have crawled all the data and printed it on the terminal?
Probably because your code looks something like this:
movie_dict=html_dict['subjects']
for i in movie_dict:
    movie_name=i['title']
    movie_rate=i['rate']
    print(movie_name)
with open('douban_movie.txt','w',encoding='utf-8') as file:
    file.write(movie_name+'\n')
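To see why this produces a single line, here is a minimal sketch of the same pattern. It assumes a small hypothetical list standing in for `html_dict['subjects']` (three titles copied from the terminal output above) and uses an in-memory `io.StringIO` instead of the real file. The loop rebinds `movie_name` on every pass, so once it finishes only the last title is left, and `write()` runs exactly once.

```python
import io

# Hypothetical stand-in for html_dict['subjects'], using three titles from the output above
movie_dict = [{"title": "银河补习班"}, {"title": "寄生虫"}, {"title": "无敌破坏王2:大闹互联网"}]

for i in movie_dict:
    movie_name = i["title"]  # rebinds the same name on every iteration

# The loop has already finished here: movie_name holds only the last title
file = io.StringIO()  # in-memory stand-in for douban_movie.txt
file.write(movie_name + "\n")  # executed exactly once
print(file.getvalue(), end="")
```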
What you most likely want instead is:
with open('douban_movie.txt','w',encoding='utf-8') as file:
    movie_dict=html_dict['subjects']
    for i in movie_dict:
        movie_name=i['title']
        movie_rate=i['rate']
        print(movie_name)
        file.write(movie_name+'\n')
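That is, open the file once outside the loop, then write to it on every iteration inside the loop. In your version the loop runs to completion before the file is even opened, so `movie_name` holds only the last title and `write()` is called just once.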
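A minimal, self-contained sketch of the corrected pattern follows. It assumes hypothetical sample data shaped like `html_dict['subjects']` (only the `title` key is used), and `write_titles` is an illustrative helper name, not part of the original code. To avoid touching `douban_movie.txt`, it writes into a throwaway temporary directory and reads the file back to confirm every title landed on its own line.

```python
import os
import tempfile


def write_titles(subjects, path):
    """Open the file once, then write one title per line inside the loop."""
    with open(path, "w", encoding="utf-8") as file:
        for item in subjects:
            file.write(item["title"] + "\n")


# Hypothetical sample data shaped like html_dict['subjects']
subjects = [{"title": "银河补习班"}, {"title": "寄生虫"}, {"title": "无敌破坏王2:大闹互联网"}]

# Write into a temporary directory instead of the real douban_movie.txt
path = os.path.join(tempfile.mkdtemp(), "douban_movie.txt")
write_titles(subjects, path)

# Read the file back: there should be one line per title
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```

If you do need to open the file inside the loop for some reason, use mode `'a'` (append) rather than `'w'`: `'w'` truncates the file every time it is opened, so each iteration would overwrite the previous one. Opening once outside the loop, as above, is simpler and avoids reopening the file for every title.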