PyCharm console: Unicode to readable string
The problem is that when I try to fetch Cyrillic text, I get Unicode escapes in the PyCharm console instead of readable characters.
import requests
from bs4 import BeautifulSoup
import operator
import codecs


def start(url):
    # Collect the lower-cased words from every task title on the page.
    word_list = []
    source_code = requests.get(url).text
    soup = BeautifulSoup(source_code)
    for post_text in soup.findAll('a', {'class': 'b-tasks__item__title js-set-visited'}):
        content = post_text.string
        words = content.lower().split()
        for each_word in words:
            word_list.append(each_word)
    clean_up_list(word_list)


def clean_up_list(word_list):
    # Strip punctuation from each word and drop words that end up empty.
    clean_word_list = []
    for word in word_list:
        symbols = "!@#$%^&*()_+{}|:<>?,./;'[]\\=-\""
        for i in range(0, len(symbols)):
            word = word.replace(symbols[i], "")
        if len(word) > 0:
            clean_word_list.append(word)
    create_dictionary(clean_word_list)


def create_dictionary(clean_word_list):
    # Count occurrences and print the words from least to most frequent.
    word_count = {}
    for word in clean_word_list:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
        print(key, value)
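When I change print(key, value) to print(key.decode('utf8'), value), I get "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)".
The script is run with:
start('https://youdo.com/tasks-all-opened-all-moscow-1')
There are suggestions on the internet about changing the encoding of some files, but I don't really understand them. Can't I just read the text in the console?
OS: OS X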
UPD
key.encode("utf-8")
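For reference, both the error message and the key.encode("utf-8") attempt can be reproduced in isolation. The following is a minimal sketch, assuming Python 3 (where str is Unicode text and bytes is encoded data); the sample word is hypothetical and not taken from the scraped page. It shows that encoding produces bytes whose printed form is escape sequences, and that pushing Cyrillic text through the ASCII codec raises the same kind of UnicodeEncodeError quoted above.

```python
# Assumption: Python 3. The sample word is made up for illustration.
word = "привет"

# encode() returns bytes; the printed form of bytes is escape sequences.
encoded = word.encode("utf-8")
encoded_repr = repr(encoded)

# decode() turns those bytes back into readable text.
round_trip = encoded.decode("utf-8")

# Forcing Cyrillic text through the ASCII codec fails, because ASCII
# only covers code points 0-127.
try:
    word.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = str(exc)

if __name__ == "__main__":
    print(encoded_repr)   # escape sequences, not letters
    print(round_trip)     # readable Cyrillic on a UTF-8 capable console
    print(ascii_error)    # the 'ascii' codec error message
```

In other words, the readable form is the decoded text itself; calling encode() before print() moves away from it rather than toward it.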
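UTF-8 can be a pain sometimes. I created a file containing one line of Latin characters and another line of Russian characters. The following code:
# encoding: utf-8
with open("testing.txt", "r", encoding='utf-8') as f:
    line = f.read()
    print(line)
produces readable output for both lines in PyCharm.
Note the two encoding entries: the "# encoding: utf-8" declaration at the top of the script and the encoding='utf-8' argument passed to open().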
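The two encoding entries do different jobs: the comment at the top tells Python how to decode the source file itself (Python 3 already assumes UTF-8 there), while the encoding argument of open() controls how the file's bytes are turned into text. A minimal sketch of the second part, assuming Python 3; the helper name write_and_read and the sample text are hypothetical, and a temporary directory is used only so the example does not depend on testing.txt already existing:

```python
# encoding: utf-8
import os
import tempfile


def write_and_read(text, encoding="utf-8"):
    """Write text to a temporary file and read it back with the same encoding."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "testing.txt")
        # Writing and reading must agree on the encoding.
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "r", encoding=encoding) as f:
            return f.read()


# One Latin line and one Russian line, as in the experiment described above.
sample = "Hello world\nПривет, мир\n"
result = write_and_read(sample)

if __name__ == "__main__":
    print(result)
```

If the encoding argument is left out, open() falls back to a platform-dependent default, which is why passing it explicitly is the safer habit.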
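Since you are getting your data from a web page, you must also make sure that you use the right encoding there. The following code:
# encoding: utf-8
import requests

r = requests.get('http://www.pravda.ru/')
r.encoding = 'utf-8'
print(r.text)
also produces readable output in PyCharm.
Note that you must explicitly set the encoding to match the one used by the page.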
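Why the explicit setting matters can be shown without a network request. As far as I know, requests takes r.encoding from the HTTP headers and falls back to ISO-8859-1 for text content that declares no charset, so r.text can come out garbled for a UTF-8 page. The sketch below, assuming Python 3, imitates that with plain bytes; the sample text and the helper name decode_body are hypothetical:

```python
# Assumption: Python 3. The sample text stands in for a UTF-8 encoded page body.
page_text = "Правда"
raw_bytes = page_text.encode("utf-8")  # what the server would send


def decode_body(raw, encoding):
    """Decode a raw response body with the given encoding."""
    return raw.decode(encoding)


# Decoding with the wrong encoding does not fail, it silently yields mojibake.
garbled = decode_body(raw_bytes, "iso-8859-1")

# Decoding with the encoding the page actually uses gives readable text.
readable = decode_body(raw_bytes, "utf-8")

if __name__ == "__main__":
    print(garbled)
    print(readable)
```

With requests itself, r.apparent_encoding offers a guess based on the body content, which can help when the page's encoding is not known in advance.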