PyCharm console: unicode to readable string

Question

I'm learning Python.

The problem is that when I try to get Cyrillic characters, the PyCharm console shows me unicode instead of readable text.

import requests
from bs4 import BeautifulSoup
import operator
import codecs

def start(url):
    word_list = []
    source_code = requests.get(url).text
    soup = BeautifulSoup(source_code)

    for post_text in soup.findAll('a', {'class': 'b-tasks__item__title js-set-visited'}):
        content = post_text.string

        words = content.lower().split()
        for each_word in words:
            word_list.append(each_word)
    clean_up_list(word_list)



def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        symbols = "!@#$%^&*()_+{}|:<>?,./;'[]\=-\""
        for i in range(0, len(symbols)):
            word = word.replace(symbols[i], "")
        if len(word) > 0:
            clean_word_list.append(word)
    create_dictionary(clean_word_list)



def create_dictionary(clean_word_list):
    word_count = {}
    for word in clean_word_list:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1

    for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
        print(key, value)

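When I change print(key, value) to print(key.decode('utf8'), value), I get "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)".

start('https://youdo.com/tasks-all-opened-all-moscow-1')

There are suggestions on the internet about changing the encoding in some files, but I don't really understand them. Can't I just read the text in the console? I'm on OS X.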

UPD key.encode("utf-8")
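
A note on the error quoted above: a UnicodeEncodeError raised by a .decode() call is the Python 2 behaviour for a value that is already a unicode string. Python 2 first encodes such a value with the default ASCII codec and only then decodes it, so the failure happens in the hidden encode step. The sketch below spells out those two steps in Python 3 syntax, where they have to be written explicitly. The sample word is a hypothetical stand-in for one of the scraped keys, and ascii() is used only to imitate the escaped form that Python 2 shows when it prints a unicode string inside a tuple.

```python
# Hypothetical sample standing in for one scraped key; it is already decoded text.
word = "привет"

# Python 2 evaluated unicode.decode('utf8') as encode('ascii') followed by decode('utf8').
# Writing the two steps out shows that the encode step is the one that fails.
try:
    word.encode("ascii").decode("utf8")
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__

# ascii() imitates the escaped representation of the same text.
escaped = ascii(word)

print(error_name)
print(escaped)
```

Under this reading the text is already correct and only its displayed representation is escaped, so calling decode on it cannot help.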

Answer 1

UTF-8 can be a pain sometimes. I created a file containing one line of Latin characters and another line of Russian characters. The following code:

# encoding: utf-8

with open("testing.txt", "r", encoding='utf-8') as f:
    line = f.read()
    print(line)

prints the file contents in the PyCharm console.

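Note the two encoding entries: the "# encoding: utf-8" comment at the top of the script and the encoding='utf-8' argument passed to open().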
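
As a minimal sketch of why the encoding argument to open() matters, the following writes a small sample to a temporary file as UTF-8 and reads it back twice, once with the matching codec and once with ASCII. The sample text and the helper name write_then_read are assumptions made for illustration; they are not part of the original test file.

```python
import os
import tempfile

# Hypothetical sample: one Latin line and one Russian line.
SAMPLE = "Hello\nПривет\n"


def write_then_read(text, read_encoding):
    """Write text to a temporary file as UTF-8, then read it back with read_encoding."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding=read_encoding) as f:
            return f.read()
    finally:
        os.remove(path)


# Matching encodings: the text comes back unchanged.
same = write_then_read(SAMPLE, "utf-8")

# Mismatched encoding: ASCII cannot represent the Russian line, so decoding fails.
try:
    write_then_read(SAMPLE, "ascii")
    failed = False
except UnicodeDecodeError:
    failed = True
```

Passing the encoding explicitly avoids depending on the platform default, which can differ between a terminal and an IDE run configuration.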

Since you are getting your data from a web page, you must also make sure that you use the right encoding. The following code

# encoding: utf-8
import requests

r = requests.get('http://www.pravda.ru/')
r.encoding = 'utf-8'  # decode the response body as UTF-8 when reading r.text
print(r.text)

prints the page text in the PyCharm console.

Note that you must explicitly set the encoding to match the one the page uses.
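
To illustrate what a mismatch looks like without making a network request, here is a small sketch that uses only the standard library. It assumes a hypothetical page that sends its Russian text as windows-1251 bytes, and decodes those bytes with a wrong codec and with the right one.

```python
# Hypothetical payload: Russian text as a server might send it, encoded as windows-1251.
raw = "Привет".encode("cp1251")

# Decoding with a codec that does not match the page yields garbled text, not an error.
wrong = raw.decode("latin-1")

# Decoding with the codec the page actually uses yields the original text.
right = raw.decode("cp1251")

# Text garbled through latin-1 can be repaired: re-encode it, then decode correctly.
repaired = wrong.encode("latin-1").decode("cp1251")

# ascii() keeps the output printable on any console.
print(ascii(wrong))
print(ascii(right))
```

With requests, setting r.encoding before reading r.text selects the codec in the same way. When a page does not declare its charset, r.apparent_encoding holds the library's guess based on the body bytes and can serve as a starting point.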


python

pycharm