TypeError: must be str, not bytes in IBM Watson

TypeError: must be str, not bytes in IBM Watson

我刚刚完成 CodeAcademyIBM Watson 课程,他们在 python 2 中编程,当我在 python 3 中将文件带过来时,我一直收到此错误。文件脚本和所有凭据在 CodeAcademy 中运行良好。这是因为我在 Python 3 工作,还是因为代码中的问题。

    Traceback (most recent call last):
  File "c:\Users\Guppy\Programs\PythonCode\Celebrity Match\CelebrityMatch.py", line 58, in <module>
    user_result = analyze(user_handle)
  File "c:\Users\Guppy\Programs\PythonCode\Celebrity Match\CelebrityMatch.py", line 22, in analyze
    text += status.text.encode('utf-8')
TypeError: must be str, not bytes 

有谁知道怎么回事,代码如下:

import sys
import operator
import requests
import json
import twitter
from watson_developer_cloud import PersonalityInsightsV2 as PersonalityInsights

def analyze(handle):
    twitter_consumer_key = '<key>'
    twitter_consumer_secret = '<secret>'
    twitter_access_token = '<token>'
    twitter_access_secret = '<secret>'

    twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret)

    statuses = twitter_api.GetUserTimeline(screen_name = handle, count = 200, include_rts = False)

    text = ""

    for status in statuses:
        if (status.lang =='en'): #English tweets
            text += status.text.encode('utf-8')

    #The IBM Bluemix credentials for Personality Insights!
    pi_username = '<username>'
    pi_password = '<password>'

    personality_insights = PersonalityInsights(username=pi_username, password=pi_password)
    pi_result = personality_insights.profile(text)
    return pi_result

def flatten(orig):
    data = {}
    for c in orig['tree']['children']:
        if 'children' in c:
            for c2 in c['children']:
                if 'children' in c2:
                    for c3 in c2['children']:
                        if 'children' in c3:
                            for c4 in c3['children']:
                                if (c4['category'] == 'personality'):
                                    data[c4['id']] = c4['percentage']
                                    if 'children' not in c3:
                                        if (c3['category'] == 'personality'):
                                                data[c3['id']] = c3['percentage']
    return data

def compare(dict1, dict2):
    compared_data = {}
    for keys in dict1:
        if dict1[keys] != dict2[keys]:
                compared_data[keys]=abs(dict1[keys] - dict2[keys])
    return compared_data

user_handle = "@itsguppythegod"
celebrity_handle = "@giselleee_____"

user_result = analyze(user_handle)
celebrity_result = analyze(celebrity_handle)

user = flatten(user_result)
celebrity = flatten(celebrity_result)

compared_results = compare(user, celebrity)

sorted_result = sorted(compared_results.items(), key=operator.itemgetter(1))

for keys, value in sorted_result[:5]:
    print(keys, end = " ")
    print(user[keys], end = " ")
    print ('->', end - " ")
    print (celebrity[keys], end = " ")
    print ('->', end = " ")
    print (compared_results[keys])

您在此处创建了一个 str(unicode 文本)对象:

text = ""

然后继续附加 UTF-8 编码字节:

text += status.text.encode('utf-8')

在 Python 2 中,"" 创建了一个字节串,这一切都很好(尽管您随后将 UTF-8 字节发布到一个服务,该服务会将其全部解释为 Latin-1,见 API documentation.

要解决此问题,不要对状态文本进行编码,直到您收集完所有推文。此外,告诉 Watson 期待 UTF-8 数据。最后但并非最不重要的一点是,您真的应该首先构建一个推特文本列表,然后在一步中使用 str.join() 将它们连接起来,因为在循环中连接字符串需要二次方时间:

text = []

for status in statuses:
    if (status.lang =='en'): #English tweets
        text.append(status.text)

# ...

personality_insights = PersonalityInsights(username=pi_username, password=pi_password)
pi_result = personality_insights.profile(
    ' '.join(text).encode('utf8'),
    content_type='text/plain; charset=utf-8'
)