Analysing YouTube comments using Python -- parameter has disabled comments

Question

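I'm trying to do text analysis on YouTube comments. I've been using the code from the following site to scrape YouTube:

https://www.pingshiuanchua.com/blog/post/using-youtube-api-to-analyse-youtube-comments-on-python

The script starts running, but one block of code throws an error when comments are disabled on a video. I can't find a way to check whether comments are disabled (or whether any comments exist at all) so that, when there are no comments to scrape, the script skips that video and moves on to the next one.

The relevant block of code that produces the error is: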

# =============================================================================
# Get Comments of Top Videos
# =============================================================================

video_id_pop = []
channel_pop = []
video_title_pop = []
video_desc_pop = []
comments_pop = []
comment_id_pop = []
reply_count_pop = []
like_count_pop = []

from tqdm import tqdm

for i, video in enumerate(tqdm(video_id, ncols = 100)):
    response = service.commentThreads().list(
                    part = 'snippet',
                    videoId = video,
                    maxResults = 100, # Only take top 100 comments...
                    order = 'relevance', #... ranked on relevance
                    textFormat = 'plainText',
                    ).execute()
    
    comments_temp = []
    comment_id_temp = []
    reply_count_temp = []
    like_count_temp = []
    for item in response['items']:
        comments_temp.append(item['snippet']['topLevelComment']['snippet']['textDisplay'])
        comment_id_temp.append(item['snippet']['topLevelComment']['id'])
        reply_count_temp.append(item['snippet']['totalReplyCount'])
        like_count_temp.append(item['snippet']['topLevelComment']['snippet']['likeCount'])
    comments_pop.extend(comments_temp)
    comment_id_pop.extend(comment_id_temp)
    reply_count_pop.extend(reply_count_temp)
    like_count_pop.extend(like_count_temp)
    
    video_id_pop.extend([video_id[i]]*len(comments_temp))
    channel_pop.extend([channel[i]]*len(comments_temp))
    video_title_pop.extend([video_title[i]]*len(comments_temp))
    video_desc_pop.extend([video_desc[i]]*len(comments_temp))
    
query_pop = [query] * len(video_id_pop)

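Edit to add:

The person who wrote the code left a comment on how to fix the error:

"You could wrap the query part of the code in a try...except statement. If the try statement (the query part) fails, then in the except branch you can append a blank response or an 'Error' string to the lists instead."

I have NFI how to actually do this, if it makes sense to anyone else...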
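One way to read the suggestion quoted above is sketched below. It is a minimal sketch, not the blog author's code: `collect_comments` and `fetch_comments` are hypothetical names, `fetch_comments` stands in for the `service.commentThreads().list(...).execute()` query, and only two of the parallel lists are shown.

```python
def collect_comments(video_ids, fetch_comments):
    """Build parallel lists (video id, comment), one row per comment.

    `fetch_comments` is a hypothetical stand-in for the API query: it takes a
    video id and returns a list of comment strings, or raises on failure.
    When it raises, a single "Error" row is recorded instead, so the video
    still shows up in the output and both lists stay the same length.
    """
    video_col, comment_col = [], []
    for video in video_ids:
        try:
            comments = fetch_comments(video)
        except Exception:
            # Placeholder row, as suggested in the quoted comment.
            comments = ["Error"]
        video_col.extend([video] * len(comments))
        comment_col.extend(comments)
    return video_col, comment_col
```

A video that simply has zero comments contributes no rows, while a video whose query fails contributes exactly one "Error" row that you can filter out later.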

Answer 1

Note: this isn't necessarily "good" coding style, but it's what I would do if I hit this problem in a short-term script written for my own personal use.

Python (like many other languages) has a way to catch exceptions and handle them without crashing. Used properly, this can be a very good way to deal with bad data.
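As a small, self-contained illustration of handling bad data without crashing (a hypothetical example, unrelated to the YouTube API): `int()` raises `ValueError` on a string that isn't a number, so catching that one exception lets the loop skip bad entries and keep going.

```python
def parse_counts(raw_values):
    """Convert strings to ints, skipping (and recording) any that are invalid."""
    counts = []
    skipped = []
    for raw in raw_values:
        try:
            counts.append(int(raw))
        except ValueError as ex:
            # int() raises ValueError for input like "abc"; note it and move on.
            skipped.append((raw, str(ex)))
    return counts, skipped
```

Catching only `ValueError` (rather than everything) means a genuinely unexpected problem still stops the program.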

https://docs.python.org/3.8/tutorial/errors.html is a good overview of exceptions. In general, they take a form like this:

try:
    code_that_can_error()
except ExceptionThatWillBeThrown as ex:
    handle_exception()
    print(ex) # ex is an object that has information about what went wrong
finally:
    clean_up()

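(finally is especially useful when you have something that needs closing, such as a file. If an exception is thrown you might never reach the line that closes it, but the finally block is guaranteed to run even when an exception is thrown.)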
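The sketch below (a hypothetical `process` function) shows the order in which the clauses run: a `ValueError` is handled, any other exception propagates to the caller, and the `finally` clause runs in every case.

```python
def process(step, log):
    """Run `step()` and record in `log` what happened.

    ValueError is handled here; any other exception propagates to the
    caller, but the `finally` clause still runs first in every case.
    """
    try:
        step()
        log.append("ok")
    except ValueError as ex:
        log.append(f"handled: {ex}")
    finally:
        log.append("cleanup")
```

For files specifically, a `with open(...) as f:` block gives the same cleanup guarantee and is usually the more idiomatic choice.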

In your case, we only need to ignore the error and move on to the next video.

for i, video in enumerate(tqdm(video_id, ncols = 100)):
    try:
        response = service.commentThreads().list(
                        part = 'snippet',
                        videoId = video,
                        maxResults = 100, # Only take top 100 comments...
                        order = 'relevance', #... ranked on relevance
                        textFormat = 'plainText',
                        ).execute()
    
        comments_temp = []
        # [...] rest of the original loop body, indented one extra level
        video_desc_pop.extend([video_desc[i]]*len(comments_temp))
    except Exception as ex:
        # Something threw an error. Skip that video and move on.
        # (Exception rather than a bare except, so Ctrl+C still stops the loop.)
        print(f"{video} has comments disabled, or something else went wrong: {ex}")

query_pop = [query] * len(video_id_pop)
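Catching every exception also hides unrelated problems, such as an exhausted API quota or a typo in the loop body. If you would rather skip only the videos whose comments are disabled, one option is to catch `googleapiclient.errors.HttpError` and inspect the error reason. The YouTube Data API documents a 403 `commentsDisabled` error for `commentThreads.list`. This sketch assumes the usual Google JSON error body (`{"error": {"errors": [{"reason": ...}]}}`), and the helper names `is_comments_disabled` and `fetch_top_comments` are hypothetical.

```python
import json


def is_comments_disabled(content) -> bool:
    """Return True if an API error body reports the reason 'commentsDisabled'.

    Assumes the standard Google JSON error shape:
    {"error": {"errors": [{"reason": "commentsDisabled", ...}], ...}}
    """
    try:
        data = json.loads(content)
    except (TypeError, ValueError):
        # Not JSON (or not str/bytes at all): treat it as some other error.
        return False
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        return False
    return any(
        isinstance(e, dict) and e.get("reason") == "commentsDisabled"
        for e in (error.get("errors") or [])
    )


def fetch_top_comments(service, video):
    """Return the commentThreads response for `video`, or None if comments are disabled.

    Any other API error is re-raised instead of being silently swallowed.
    """
    # Imported inside the function so is_comments_disabled() has no dependency
    # on the client library.
    from googleapiclient.errors import HttpError

    try:
        return service.commentThreads().list(
            part="snippet",
            videoId=video,
            maxResults=100,       # only the top 100 comments...
            order="relevance",    # ...ranked on relevance
            textFormat="plainText",
        ).execute()
    except HttpError as ex:
        # ex.content holds the raw (bytes) JSON body returned by the API.
        if is_comments_disabled(ex.content):
            return None
        raise
```

In the loop you would then call `response = fetch_top_comments(service, video)` and, if it returns `None`, print a message and `continue` to the next video. Any other error still stops the script, so you can see what went wrong.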

python

youtube

nlp