How to get text with json.loads in Python

Question

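I wrote some code that gets usernames from Instagram. Sometimes my algorithm does not work and the name I get is 'p'. I am trying to handle this case (in the if head == 'p': part of the code). First, I use soup.select to get this block of information: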

{"@context":"http:\/\/schema.org","@type":"ImageObject","caption":"I think I\u2019m getting better at editing these.... and by that I mean that there getting more and more muddled to the point hat I don\u2019t think people will be able to tell what they are soon.... not really what I\u2019m going for but oh well.\n-\n\u2022September 3 2018\u2022\n-\n-\nThis is the update I mentioned on my cuts on my leg. I finally cleaned them after two days. I normally don\u2019t wait that long but didn\u2019t really have the right circumstances to actually get to clean them the night of the relapse. *shrug*\n-\n-\n-\n#selfharm #selfharmo","representativeOfPage":"http:\/\/schema.org\/True","uploadDate":"2018-09-04T06:27:24","author":{"@type":"Person",**"alternateName":"@alittlereddrop"**,"mainEntityofPage":{"@type":"ProfilePage","@id":"https:\/\/www.instagram.com\/alittlereddrop\/"}},"commentCount":"0","interactionStatistic":{"@type":"InteractionCounter","interactionType":{"@type":"LikeAction"},"userInteractionCount":"2"},"mainEntityofPage":{"@type":"ItemPage","@id":"https:\/\/www.instagram.com\/p\/BnS0sdDlsmP\/?tagged=selfharmo"},"description":"2 Likes, 0 Comments - No One Cares (@alittlereddrop) on Instagram: \u201cI think I\u2019m getting better at editing these.... and by that I mean that there getting more and more\u2026\u201d","name":"No One Cares on Instagram: \u201cI think I\u2019m getting better at editing these.... and by that I mean that there getting more and more muddled to the point hat I don\u2019t think\u2026\u201d"}

It has an "alternateName" entry that contains the name, but I cannot get it even with json.loads. Do you have any ideas?

import json

import requests
from bs4 import BeautifulSoup
from requests.exceptions import HTTPError

file = open('users.txt', 'r', encoding="ISO-8859-1")
urls = file.readlines()
for url in urls:
    url = url.strip('\n')
    try:
        req = requests.get(url)
        req.raise_for_status()
    except HTTPError as http_err:
        output = open('output2.txt', 'a')
        # The Russian message means "Unfortunately, the page is unavailable."
        output.write('К сожалению страница недоступна.\n')
    except Exception as err:
        output = open('output2.txt', 'a')
        output.write('К сожалению страница недоступна2\n')
    else:
        output = open('output2.txt', 'a')
        soup = BeautifulSoup(req.text, "lxml")
        the_url = soup.select("[rel='canonical']")[0]['href']
        the_url2 = the_url.replace('https://www.instagram.com/', '')
        head, sep, tail = the_url2.partition('/')
        if head == 'p':
            data = soup.select("[type='application/ld+json']")[0]
            # This is the lookup that fails.
            oJson2 = json.loads(data.text)["alternateName"]
            str(oJson2)
            output.write(oJson2 + '\n')
        else:
            output.write(head + '\n')

Answer 1

There is a syntax problem in your JSON. It contains two misplaced double asterisks:

**"alternateName":"@alittlereddrop"**,.

If you are opening the JSON from a file, do the following:

import json

with open('yourfilename.json') as fo:
    jsn = json.loads(fo.read().replace('**', ''))
print(jsn['author']['alternateName'])
# prints: @alittlereddrop
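
To illustrate why the ['author'] step matters, here is a minimal sketch using a trimmed copy of the JSON from the question (most fields are omitted and the ** markers are left out). The names SAMPLE, has_top_level and nested_name are illustrative only and do not come from the original code.

```python
import json

# Trimmed copy of the JSON-LD block from the question (assumption: most
# fields omitted for brevity, '**' markers left out).
SAMPLE = r'''{"@type":"ImageObject","author":{"@type":"Person","alternateName":"@alittlereddrop","mainEntityofPage":{"@type":"ProfilePage","@id":"https:\/\/www.instagram.com\/alittlereddrop\/"}},"commentCount":"0"}'''

doc = json.loads(SAMPLE)

# "alternateName" is not a top-level key, so doc["alternateName"]
# would raise KeyError.
has_top_level = "alternateName" in doc

# The key lives inside the nested "author" object.
nested_name = doc["author"]["alternateName"]

print(has_top_level)
print(nested_name)
```

In other words, json.loads parses the text fine here; the lookup simply has to follow the nesting of the document.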

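In your case, try replacing this line:

oJson2 = json.loads(data.text)["alternateName"]

with this one, which strips the asterisks and reads alternateName from the nested author object, where it actually lives:

oJson2 = json.loads(data.text.replace('**', ''))['author']["alternateName"]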
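
If some pages might lack the author block or contain text that is not valid JSON, the same lookup could be wrapped in a small helper that returns None instead of raising. This is only a sketch under those assumptions: extract_alternate_name is a hypothetical name, not part of the original code or of any library.

```python
import json
from typing import Optional


def extract_alternate_name(raw_text: str) -> Optional[str]:
    """Return author.alternateName from a JSON-LD string, or None if unavailable.

    Hypothetical helper for illustration only.
    """
    # Remove the '**' markers in case they really are part of the text.
    cleaned = raw_text.replace('**', '')
    try:
        doc = json.loads(cleaned)
    except json.JSONDecodeError:
        # The text is not valid JSON even after cleaning.
        return None
    if not isinstance(doc, dict):
        return None
    author = doc.get('author')
    if not isinstance(author, dict):
        # No nested author object on this page.
        return None
    return author.get('alternateName')
```

In the question's loop it would be called as extract_alternate_name(data.text), and the result checked for None before writing it to the output file. Note that replace('**', '') also removes any double asterisks that appear inside captions, which is harmless for this lookup but does change those strings.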


python

json

beautifulsoup

python-3.x

instagram