Making a histogram from JSON data
I have data in JSON format that looks like this:
{
"ts": 1393631983,
"visitor_uuid": "ade7e1f63bc83c66",
"visitor_source": "external",
"visitor_device": "browser",
"visitor_useragent": "Opera/9.80 (Windows NT 6.1) Presto/2.12.388 Version/12.16",
"visitor_ip": "b5af0ba608ab307c",
"visitor_country": "BR",
"visitor_referrer": "53c643c16e8253e7",
"env_type": "reader",
"env_doc_id": "140222143932-91796b01f94327ee809bd759fd0f6c76",
"event_type": "pagereadtime",
"event_readtime": 1010,
"subject_type": "doc",
"subject_doc_id": "140222143932-91796b01f94327ee809bd759fd0f6c76",
"subject_page": 3
} {
"ts": 1393631983,
"visitor_uuid": "232eeca785873d35",
"visitor_source": "internal",
"visitor_device": "browser",
"visitor_useragent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36",
"visitor_ip": "fcf9c67037f993f0",
"visitor_country": "MX",
"visitor_referrer": "63765fcd2ff864fd",
"env_type": "stream",
"env_ranking": 10,
"env_build": "1.7.118-b946",
"env_name": "explore",
"env_component": "editors_picks",
"event_type": "impression",
"subject_type": "doc",
"subject_doc_id": "100713205147-2ee05a98f1794324952eea5ca678c026",
"subject_page": 1
}
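My task requires me to find the records whose subject_doc_id matches the user's input, and then display a histogram of the countries from which that document was viewed.
I can already read the data with my code, and I know how to plot a histogram, but I need help with counting the countries and showing those counts in the histogram.
For example, the data above contains "visitor_country": "MX" and "visitor_country": "BR", so I want the number of occurrences of each country.
Any ideas on how to achieve this?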
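I had to modify the contents of your file slightly to make it valid JSON, and then saved it as 'jsonExample.json' in my working directory.
After the modification, the JSON data has this form:
{
    "visitor1": {[your data]},
    "visitor2": {[your data]}
}
Then, using the json library (https://docs.python.org/3/library/json.html), you can simply build a list of the countries of the matching visitors and count how many times each country appears: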
import json

with open("jsonExample.json", 'r') as file:
    contents = file.read()

visitors = json.loads(contents)

# Collect the country of every record that matches the wanted document id
countryList = []
for v in visitors.keys():
    if visitors[v]['subject_doc_id'] == "desired_subject_doc_id":
        countryList.append(visitors[v]['visitor_country'])

# Report how often each distinct country occurs
for country in set(countryList):
    print(f"Country {country} appears {countryList.count(country)} times")
The if visitors[v]['subject_doc_id'] == ... statement checks whether subject_doc_id matches the specified value; just replace the right-hand side with the id you want.
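The counting step could also be handed to collections.Counter from the standard library, which tallies the list in a single pass instead of calling list.count once per country. A minimal sketch, assuming the same {"visitor1": {...}, ...} layout as above; the sample records and the helper name count_countries are made up for illustration:

```python
from collections import Counter

def count_countries(visitors, doc_id):
    """Tally visitor_country over the records whose subject_doc_id equals doc_id."""
    return Counter(
        record["visitor_country"]
        for record in visitors.values()
        if record.get("subject_doc_id") == doc_id
    )

# Hypothetical stand-in for the parsed contents of jsonExample.json
sample_visitors = {
    "visitor1": {"subject_doc_id": "doc-a", "visitor_country": "BR"},
    "visitor2": {"subject_doc_id": "doc-b", "visitor_country": "MX"},
    "visitor3": {"subject_doc_id": "doc-b", "visitor_country": "MX"},
}

country_counts = count_countries(sample_visitors, "doc-b")
for country, n in country_counts.items():
    print(f"Country {country} appears {n} times")
```

The resulting Counter behaves like a dictionary from country code to count, so its keys and values can be passed straight to a plotting call.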
Your JSON file is not valid JSON.
You need to add "[" at the start of the file and "]" at the end, and separate each "{}" section with a comma.
Here is an example:
data.json
[
{
"ts": 1393631983,
"visitor_uuid": "ade7e1f63bc83c66",
"visitor_source": "external",
"visitor_device": "browser",
"visitor_useragent": "Opera/9.80 (Windows NT 6.1) Presto/2.12.388 Version/12.16",
"visitor_ip": "b5af0ba608ab307c",
"visitor_country": "BR",
"visitor_referrer": "53c643c16e8253e7",
"env_type": "reader",
"env_doc_id": "140222143932-91796b01f94327ee809bd759fd0f6c76",
"event_type": "pagereadtime",
"event_readtime": 1010,
"subject_type": "doc",
"subject_doc_id": "140222143932-91796b01f94327ee809bd759fd0f6c76",
"subject_page": 3
}, {
"ts": 1393631983,
"visitor_uuid": "232eeca785873d35",
"visitor_source": "internal",
"visitor_device": "browser",
"visitor_useragent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36",
"visitor_ip": "fcf9c67037f993f0",
"visitor_country": "MX",
"visitor_referrer": "63765fcd2ff864fd",
"env_type": "stream",
"env_ranking": 10,
"env_build": "1.7.118-b946",
"env_name": "explore",
"env_component": "editors_picks",
"event_type": "impression",
"subject_type": "doc",
"subject_doc_id": "100713205147-2ee05a98f1794324952eea5ca678c026",
"subject_page": 1
}, {
"ts": 1393631983,
"visitor_uuid": "232eeca785873d35",
"visitor_source": "internal",
"visitor_device": "browser",
"visitor_useragent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36",
"visitor_ip": "fcf9c67037f993f0",
"visitor_country": "PL",
"visitor_referrer": "63765fcd2ff864fd",
"env_type": "stream",
"env_ranking": 10,
"env_build": "1.7.118-b946",
"env_name": "explore",
"env_component": "editors_picks",
"event_type": "impression",
"subject_type": "doc",
"subject_doc_id": "100713205147-2ee05a98f1794324952eea5ca678c026",
"subject_page": 1
}
, {
"ts": 1393631983,
"visitor_uuid": "232eeca785873d35",
"visitor_source": "internal",
"visitor_device": "browser",
"visitor_useragent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36",
"visitor_ip": "fcf9c67037f993f0",
"visitor_country": "PL",
"visitor_referrer": "63765fcd2ff864fd",
"env_type": "stream",
"env_ranking": 10,
"env_build": "1.7.118-b946",
"env_name": "explore",
"env_component": "editors_picks",
"event_type": "impression",
"subject_type": "doc",
"subject_doc_id": "100713205147-2ee05a98f1794324952eea5ca678c026",
"subject_page": 1
}
]
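If editing the file by hand is not practical, the original layout (objects written one after another with no commas) can probably be read as-is with json.JSONDecoder.raw_decode, which parses one value and reports the index where it stopped. A sketch under that assumption; the helper name iter_json_objects and the shortened sample text are made up for illustration:

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value from text that holds values back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Shortened, hypothetical sample in the same "} {" layout as the question
raw = '''{
"visitor_country": "BR",
"subject_doc_id": "doc-a"
} {
"visitor_country": "MX",
"subject_doc_id": "doc-b"
}
'''
records = list(iter_json_objects(raw))
print(len(records), [r["visitor_country"] for r in records])
```

The list this produces has the same shape as the one json.load returns for the bracketed file, so the rest of the code would not need to change.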
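After that, for each element in data.json, I check whether it matches the subject_doc_id we entered. When there is a match, I append its country to a list of matches, so that we collect the data for the histogram. Then, since I want one bin per distinct country, I build a list of the unique countries and take its length.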
import matplotlib.pyplot as plt
import json

with open("data.json") as json_file:
    data = json.load(json_file)

# Here is the subject id I'm using for the data presentation:
# 100713205147-2ee05a98f1794324952eea5ca678c026
subject_id = input("subject_doc_id: ")

# Collect the country of every record that matches the entered id
visitors = []
for i in range(len(data)):
    if subject_id == data[i]["subject_doc_id"]:
        print("got a match from {}".format(data[i]["visitor_country"]))
        visitors.append(data[i]["visitor_country"])

# Build the list of unique countries; its length is the number of bins
countries = []
for i in visitors:
    if i not in countries:
        countries.append(i)

try:
    plt.hist(visitors, bins=len(countries))
    plt.show()
except ValueError:
    print("No matches for given subject_doc_id")
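Since country codes are categories rather than numbers, a bar chart built from explicit counts may be easier to control than plt.hist. A sketch, assuming data is a list of records like the one loaded from data.json; the helper names country_counts and plot_country_counts and the sample records are made up for illustration, and matplotlib is imported inside the plotting function so that the counting part runs without it:

```python
from collections import Counter

def country_counts(data, subject_id):
    """Count visitor_country over the records matching subject_id."""
    return Counter(
        record["visitor_country"]
        for record in data
        if record.get("subject_doc_id") == subject_id
    )

def plot_country_counts(counts, subject_id):
    """Draw one bar per country; return False when there is nothing to plot."""
    if not counts:
        print("No matches for given subject_doc_id")
        return False
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing
    countries = sorted(counts)
    plt.bar(countries, [counts[c] for c in countries])
    plt.xlabel("visitor_country")
    plt.ylabel("matching records")
    plt.title(f"Records for {subject_id} by country")
    plt.show()
    return True

# Hypothetical records shaped like the entries of data.json
sample = [
    {"subject_doc_id": "doc-a", "visitor_country": "BR"},
    {"subject_doc_id": "doc-b", "visitor_country": "MX"},
    {"subject_doc_id": "doc-b", "visitor_country": "PL"},
    {"subject_doc_id": "doc-b", "visitor_country": "PL"},
]
counts = country_counts(sample, "doc-b")
print(counts)
```

Calling plot_country_counts(counts, "doc-b") would then draw two bars, one for MX with height 1 and one for PL with height 2.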
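If you want to group by continent, you first need to know which country belongs to which continent. My example:
continents = {
    "europe": ["PL", "GER"],
    "south_america": ["BR"],
    "north_america": ["MX"]
}
I'm new to Python, so apart from loops I don't know of any fancy technique for grouping the previous list.
continent_data = []
for continent in continents:
    for visitor_country in visitors:
        # Membership test against the continent's list of country codes
        if visitor_country in continents[continent]:
            continent_data.append(continent)
print(continent_data)
After that, you can use the earlier code to get the unique values for the bins and create the histogram as in the example above.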
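One way to avoid the nested loops is to invert the mapping once, so that each country code points at its continent, and then look every visitor up directly. A sketch, assuming a continents dictionary and a visitors list shaped like the ones above; the sample visitors list is made up, and filing unmapped countries under "unknown" is an assumption rather than part of the original answer:

```python
from collections import Counter

continents = {
    "europe": ["PL", "GER"],
    "south_america": ["BR"],
    "north_america": ["MX"],
}

# Invert the mapping: country code -> continent
country_to_continent = {
    country: continent
    for continent, codes in continents.items()
    for country in codes
}

# Hypothetical list of matched visitor countries
visitors = ["MX", "PL", "PL", "BR", "XX"]

# Look each visitor up; codes missing from the mapping become "unknown"
continent_data = [country_to_continent.get(c, "unknown") for c in visitors]
continent_counts = Counter(continent_data)
print(continent_counts)
```

Unlike the loop version, continent_data here follows the order of the visitors list rather than the order of the continents, which makes no difference once the values are counted.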