Creating a pandas DataFrame with one index column and a second column of lists of different sizes causes box plot problems
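I'm a beginner in Python. I'm analyzing bus headways (the time between consecutive departures) at each stop along a bus route. For each stop I have a list of headways, and the number of headways can differ from stop to stop. To visualize the data, I want to draw box plots side by side on the same page so you can see how buses bunch along the route. To do this, I wrote code that reads the bus data from a .csv file into a dictionary of stops, with the stop name as the key and a Stop object as the value (I track some other aspects of each stop, but left them out here for brevity). My trouble is with the box plots. I thought pandas could do this easily, but I've had a lot of trouble setting up the DataFrame because my dictionary contains objects. You may have other ideas. I've stripped my code down to the minimum so that you can still follow what I did. As a side note, I'm trying to learn how to use classes while doing this analysis, which is why you see a bunch of classes in my code. In my full code, I handle duplicate vehicles and outliers with my own methods.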
import datetime

stops = {}
stopNamesA = []
headwaysA = []


class Data:
    def __init__(self):
        self.depart = 0
        self.vehicle = 0


class Stop:
    def __init__(self):
        self.vehicles = []
        self.departs = []
        self.headways = []
        self.stopName = ""

    def AddData(self, line):
        fields = line.split(",")
        self.stopName = fields[3]
        self.vehicles.append(fields[0])
        # strip() removes the trailing newline (and any \r); x[:-1] would cut
        # the "M" off "AM" on a last line that has no newline
        x = fields[4].strip()
        self.departs.append(datetime.datetime.strptime(x, "%m/%d/%y %I:%M:%S %p"))

    def CalcHeadway(self):
        # headway = seconds between consecutive departures at this stop
        for i in range(len(self.departs) - 1):
            dt = self.departs[i]
            dt2 = self.departs[i + 1]
            self.headways.append((dt2 - dt).total_seconds())


with open('data.csv', 'r') as f:
    for line in f:
        fields = line.split(",")
        sid = str(fields[3])
        # keep only route X2, westbound
        if fields[1] == 'X2' and fields[2] == 'WEST':
            if sid in stops:
                s = stops[sid]
            else:
                s = Stop()
                stops[sid] = s
            s.AddData(line)

for key, value in stops.items():
    value.CalcHeadway()
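To make the headway step concrete, here is a minimal standalone sketch of what CalcHeadway computes, using the five departure times for I ST + 14TH ST from the data excerpt. It assumes the departures are already in chronological order, as the code above does; the helper name headways_in_seconds is hypothetical.

```python
import datetime

# Departure times for one stop, copied from the data excerpt in the question
DEPARTURES = [
    "10/3/16 8:01:55 AM",
    "10/3/16 8:24:01 AM",
    "10/3/16 9:27:07 AM",
    "10/3/16 9:45:43 AM",
    "10/3/16 9:57:31 AM",
]

FMT = "%m/%d/%y %I:%M:%S %p"


def headways_in_seconds(departures):
    """Return the gaps (in seconds) between consecutive departure strings.

    Hypothetical helper; assumes the strings are already in chronological order.
    """
    times = [datetime.datetime.strptime(s, FMT) for s in departures]
    # zip pairs each departure with the next one, so n departures give n - 1 headways
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


print(headways_in_seconds(DEPARTURES))  # [1326.0, 3786.0, 1116.0, 708.0]
```

Because each stop yields one fewer headway than it has departures, and stops have different numbers of departures, the per-stop lists naturally end up with different lengths.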
The data look like this (again, I've truncated the rest):
5401 X2 WEST H ST NW + 7TH ST NW 10/3/16 7:58:48 AM
2835 X2 WEST H ST NW + 7TH ST NW 10/3/16 8:16:49 AM
2460 X2 WEST H ST NW + 7TH ST NW 10/3/16 8:20:12 AM
2460 X2 WEST H ST NW + 7TH ST NW 10/3/16 8:20:38 AM
2460 X2 WEST H ST NW + 7TH ST NW 10/3/16 8:20:57 AM
5404 X2 WEST I ST + 14TH ST 10/3/16 8:01:55 AM
2835 X2 WEST I ST + 14TH ST 10/3/16 8:24:01 AM
2853 X2 WEST I ST + 14TH ST 10/3/16 9:27:07 AM
5404 X2 WEST I ST + 14TH ST 10/3/16 9:45:43 AM
2835 X2 WEST I ST + 14TH ST 10/3/16 9:57:31 AM
2831 X2 WEST MINNESOTA AVE NE + BENNING RD NE 10/3/16 8:02:41 AM
2821 X2 WEST MINNESOTA AVE NE + BENNING RD NE 10/3/16 8:17:42 AM
5420 X2 WEST MINNESOTA AVE NE + BENNING RD NE 10/3/16 8:34:43 AM
2853 X2 WEST MINNESOTA AVE NE + BENNING RD NE 10/3/16 8:44:14 AM
5401 X2 WEST MINNESOTA AVE NE + BENNING RD NE 10/3/16 9:02:20 AM
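As an alternative to the Stop class, the parsing and headway steps can be sketched entirely in pandas. This rests on assumptions: data.csv is comma-separated with no header row, and its columns follow the question's fields[0] to fields[4] indexing (vehicle, route, direction, stop, departure time). The column names below are hypothetical, and io.StringIO stands in for the file so the sketch is self-contained.

```python
import io

import pandas as pd

# Stand-in for data.csv: a few rows from the excerpt, assumed comma-separated, no header
CSV_TEXT = """5401,X2,WEST,H ST NW + 7TH ST NW,10/3/16 7:58:48 AM
2835,X2,WEST,H ST NW + 7TH ST NW,10/3/16 8:16:49 AM
2460,X2,WEST,H ST NW + 7TH ST NW,10/3/16 8:20:12 AM
5404,X2,WEST,I ST + 14TH ST,10/3/16 8:01:55 AM
2835,X2,WEST,I ST + 14TH ST,10/3/16 8:24:01 AM
2853,X2,WEST,I ST + 14TH ST,10/3/16 9:27:07 AM
"""

# Hypothetical column names matching the question's fields[0] to fields[4]
COLUMNS = ["vehicle", "route", "direction", "stop", "depart"]

raw = pd.read_csv(io.StringIO(CSV_TEXT), header=None, names=COLUMNS)
raw["depart"] = pd.to_datetime(raw["depart"], format="%m/%d/%y %I:%M:%S %p")

# Keep only route X2 westbound, as the question's filter does, and sort each
# stop's departures in time
mask = (raw["route"] == "X2") & (raw["direction"] == "WEST")
x2_west = raw[mask].sort_values(["stop", "depart"])

# Gap to the previous departure at the same stop, in seconds
x2_west["headway_s"] = x2_west.groupby("stop")["depart"].diff().dt.total_seconds()

# The first departure at each stop has no predecessor, so its headway is NaN
headways = x2_west.dropna(subset=["headway_s"])
print(headways[["stop", "headway_s"]])
```

Keeping the data in this long format (one row per headway) sidesteps the unequal-length problem altogether: headways.boxplot(column="headway_s", by="stop", rot=45) should draw one box per stop. Whether this fits with the duplicate-vehicle and outlier handling in the full code is a separate question.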
First, as the error says, 'Series' object has no attribute 'boxplot'. You can draw a box plot from a Series with Series.plot.box().

However, since you want several boxes, it makes sense to use a DataFrame. So what you need is a DataFrame to draw your boxplot.
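A minimal sketch of that distinction, using the example values for two stops from this answer: a Series has .plot.box() but no .boxplot() method, while a DataFrame has .boxplot() and draws one box per column. The Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import matplotlib.pyplot as plt
import pandas as pd

one_stop = pd.Series([1107.0, 1359.0, 1859.0, 1190.0, 1071.0, 904.0], name="I ST + 14TH ST")

# Series has no .boxplot() method, which is what triggers the AttributeError
print(hasattr(one_stop, "boxplot"))  # False

fig, (ax1, ax2) = plt.subplots(1, 2)

# A single Series can still be drawn through the .plot accessor
ax_single = one_stop.plot.box(ax=ax1)

# A DataFrame draws one box per column
two_stops = pd.DataFrame({
    "I ST + 14TH ST": [1107.0, 1359.0, 1859.0, 1190.0, 1071.0, 904.0],
    "BENNING RD NE + 19TH ST NE": [1132.0, 1503.0, 1448.0, 1344.0, 958.0, 771.0],
})
ax_multi = two_stops.boxplot(ax=ax2, rot=45)
fig.tight_layout()
```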
If I understand your needs correctly, you want a DataFrame with 26 columns, one for each bus stop.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame()
df["I ST + 14TH ST"] = [1107.0, 1359.0, 1859.0, 1190.0, 1071.0, 904.0]
df["BENNING RD NE + 19TH ST NE"] = [1132.0, 1503.0, 1448.0, 1344.0, 958.0, 771.0]
#......
df["H ST NW + 5TH ST NW"] = [1182.0, 1315.0, 1691.0, 1193.0, 956.0, 729.0]
df.boxplot(rot=45)
plt.tight_layout()
plt.show()
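Another option, not part of the original answer: matplotlib's own Axes.boxplot accepts a list of sequences of different lengths, so the per-stop headway lists could be plotted without building a DataFrame at all. In this sketch, headways_by_stop is a hypothetical stand-in for {name: stop.headways for name, stop in stops.items()}; the six values come from this answer's example and the four from the MINNESOTA AVE NE + BENNING RD NE rows in the question.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import matplotlib.pyplot as plt

# Hypothetical stand-in for {name: stop.headways for name, stop in stops.items()};
# the lists differ in length on purpose
headways_by_stop = {
    "BENNING RD NE + 19TH ST NE": [1132.0, 1503.0, 1448.0, 1344.0, 958.0, 771.0],
    "MINNESOTA AVE NE + BENNING RD NE": [901.0, 1021.0, 571.0, 1086.0],
}

names = list(headways_by_stop)
fig, ax = plt.subplots()
# Axes.boxplot takes a sequence of 1-D datasets; each may have its own length
result = ax.boxplot([headways_by_stop[name] for name in names])
# Boxes sit at x = 1, 2, ...; label them with the stop names
ax.set_xticks(list(range(1, len(names) + 1)))
ax.set_xticklabels(names, rotation=45, ha="right")
ax.set_ylabel("Headway (s)")
fig.tight_layout()
```

The trade-off is that you lose pandas conveniences such as automatic column labels, but no padding or reshaping is needed.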
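To get a working DataFrame from the stops dictionary, you could then do something like this. Because the stops have different numbers of headways, each list is wrapped in a Series so that the shorter columns are padded with NaN instead of raising an error.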
stops_for_drawing = {}
for key, val in stops.items():
    # a Series per stop lets pandas align lists of different lengths (padding with NaN)
    stops_for_drawing[key] = pd.Series(val.headways)
df = pd.DataFrame(stops_for_drawing)
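Why plain lists fail and Series work can be checked in isolation. This sketch uses two headway lists of different lengths (the six values from this answer's example and the four from the MINNESOTA AVE NE + BENNING RD NE rows in the question).

```python
import pandas as pd

# Headway lists of different lengths
ragged = {
    "BENNING RD NE + 19TH ST NE": [1132.0, 1503.0, 1448.0, 1344.0, 958.0, 771.0],
    "MINNESOTA AVE NE + BENNING RD NE": [901.0, 1021.0, 571.0, 1086.0],
}

# Plain lists of unequal length cannot form DataFrame columns
try:
    pd.DataFrame(ragged)
except ValueError as err:
    print("ValueError:", err)

# Series are aligned on their index, so the shorter column is padded with NaN
df = pd.DataFrame({name: pd.Series(values) for name, values in ragged.items()})
print(df.shape)  # (6, 2)
print(df.count().tolist())  # [6, 4]
```

DataFrame.boxplot drops the missing values in each column before computing its box, so the NaN padding does not distort the plot.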