pandas add row instead of column
I am new to pandas, but I simply want to add a row:
class Security:
    def __init__(self):
        self.structure = ['timestamp', 'open', 'high', 'low', 'close', 'vol']
        self.df = pd.DataFrame(columns=self.structure)  # index =

    def whats_inside(self):
        return self.df

    """
    Some skipped code...
    """

    def add_data(self, timestamp, open, high, low, close, vol):
        data = [timestamp, open, high, low, close, vol]
        self.df = self.df.append(data)


sec = Security()
print(sec.whats_inside())
sec.add_data('2015/06/01', '1', '2', '0.5', '1', '100')
print(sec.whats_inside())
But the output is:
            0 close high  low open timestamp  vol
0  2015/06/01   NaN  NaN  NaN  NaN       NaN  NaN
1           1   NaN  NaN  NaN  NaN       NaN  NaN
2           2   NaN  NaN  NaN  NaN       NaN  NaN
3         0.5   NaN  NaN  NaN  NaN       NaN  NaN
4           1   NaN  NaN  NaN  NaN       NaN  NaN
5         100   NaN  NaN  NaN  NaN       NaN  NaN
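This means I am adding a column instead of a row. Yes, I have tried googling, but I still don't understand how to do this in a simple, Pythonic way.
P.S. I know this is simple, but I am just missing something important.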
There are several ways to add a new row. Perhaps the easiest one (if you want to add the row at the end) is to use loc:
df.loc[len(df)] = ['val_a', 'val_b', .... ]
loc expects an index label. len(df) returns the number of rows in the DataFrame, so the new row is added at the end of the DataFrame.
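['val_a', 'val_b', .... ] is the list of values for the row, in the same order as the columns, so the length of the list must equal the number of columns; otherwise you will get a ValueError exception.
One exception: if you want all columns to have the same value, you are allowed to pass that value as the single element of the list, for example df.loc[len(df)] = ['aa'].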
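To make the loc approach concrete, here is a minimal sketch that reuses the column names from the question; the row values are made up for illustration. It only shows the two behaviours that are safe to rely on: a full-length list is appended as a new row, and a list of the wrong length is rejected.

```python
import pandas as pd

# Same column layout as in the question; the values below are illustrative.
df = pd.DataFrame(columns=['timestamp', 'open', 'high', 'low', 'close', 'vol'])

# One value per column, in column order: the row gets the label len(df).
df.loc[len(df)] = ['2015/06/01', 1.0, 2.0, 0.5, 1.0, 100]
df.loc[len(df)] = ['2015/06/02', 1.0, 2.0, 0.5, 1.0, 100]

# A list whose length does not match the number of columns raises ValueError.
try:
    df.loc[len(df)] = ['2015/06/03', 1.0]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(df)
print(mismatch_rejected)
```

Because len(df) is re-evaluated on every assignment, each call targets the next free integer label as long as the index is the default 0..n-1 range.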
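Note: it is a good idea to always call reset_index before using this method, because if you have ever deleted rows or are working on a filtered DataFrame, the row labels are not guaranteed to be in sync with the number of rows.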
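The reason for the reset_index advice can be seen in a small sketch. The frame and its values are invented for illustration: after a row is dropped, len(df) may coincide with a label that already exists, so the assignment overwrites a row instead of adding one.

```python
import pandas as pd

# Illustrative data, not taken from the question.
df = pd.DataFrame({'open': [1.0, 2.0, 3.0], 'close': [1.5, 2.5, 3.5]})

# Dropping the first row leaves labels [1, 2], while len() is 2.
filtered = df.drop(index=0)

# Without reset_index, label 2 already exists, so the last row is overwritten.
overwritten = filtered.copy()
overwritten.loc[len(overwritten)] = [9.0, 9.5]

# After reset_index(drop=True) the labels are [0, 1], so label 2 is new.
appended = filtered.reset_index(drop=True)
appended.loc[len(appended)] = [9.0, 9.5]

print(overwritten)
print(appended)
```

Here overwritten still has two rows, whereas appended has three.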
You should append a Series or a DataFrame. (A Series is the better fit in your case.)
import pandas as pd


class Security:
    def __init__(self):
        self.structure = ['timestamp', 'open', 'high', 'low', 'close', 'vol']
        self.df = pd.DataFrame(columns=self.structure)  # index =

    def whats_inside(self):
        return self.df

    """
    Some skipped code...
    """

    def add_data(self, timestamp, open, high, low, close, vol):
        data = [timestamp, open, high, low, close, vol]
        # append a Series whose index matches the column names, so it becomes one row
        # (DataFrame.append was removed in pandas 2.0; this needs an older release)
        self.df = self.df.append(pd.Series(data, index=self.structure), ignore_index=True)
        # or append a one-row DataFrame
        # self.df = self.df.append(pd.DataFrame([data], columns=self.structure), ignore_index=True)


sec = Security()
print(sec.whats_inside())
sec.add_data('2015/06/01', '1', '2', '0.5', '1', '100')
sec.add_data('2015/06/02', '1', '2', '0.5', '1', '100')
print(sec.whats_inside())
Output:
    timestamp open high  low close  vol
0  2015/06/01    1    2  0.5     1  100
1  2015/06/02    1    2  0.5     1  100
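On pandas 2.0 and later, DataFrame.append is no longer available, so the same idea has to be written differently. The sketch below is one possible adaptation, not the answerer's code: pd.concat with a one-row DataFrame replaces the append call, and BufferedSecurity (a hypothetical name, as is its rows attribute) collects rows in a plain list and builds the DataFrame only when it is requested.

```python
import pandas as pd

structure = ['timestamp', 'open', 'high', 'low', 'close', 'vol']

# Variant 1: concatenate a one-row DataFrame onto an existing one.
df = pd.DataFrame([['2015/06/01', '1', '2', '0.5', '1', '100']], columns=structure)
new_row = pd.DataFrame([['2015/06/02', '1', '2', '0.5', '1', '100']], columns=structure)
df = pd.concat([df, new_row], ignore_index=True)  # ignore_index renumbers the rows


# Variant 2: buffer rows in a list (hypothetical class, for illustration only).
class BufferedSecurity:
    def __init__(self):
        self.structure = structure
        self.rows = []  # each entry is one row, in column order

    def add_data(self, timestamp, open, high, low, close, vol):
        self.rows.append([timestamp, open, high, low, close, vol])

    def whats_inside(self):
        # Build the DataFrame on demand from all buffered rows.
        return pd.DataFrame(self.rows, columns=self.structure)


sec = BufferedSecurity()
sec.add_data('2015/06/01', '1', '2', '0.5', '1', '100')
sec.add_data('2015/06/02', '1', '2', '0.5', '1', '100')
print(df)
print(sec.whats_inside())
```

Buffering rows in a list avoids copying the whole DataFrame on every insertion, which matters when many rows are added one at a time.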