Selecting specific rows from a Python DataFrame for an OLS regression in pandas
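I have a pandas DataFrame with 1,000 rows. For the first 10 rows, I want to regress column A on columns B and C. When I enter:
mod = pd.ols(y=df['A'], x=df[['B','C']], window=10)
I get regression results for rows 991-1000. How do I specify that I want the first (or second, etc.) 10 rows?
Thanks in advance.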
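I think you can use iloc:
mod = pd.ols(y=df['A'].iloc[2:12], x=df[['B','C']].iloc[2:12], window=10)
or ix (on a default integer index, ix slices by label and includes the end label, so 2:11 picks the same 10 rows as iloc[2:12]):
mod = pd.ols(y=df.ix[2:11, 'A'], x=df.ix[2:11, ['B', 'C']], window=10)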
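Note that both pd.ols and .ix have since been removed from pandas (pd.ols in 0.20, .ix in 1.0), so the calls above only run on old versions. Here is a minimal sketch of the same row selection on a current pandas, with numpy.linalg.lstsq standing in for the fit. The synthetic data, the seed, and the helper name fit_ols are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 1,000-row frame in the question (assumed data).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["A", "B", "C"])

def fit_ols(frame):
    """Hypothetical helper: regress A on B and C with an intercept."""
    X = np.column_stack([np.ones(len(frame)), frame[["B", "C"]].to_numpy()])
    y = frame["A"].to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["intercept", "B", "C"])

by_position = df.iloc[2:12]   # positions 2..11, end excluded: 10 rows
by_label = df.loc[2:11]       # labels 2..11, end included: the same 10 rows

beta = fit_ols(by_position)   # coefficients for just those rows
```

The point to watch is the slice end: iloc excludes it, while label-based slicing includes it, so df.loc[2:12] would pull in 11 rows.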
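If you need all the groups, use range with a step of 10 so that each pass covers the next block of rows:
for i in range(0, len(df), 10):
    # rows i .. i+9, i.e. 0-9, 10-19, 20-29, ...
    mod = pd.ols(y=df['A'].iloc[i:i + 10], x=df[['B','C']].iloc[i:i + 10], window=10)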
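To fit every consecutive block of 10 rows (rows 0-9, 10-19, and so on) and keep all the results, the start position has to advance by the block size and each fit has to be stored rather than overwritten. A rough sketch of that, again using numpy.linalg.lstsq as a stand-in for the removed pd.ols; the data and the names block, rows, and betas are assumed for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's DataFrame (assumed data).
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["A", "B", "C"])

block = 10
rows = []
for start in range(0, len(df), block):
    chunk = df.iloc[start:start + block]          # rows start .. start+9
    X = np.column_stack([np.ones(len(chunk)), chunk[["B", "C"]].to_numpy()])
    coef, *_ = np.linalg.lstsq(X, chunk["A"].to_numpy(), rcond=None)
    rows.append(coef)                             # keep every block's fit

# One row of coefficients per block, indexed by the block's first row.
betas = pd.DataFrame(rows, columns=["intercept", "B", "C"],
                     index=range(0, len(df), block))
betas.index.name = "first_row"
```

With this layout, betas.loc[10] would be the fit for the second block (rows 10-19).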
If you need help with ols, try help(pd.ols) in IPython, since this function is missing from the pandas documentation:
In [79]: help(pd.ols)
Help on function ols in module pandas.stats.interface:

ols(**kwargs)
    Returns the appropriate OLS object depending on whether you need
    simple or panel OLS, and a full-sample or rolling/expanding OLS.

    Will be a normal linear regression or a (pooled) panel regression depending
    on the type of the inputs:

    y : Series, x : DataFrame -> OLS
    y : Series, x : dict of DataFrame -> OLS
    y : DataFrame, x : DataFrame -> PanelOLS
    y : DataFrame, x : dict of DataFrame/Panel -> PanelOLS
    y : Series with MultiIndex, x : Panel/DataFrame + MultiIndex -> PanelOLS

    Parameters
    ----------
    y: Series or DataFrame
        See above for types
    x: Series, DataFrame, dict of Series, dict of DataFrame, Panel
    weights : Series or ndarray
        The weights are presumed to be (proportional to) the inverse of the
        variance of the observations. That is, if the variables are to be
        transformed by 1/sqrt(W) you must supply weights = 1/W
    intercept: bool
        True if you want an intercept. Defaults to True.
    nw_lags: None or int
        Number of Newey-West lags. Defaults to None.
    nw_overlap: bool
        Whether there are overlaps in the NW lags. Defaults to False.
    window_type: {'full sample', 'rolling', 'expanding'}
        'full sample' by default
    window: int
        size of window (for rolling/expanding OLS). If window passed and no
        explicit window_type, 'rolling" will be used as the window_type

    Panel OLS options:
        pool: bool
            Whether to run pooled panel regression. Defaults to true.
        entity_effects: bool
            Whether to account for entity fixed effects. Defaults to false.
        time_effects: bool
            Whether to account for time fixed effects. Defaults to false.
        x_effects: list
            List of x's to account for fixed effects. Defaults to none.
        dropped_dummies: dict
            Key is the name of the variable for the fixed effect.
            Value is the value of that variable for which we drop the dummy.

            For entity fixed effects, key equals 'entity'.

            By default, the first dummy is dropped if no dummy is specified.
        cluster: {'time', 'entity'}
            cluster variances

    Examples
    --------
    # Run simple OLS.
    result = ols(y=y, x=x)

    # Run rolling simple OLS with window of size 10.
    result = ols(y=y, x=x, window_type='rolling', window=10)
    print(result.beta)

    result = ols(y=y, x=x, nw_lags=1)

    # Set up LHS and RHS for data across all items
    y = A
    x = {'B' : B, 'C' : C}

    # Run panel OLS.
    result = ols(y=y, x=x)

    # Run expanding panel OLS with window 10 and entity clustering.
    result = ols(y=y, x=x, cluster='entity', window_type='expanding', window=10)

    Returns
    -------
    The appropriate OLS object, which allows you to obtain betas and various
    statistics, such as std err, t-stat, etc.
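The docstring says that passing window without a window_type selects a rolling regression, which would explain the question's output: there is one fit per 10-row window, and rows 991-1000 are simply the last window. Below is a sketch of that idea worked by hand on a small frame. It is an illustration under assumed synthetic data, not pandas's own implementation, and the names fits and rolling_beta are made up here.

```python
import numpy as np
import pandas as pd

# Small synthetic frame so the windows are easy to count (assumed data).
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["A", "B", "C"])

window = 10
X_all = np.column_stack([np.ones(len(df)), df[["B", "C"]].to_numpy()])
y_all = df["A"].to_numpy()

fits = {}
for end in range(window, len(df) + 1):
    rows = slice(end - window, end)               # the 10 rows ending at end-1
    coef, *_ = np.linalg.lstsq(X_all[rows], y_all[rows], rcond=None)
    fits[end - 1] = coef                          # keyed by the window's last row

rolling_beta = pd.DataFrame.from_dict(fits, orient="index",
                                      columns=["intercept", "B", "C"])
first_window = rolling_beta.iloc[0]    # fit on rows 0..9
last_window = rolling_beta.iloc[-1]    # fit on rows 40..49
```

So with a rolling fit, the "first 10 rows" result is the first row of the per-window coefficients rather than a separate regression.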