Set index label in pandas to_stata
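I am trying to save a pandas DataFrame to a Stata-format file. More specifically, the DataFrame's index needs to be saved, and the header of the column that holds the index must carry a specific text. In other words, I need to set an index label.
Pandas' to_csv has an index_label option, but pandas' to_stata function has no index_label option.
How can I set the index label when saving to Stata format?
There is a subtle difference between a DataFrame that already has an index name (Case 1) and one whose index name has not been set beforehand (Case 2).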
Case 1: the index name is already in place (using .set_index)
import pandas as pd

# data
data = [
    ['Eren Jaeger', 15, 'Soldier'],
    ['Mikasa Ackerman', 14, 'Soldier'],
    ['Armin Arlert', 14, 'Soldier'],
    ['Levi Ackerman', 30, 'Captain'],
]
# create the DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age', 'Rank'])
# use an existing column as the index; the index name becomes 'Rank'
df = df.set_index('Rank', drop=True)
# write the .dta file (no need for .rename_axis(index='my_index'))
df.to_stata('stata_df_1.dta')
df
##                     Name  Age
## Rank
## Soldier      Eren Jaeger   15
## Soldier  Mikasa Ackerman   14
## Soldier     Armin Arlert   14
## Captain    Levi Ackerman   30
Case 2: the index is unnamed (requires .rename_axis(index='my_index'))
Following @QuangHoang's comment, here is a way to set the index name when none was set beforehand.
data = [
    ['Eren Jaeger', 15],
    ['Mikasa Ackerman', 14],
    ['Armin Arlert', 14],
    ['Levi Ackerman', 30],
]
df = pd.DataFrame(data, columns=['Name', 'Age'])
df
##               Name  Age
## 0      Eren Jaeger   15
## 1  Mikasa Ackerman   14
## 2     Armin Arlert   14
## 3    Levi Ackerman   30
# the first variable holds the values 0 to 3 and is called "index" (default)
df.to_stata('stata_df_no_name.dta')
# the first variable holds the values 0 to 3 and is called "my_index"
df.rename_axis(index='my_index').to_stata('stata_df_2.dta')
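To check that the label actually ends up in the file, one option is to read the .dta file back and look at the column names. The sketch below is only an illustration: the helper write_with_index_label is a hypothetical name (not part of pandas), the data is trimmed to a single numeric column, and the file goes to a temporary directory rather than the paths used above.

```python
import os
import tempfile

import pandas as pd


def write_with_index_label(df, path, index_label):
    """Hypothetical helper: name the index, then write a Stata file."""
    # rename_axis returns a copy, so the caller's DataFrame is not modified
    df.rename_axis(index=index_label).to_stata(path)


# trimmed-down data (assumption: one numeric column is enough for the check)
df = pd.DataFrame({"Age": [15, 14, 14, 30]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stata_df_2.dta")
    write_with_index_label(df, path, "my_index")
    # read_stata returns the former index as an ordinary column
    columns = list(pd.read_stata(path).columns)

print(columns)
```

Assigning df.index.name = 'my_index' before calling to_stata should have the same effect, but it changes the DataFrame in place, whereas rename_axis leaves the original untouched.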