How to follow seaborn's examples
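Every official seaborn demo/example starts with sns.load_dataset(). Where can I get those seaborn datasets so that I can follow along with the examples?
I tried to find them myself by searching for phrases such as "where to find official seaborn dataset", but found nothing.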
Update:
So how do I use them? I am following http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.boxplot.html, and this is the only thing that I get; that is, I do not get any chart.
My seaborn and pandas both work fine. They come from my Anaconda installation and are both the latest versions. The matplotlib version I am using works fine with seaborn as well.
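@gabra, I had already found those csv files online before asking this question. I assumed that, being plain csv files, they cannot be used directly in sns.load_dataset(xxx). Is that right?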
The datasets live in a separate repository called seaborn-data.
In that repo, each dataset is stored as a .csv file.
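If you already have one of those .csv files on disk, a minimal sketch of reading it without sns.load_dataset might look like the following. It assumes pandas is installed. The helper names dataset_url and load_local_csv are made up for illustration, and the two-row sample is a stand-in, not real data from the repository. The URL pattern is the one that appears in seaborn's load_dataset source.

```python
import io

import pandas as pd

# URL pattern taken from the load_dataset source; it may differ in other seaborn versions.
RAW_URL = "https://github.com/mwaskom/seaborn-data/raw/master/{0}.csv"


def dataset_url(name):
    """Build the download URL for a dataset name (hypothetical helper)."""
    return RAW_URL.format(name)


def load_local_csv(path_or_buffer):
    """Read a dataset CSV from a file path or file-like object (hypothetical helper)."""
    return pd.read_csv(path_or_buffer)


# Stand-in for a downloaded file; replace with a real path such as "tips.csv".
sample = io.StringIO("total_bill,tip,day\n10.0,1.5,Sun\n20.0,3.0,Sat\n")
tips_like = load_local_csv(sample)

print(dataset_url("tips"))
print(tips_like.shape)
```

The resulting DataFrame can be passed to seaborn plotting functions in the same way as the one returned by sns.load_dataset.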
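Update
Try this:
import matplotlib.pyplot as plt
import seaborn as sns
# In a Jupyter notebook, show plots embedded in the page
%matplotlib inline
tips = sns.load_dataset("tips")
fig, ax = plt.subplots()
# Draw the box plot on the axes created above
sns.boxplot(x=tips["total_bill"], ax=ax)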
The seaborn package should either include the example datasets referenced in the example tutorials or provide a way to retrieve them.
# Load the example planets dataset
planets = sns.load_dataset("planets")
When I look in the example folder, I don't see the datasets. A little exploration of the function "load_dataset" reveals that the example datasets come from the online seaborn-data repository: it looks up the "planets" dataset online and requires pandas as a dependency.
def load_dataset(name, cache=True, data_home=None, **kws):
    """Load a dataset from the online repository (requires internet).

    Parameters
    ----------
    name : str
        Name of the dataset (`name`.csv on
        https://github.com/mwaskom/seaborn-data). You can obtain list of
        available datasets using :func:`get_dataset_names`
    cache : boolean, optional
        If True, then cache data locally and use the cache on subsequent calls
    data_home : string, optional
        The directory in which to cache data. By default, uses ~/seaborn_data/
    kws : dict, optional
        Passed to pandas.read_csv

    """
    path = "https://github.com/mwaskom/seaborn-data/raw/master/{0}.csv"
    full_path = path.format(name)

    if cache:
        cache_path = os.path.join(get_data_home(data_home),
                                  os.path.basename(full_path))
        if not os.path.exists(cache_path):
            urlretrieve(full_path, cache_path)
        full_path = cache_path

    df = pd.read_csv(full_path, **kws)
    if df.iloc[-1].isnull().all():
        df = df.iloc[:-1]

    if not pandas_has_categoricals:
        return df

    # Set some columns as a categorical type with ordered levels
    if name == "tips":
        df["day"] = pd.Categorical(df["day"], ["Thur", "Fri", "Sat", "Sun"])
        df["sex"] = pd.Categorical(df["sex"], ["Male", "Female"])
        df["time"] = pd.Categorical(df["time"], ["Lunch", "Dinner"])
        df["smoker"] = pd.Categorical(df["smoker"], ["Yes", "No"])

    if name == "flights":
        df["month"] = pd.Categorical(df["month"], df.month.unique())

    if name == "exercise":
        df["time"] = pd.Categorical(df["time"], ["1 min", "15 min", "30 min"])
        df["kind"] = pd.Categorical(df["kind"], ["rest", "walking", "running"])
        df["diet"] = pd.Categorical(df["diet"], ["no fat", "low fat"])

    if name == "titanic":
        df["class"] = pd.Categorical(df["class"], ["First", "Second", "Third"])
        df["deck"] = pd.Categorical(df["deck"], list("ABCDEFG"))

    return df
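The caching step in that function (download once, then reuse the local copy) can be sketched on its own. This is only an illustration of the pattern, not seaborn's code: cached_path and fake_fetch are hypothetical names, and fake_fetch writes a tiny placeholder file instead of downloading anything, so the sketch runs without internet access.

```python
import os
import tempfile


def cached_path(name, data_home, fetch):
    """Return a local path for `name`.csv, calling `fetch` only on a cache miss."""
    os.makedirs(data_home, exist_ok=True)
    cache_path = os.path.join(data_home, name + ".csv")
    if not os.path.exists(cache_path):
        # Cache miss: let the caller-supplied function create the file.
        fetch(name, cache_path)
    return cache_path


calls = []  # records each time the "download" actually happens


def fake_fetch(name, dest):
    """Stand-in for a real download such as urllib.request.urlretrieve."""
    calls.append(name)
    with open(dest, "w") as f:
        f.write("a,b\n1,2\n")


with tempfile.TemporaryDirectory() as tmp:
    first = cached_path("planets", tmp, fake_fetch)
    second = cached_path("planets", tmp, fake_fetch)

print(len(calls))
```

The second call finds the file already in the cache directory and skips the fetch, which is why a dataset only needs to be downloaded the first time it is loaded with cache=True.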