Python Pandas groupby,行值到列 headers
Python Pandas groupby, row value to column headers
我有一个要转置的 DataFrame:
import pandas as pd
sid= '13HKQ0Ue1_YCP-pKUxFuqdiqgmW_AZeR7P3VsUwrCnZo' # spreadsheet id
gid = 0 # sheet unique id (0 equals sheet0)
url = 'https://docs.google.com/spreadsheets/d/{}/export?gid={}&format=csv'.format(sid,gid)
df = pd.read_csv(url)
我想要做的是获取 StoreName 和 CATegory 作为列 header,并为每个类别设置权重与价格。
期望的输出:
我试过循环,Pandas但无法弄清楚,
我认为它可以由 df.GroupBy 完成,但返回的 object 不是 DataFrame。
我从 API:
的 JSON 输出中得到所有这些
import pandas as pd
import json, requests
from cytoolz.dicttoolz import merge
page = requests.get(mainurl)
dict_dta = json.loads(page.text) # load in Python DICT
list_columns = ['id', 'name', 'category_name', 'ounce', 'gram', 'two_grams', 'quarter', 'eighth','half_ounce','unit','half_gram'] # get the unformatted output
df = pd.io.json.json_normalize(dict_dta, ['categories', ['items']]).pipe(lambda x: x.drop('prices', 1).join(x.prices.apply(lambda y: pd.Series(merge(y)))))[list_columns]
df.to_csv('name')
我试过很多方法。
如果有人能给我指出正确的方向,那将非常有帮助。
这个方向对吗?
import pandas as pd
sid= '13HKQ0Ue1_YCP-pKUxFuqdiqgmW_AZeR7P3VsUwrCnZo' # spreadsheet id
gid = 0 # sheet unique id (0 equals sheet0)
url = 'https://docs.google.com/spreadsheets/d/{}/export?gid={}&format=csv'.format(sid,gid)
df = pd.read_csv(url)
for idx, dfx in df.groupby(df.CAT):
if idx != 'Flower':
continue
df_test = dfx.drop(['CAT','NAME'], axis=1)
df_test = df_test.rename(columns={'StoreNAME':idx}).set_index(idx).T
df_test
Returns:
Flower Pueblo West Organics - Adult Use Pueblo West Organics - Adult Use \
UNIT NaN NaN
HALFOUNCE 15.0 50.0
EIGHTH NaN 25.0
TWOGRAMS NaN NaN
QUARTER NaN 40.0
OUNCE 30.0 69.0
GRAM NaN 9.0
Flower Pueblo West Organics - Adult Use Three Rivers Dispensary - REC \
UNIT NaN NaN
HALFOUNCE 50.0 75.0
EIGHTH 25.0 20.0
TWOGRAMS NaN NaN
QUARTER 40.0 45.0
OUNCE 69.0 125.0
GRAM 9.0 8.0
Flower Three Rivers Dispensary - REC
UNIT NaN
HALFOUNCE 75.0
EIGHTH 20.0
TWOGRAMS NaN
QUARTER 40.0
OUNCE 125.0
GRAM 8.0
我有一个要转置的 DataFrame:
import pandas as pd
sid= '13HKQ0Ue1_YCP-pKUxFuqdiqgmW_AZeR7P3VsUwrCnZo' # spreadsheet id
gid = 0 # sheet unique id (0 equals sheet0)
url = 'https://docs.google.com/spreadsheets/d/{}/export?gid={}&format=csv'.format(sid,gid)
df = pd.read_csv(url)
我想要做的是获取 StoreName 和 CATegory 作为列 header,并为每个类别设置权重与价格。
期望的输出:
我试过循环,Pandas但无法弄清楚,
我认为它可以由 df.GroupBy 完成,但返回的 object 不是 DataFrame。
我从 API:
的 JSON 输出中得到所有这些import pandas as pd
import json, requests
from cytoolz.dicttoolz import merge
page = requests.get(mainurl)
dict_dta = json.loads(page.text) # load in Python DICT
list_columns = ['id', 'name', 'category_name', 'ounce', 'gram', 'two_grams', 'quarter', 'eighth','half_ounce','unit','half_gram'] # get the unformatted output
df = pd.io.json.json_normalize(dict_dta, ['categories', ['items']]).pipe(lambda x: x.drop('prices', 1).join(x.prices.apply(lambda y: pd.Series(merge(y)))))[list_columns]
df.to_csv('name')
我试过很多方法。 如果有人能给我指出正确的方向,那将非常有帮助。
这个方向对吗?
import pandas as pd
sid= '13HKQ0Ue1_YCP-pKUxFuqdiqgmW_AZeR7P3VsUwrCnZo' # spreadsheet id
gid = 0 # sheet unique id (0 equals sheet0)
url = 'https://docs.google.com/spreadsheets/d/{}/export?gid={}&format=csv'.format(sid,gid)
df = pd.read_csv(url)
for idx, dfx in df.groupby(df.CAT):
if idx != 'Flower':
continue
df_test = dfx.drop(['CAT','NAME'], axis=1)
df_test = df_test.rename(columns={'StoreNAME':idx}).set_index(idx).T
df_test
Returns:
Flower Pueblo West Organics - Adult Use Pueblo West Organics - Adult Use \
UNIT NaN NaN
HALFOUNCE 15.0 50.0
EIGHTH NaN 25.0
TWOGRAMS NaN NaN
QUARTER NaN 40.0
OUNCE 30.0 69.0
GRAM NaN 9.0
Flower Pueblo West Organics - Adult Use Three Rivers Dispensary - REC \
UNIT NaN NaN
HALFOUNCE 50.0 75.0
EIGHTH 25.0 20.0
TWOGRAMS NaN NaN
QUARTER 40.0 45.0
OUNCE 69.0 125.0
GRAM 9.0 8.0
Flower Three Rivers Dispensary - REC
UNIT NaN
HALFOUNCE 75.0
EIGHTH 20.0
TWOGRAMS NaN
QUARTER 40.0
OUNCE 125.0
GRAM 8.0