How do you split up the source/medium path from Google Analytics in Python?
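I have a year's worth of data from the Google Analytics Multi-Attribute Funnel API; a sample is below. The source/medium paths vary in length, and I'm looking for a way to create a new column for each channel, using ">" as the delimiter.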
20160101 google / organic
20160101 bing / organic
20160101 google / organic > google / organic
20160101 google / organic > google / organic
20160101 (direct) / (none) > (direct) / (none)
20160101 (direct) / (none) > online.fliphtml5.com / referral
20160101 google / organic > google / organic > (direct) / (none)
20160101 google / organic > (direct) / (none) > google / organic
20160101 google / organic > online.fliphtml5.com / referral > (direct) / (none)
20160101 (direct) / (none) > (direct) / (none) > (direct) / (none)
20160101 pinterest.com / referral > (direct) / (none) > (direct) / (none)
20160101 google / organic > (direct) / (none) > (direct) / (none) > google / organic
20160101 bing / organic > (direct) / (none) > (direct) / (none) > (direct) / (none)
20160101 google / organic > (direct) / (none) > (direct) / (none) > (direct) / (none)
Below is an example of the format I want the data in. How can this be done in Python?
Source_Med_Path_1 Source_Med_Path_2....Source_Med_Path_72
google / cpc direct google / organic
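Even without pandas, plain Python can do this by padding every path to the length of the longest one. The sketch below assumes each line holds a date, a single space, and then the path, with channels separated by ">". The names `split_paths` and `sample` are illustrative only, and `sample` is just a few lines copied from the data above.

```python
# Hypothetical sample: a few lines copied from the question's data.
sample = """20160101 google / organic
20160101 (direct) / (none) > online.fliphtml5.com / referral
20160101 google / organic > (direct) / (none) > (direct) / (none) > google / organic"""


def split_paths(text):
    """Split 'date path' lines into a header row and padded data rows."""
    records = []
    for line in text.splitlines():
        # The date is everything before the first space; the rest is the path.
        date, path = line.split(" ", 1)
        records.append((date, [step.strip() for step in path.split(">")]))

    # The widest path decides how many Source_Med_Path_N columns exist.
    width = max(len(steps) for _, steps in records)
    header = ["date"] + [f"Source_Med_Path_{i}" for i in range(1, width + 1)]
    # Pad shorter paths with empty strings so every row has the same length.
    rows = [[date] + steps + [""] * (width - len(steps)) for date, steps in records]
    return header, rows


header, rows = split_paths(sample)
for row in [header] + rows:
    print(row)
```

On the full data set, the widest path would determine the final column count (72 in the desired output above).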
You can do this with Pandas and its apply() function.
http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.Series.apply.html
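The reason `apply()` produces several columns is that when the function passed to `Series.apply` returns a `pd.Series`, pandas combines the results into a `DataFrame`. The columns are the positions 0, 1, 2, ... of each returned Series, and shorter results are filled with NaN. The following is a small illustration using made-up paths:

```python
import pandas as pd

# Made-up paths of different lengths.
paths = pd.Series(["google / organic",
                   "google / organic > (direct) / (none)"])

# Each call returns a Series, so apply() returns a DataFrame whose
# columns are the positions 0, 1, ... of the split result.
wide = paths.apply(lambda x: pd.Series([part.strip() for part in x.split(">")]))
print(wide)
```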
My code reads the source/medium paths from a CSV, but it can easily be adapted to API results.
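Adapting to API results mostly changes how the DataFrame is built. This minimal sketch assumes the results have already been flattened into `(transaction, source)` pairs. The real API response shape is not covered here, so `api_rows` is purely hypothetical. The column names match the CSV columns used below.

```python
import pandas as pd

# Hypothetical rows already pulled from an API and flattened to (transaction, source).
api_rows = [
    ("20160101", "google / organic"),
    ("20160101", "bing / organic > (direct) / (none)"),
]

# Build the same two columns the CSV version reads.
data = pd.DataFrame(api_rows, columns=["transaction", "source"])
# Split each path on '>' and strip the surrounding spaces.
splitdata = data["source"].apply(
    lambda x: pd.Series([part.strip() for part in x.split(">")]))
data = pd.concat([data["transaction"], splitdata], axis=1)
# Rename the positional columns 0, 1, ... to Source_Med_Path_1, 2, ...
data.columns = ["transaction"] + [
    "Source_Med_Path_" + str(i + 1) for i in range(splitdata.shape[1])]
print(data)
```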
import pandas as pd

def main():
    # read original data from csv
    data = pd.read_csv('source.csv')
    # split the path on the '>' identifier and strip the surrounding spaces
    splitdata = data['source'].apply(
        lambda x: pd.Series([part.strip() for part in x.split('>')]))
    # join the split data onto the transaction data; concat aligns rows on the
    # shared index (the join_axes argument was removed in pandas 1.0)
    data = pd.concat([data['transaction'], splitdata], axis=1)
    # rename columns to transaction, Source_Med_Path_1 ... Source_Med_Path_N
    cols = ['transaction']
    for i in range(len(data.columns) - 1):
        cols.append('Source_Med_Path_' + str(i + 1))
    data.columns = cols
    # output data
    print(data)
    data.to_csv('output.csv')

if __name__ == '__main__':
    main()
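A vectorized alternative to `apply()` is `Series.str.split` with `expand=True`. It returns one column per piece and pads shorter paths with missing values. This sketch swaps in that technique and assumes channels are separated by exactly " > " (space, greater-than, space). The `transaction` and `source` column names follow the CSV above, and the rows are a few lines from the question's sample.

```python
import pandas as pd

data = pd.DataFrame({
    "transaction": ["20160101", "20160101", "20160101"],
    "source": [
        "google / organic",
        "(direct) / (none) > online.fliphtml5.com / referral",
        "pinterest.com / referral > (direct) / (none) > (direct) / (none)",
    ],
})

# One column per step of the path; shorter paths are padded with missing values.
splitdata = data["source"].str.split(" > ", expand=True)
# Columns come back as 0, 1, 2, ...; shift them to 1-based names.
splitdata.columns = [f"Source_Med_Path_{i + 1}" for i in splitdata.columns]

result = pd.concat([data["transaction"], splitdata], axis=1)
print(result)
```

This avoids calling a Python lambda per row, which may matter for a full year of data. The trade-off is that it relies on the separator spacing being consistent.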