Path optimisation in Apache Pig: how do I analyse the path each user takes through my website over a particular period of time?
Input dataset:
(2012-07-21T14:00:00.000Z, joe, hxxp:///www.aaa.com/home)
(2012-07-21T14:01:00.000Z, mary, hxxp:///www.aaa.com/watch)
(2012-07-21T14:02:00.000Z, joe, hxxp:///www.aaa.com/movie)
(2012-07-21T14:01:00.000Z, mary, hxxp:///www.aaa.com/mobile)
Expected output:
(joe, (hxxp:///www.aaa.com/home, hxxp:///www.aaa.com/movie))
(mary, (hxxp:///www.aaa.com/watch, hxxp:///www.aaa.com/mobile))
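I want to do this kind of path analysis in Apache Pig.
I want to see how users move through my website so that I can optimise those paths.
For example, a user first views hxxp:///www.aaa.com/home and 2 seconds later moves on to hxxp:///www.aaa.com/movie. I want to analyse each user's journey through the site over a particular period of time.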
Input:
2012-07-21T14:00:00.000Z,joe,hxxp:///www.aaa.com/home
2012-07-21T14:01:00.000Z,mary,hxxp:///www.aaa.com/watch
2012-07-21T14:02:00.000Z,joe,hxxp:///www.aaa.com/movie
2012-07-21T14:01:00.000Z,mary,hxxp:///www.aaa.com/mobile
Pig script:
-- Load the comma-separated log as (time, user, url).
user_navigation_data = LOAD 'user_nav_data.csv' USING PigStorage(',') AS (time:datetime, user:chararray, url:chararray);
-- Collect each user's page views into one bag.
nav_data_grp_user = GROUP user_navigation_data BY user;
-- Order each user's views by time and join the URLs into a single path string.
user_nav_stats = FOREACH nav_data_grp_user {
    user_navigation_data_ord = ORDER user_navigation_data BY time;
    GENERATE group AS user, BagToString(user_navigation_data_ord.url, '-->') AS urls_accessed;
};
Output of DUMP user_nav_stats:
(joe,hxxp:///www.aaa.com/home-->hxxp:///www.aaa.com/movie)
(mary,hxxp:///www.aaa.com/watch-->hxxp:///www.aaa.com/mobile)
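If you would rather keep the URLs as a collection per user, closer to the expected output in the question, one possible variation is to project the ordered bag instead of flattening it with BagToString. This is only a sketch: it assumes the same user_nav_data.csv file and schema as the script above, and the alias names user_nav_paths, ordered and urls are made up for illustration.

```pig
-- Same load and grouping as in the script above.
user_navigation_data = LOAD 'user_nav_data.csv' USING PigStorage(',') AS (time:datetime, user:chararray, url:chararray);
nav_data_grp_user = GROUP user_navigation_data BY user;

-- Hypothetical alias names: keep each user's URLs as an ordered bag
-- rather than joining them into one string.
user_nav_paths = FOREACH nav_data_grp_user {
    ordered = ORDER user_navigation_data BY time;
    GENERATE group AS user, ordered.url AS urls;
};

DUMP user_nav_paths;
```

Each user should then come out paired with a bag of one-field tuples rather than a single joined string. Note that mary's two rows in the sample share the same timestamp, so ordering by time alone does not determine which of her two URLs comes first.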