How do you mark unique occurrences in a pattern, given that values are unique when they occur together and not when they occur separately?
Suppose my data looks like this:
student article.bought
1 A pen
2 B pencil
3 V book
4 A pen
5 A inkbottle
6 B pen
7 B pencil
8 B pencil
9 V book
10 Z marker
11 A inkbottle
12 V book
13 V pen
14 V book
I need the unique occurrences of each article, per student, in a separate column, like this:
student article.bought Occurences
1 A pen 1
2 B pencil 1
3 V book 1
4 A pen 1 # as A is taking a pen again
5 A inkbottle 2 # 'A' changed from pen to ink bottle
6 B pen 2
7 B pencil 3 # though B took pencil before, this is different as he took a pen in between
8 B pencil 3
9 V book 1
10 Z marker 1
11 A inkbottle 2
12 V book 1
13 V pen 2
14 V book 3
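- Create an additional column [Original Sort Order] and number the rows from 1 to ...
- Sort the table by student, then by original sort order
- In D2, enter
=IF(A2=A1,IF(B2=B1,D1,D1+1),1)
and copy it down
- Convert column D to values (copy, paste as ... values)
- Restore the original sort order
If this is more than a one-off, create a VBA script that uses the same strategy.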
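For readers who want to script the strategy above, here is a minimal sketch in Python rather than VBA. The function name mark_by_sorting and the tuple layout are assumptions made for illustration; only the four steps (number the rows, sort, compare with the previous row, restore the order) come from the answer.

```python
# Rows as (student, article) in their original order (data from the question).
rows = [("A", "pen"), ("B", "pencil"), ("V", "book"), ("A", "pen"),
        ("A", "inkbottle"), ("B", "pen"), ("B", "pencil"), ("B", "pencil"),
        ("V", "book"), ("Z", "marker"), ("A", "inkbottle"), ("V", "book"),
        ("V", "pen"), ("V", "book")]


def mark_by_sorting(rows):
    """Hypothetical helper mirroring the spreadsheet steps."""
    # Step 1: remember the original sort order of every row.
    indexed = [(student, order, article)
               for order, (student, article) in enumerate(rows)]
    # Step 2: sort by student, then by original sort order.
    indexed.sort(key=lambda r: (r[0], r[1]))
    # Step 3: same logic as =IF(A2=A1,IF(B2=B1,D1,D1+1),1).
    occurrences = [0] * len(rows)
    prev_student = prev_article = None
    count = 0
    for student, order, article in indexed:
        if student != prev_student:
            count = 1            # new student: restart at 1
        elif article != prev_article:
            count += 1           # same student, article changed
        # Step 4: writing by original index restores the original order.
        occurrences[order] = count
        prev_student, prev_article = student, article
    return occurrences


print(mark_by_sorting(rows))
# [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 2, 1, 2, 3]
```

Writing each result back by its original index plays the role of the final re-sort in the spreadsheet.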
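In R, we can detect changes in a student's choice by taking the difference between each pair of successive values with diff. Taking the cumulative sum (cumsum) of that logical index gives a running count of occurrences.
In the second line, we coerce the factor variable article.bought to numeric and use ave to run the function f from the first line separately for each student.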
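# Run number within one group: 1 + the number of value changes seen so far
f <- function(x) cumsum(c(FALSE, diff(x) != 0)) + 1
# factor() first, so this also works when article.bought is a character column
df$Occurences <- with(df, ave(as.numeric(factor(article.bought)), student, FUN = f))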
df
# student article.bought Occurences
# 1 A pen 1
# 2 B pencil 1
# 3 V book 1
# 4 A pen 1
# 5 A inkbottle 2
# 6 B pen 2
# 7 B pencil 3
# 8 B pencil 3
# 9 V book 1
# 10 Z marker 1
# 11 A inkbottle 2
# 12 V book 1
# 13 V pen 2
# 14 V book 3
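To make the diff/cumsum idea concrete outside R, the following is a rough Python sketch using only the standard library. The names run_ids and occurrences are made up for this illustration, and the grouping loop only approximates what ave does. Strings can be compared directly here, so the numeric coercion is not needed.

```python
from itertools import accumulate


def run_ids(values):
    """Hypothetical analogue of f: 1 + cumulative count of changes."""
    # Like c(FALSE, diff(x) != 0): True wherever the value differs
    # from the one before it; the first element is never a change.
    changed = [i > 0 and values[i] != values[i - 1]
               for i in range(len(values))]
    # Like cumsum(...) + 1: booleans add up as 0/1.
    return [c + 1 for c in accumulate(changed)]


def occurrences(students, articles):
    """Apply run_ids per student, roughly as ave() does per group."""
    positions = {}
    for i, student in enumerate(students):
        positions.setdefault(student, []).append(i)
    out = [0] * len(students)
    for idx in positions.values():
        ids = run_ids([articles[j] for j in idx])
        for i, run in zip(idx, ids):
            out[i] = run          # put results back at original positions
    return out


students = list("ABVAABBBVZAVVV")
articles = ["pen", "pencil", "book", "pen", "inkbottle", "pen", "pencil",
            "pencil", "book", "marker", "inkbottle", "book", "pen", "book"]
print(occurrences(students, articles))
# [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 2, 1, 2, 3]
```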
A shot at it using SAS:
data try00;
length student article $ 20; /* assumed width; wide enough for "inkbottle" */
infile datalines dlm=' ';
input student $ article $;
datalines;
A pen
B pencil
V book
A pen
A inkbottle
B pen
B pencil
B pencil
V book
Z marker
A inkbottle
V book
V pen
V book
;
data try01;
set try00;
pos=_n_;
run;
proc sort data=try01 out=try02; by student pos article; run;
proc sort data=try02 out=stud(keep=student) nodupkey; by student; run;
data shell;
length occurrence 8.;
set try02;
if _n_>0 then delete;
run;
%macro loopstudent();
data _null_; set stud end=eof; if eof then call symput("nstu",_n_); run;
%do i=1 %to &nstu;
data _null_; set stud; if _n_=&i then call symput("stud&i",student); run;
data thisstu;
set try02;
where student="&&stud&i";
dummyart=lag(article);
retain occurrence 0;
if dummyart ne article then occurrence=occurrence+1;
else occurrence=occurrence;
drop dummyart;
run;
proc append base=shell data=thisstu; run;
%end;
proc sort data=shell out=final; by pos; run;
%mend loopstudent; %loopstudent();
The dataset "final" holds the result.
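The macro above handles one student at a time and sorts back by pos at the end. For comparison, here is a minimal Python sketch (not part of the SAS answer) that keeps per-student state in dictionaries, so a single pass in the original order is enough. The function name mark_single_pass is hypothetical, and the sketch assumes that no article value is None.

```python
def mark_single_pass(rows):
    """Hypothetical helper: one pass over (student, article) rows."""
    last_article = {}   # student -> most recent article seen
    counter = {}        # student -> current occurrence number
    result = []
    for student, article in rows:
        # An unseen student yields None here, which differs from any
        # real article (assumption: articles are never None).
        if last_article.get(student) != article:
            counter[student] = counter.get(student, 0) + 1
            last_article[student] = article
        result.append(counter[student])
    return result


rows = [("A", "pen"), ("B", "pencil"), ("V", "book"), ("A", "pen"),
        ("A", "inkbottle"), ("B", "pen"), ("B", "pencil"), ("B", "pencil"),
        ("V", "book"), ("Z", "marker"), ("A", "inkbottle"), ("V", "book"),
        ("V", "pen"), ("V", "book")]
print(mark_single_pass(rows))
# [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 2, 1, 2, 3]
```

Because the rows are never reordered, no position column or final sort is needed.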