How do you mark unique occurrences in a pattern, given that values are unique when they occur together but not when they occur separately?

Suppose my data looks like this:

   student article.bought
1        A            pen
2        B         pencil
3        V           book
4        A            pen
5        A      inkbottle
6        B            pen
7        B         pencil
8        B         pencil
9        V           book
10       Z         marker
11       A      inkbottle
12       V           book
13       V            pen
14       V           book

I need the unique occurrences of the articles in a separate column, like this:

   student article.bought Occurences
1        A            pen          1
2        B         pencil          1
3        V           book          1
4        A            pen          1   # as A is taking a pen again
5        A      inkbottle          2   # 'A' changed from pen to ink bottle
6        B            pen          2
7        B         pencil          3   # though B took pencil before, this is different as he took a pen in between
8        B         pencil          3
9        V           book          1
10       Z         marker          1
11       A      inkbottle          2
12       V           book          1
13       V            pen          2
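14       V           book          3

  1. Create an additional column [Original Sort Order] and number it from 1 to ...
  2. Sort the table by student, then by original sort order
  3. In D2, enter =IF(A2=A1,IF(B2=B1,D1,D1+1),1) and copy it down
  4. Convert column D to values (copy, paste as ... values)
  5. Restore the original sort order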

If this is more than a one-off task, create a VBA script using the same strategy.
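For readers who want to see the sort, compare, and restore strategy outside a spreadsheet, here is a minimal Python sketch. It is not the VBA script mentioned above; the function name `occurrences_by_sorting` and the `rows` list are made up for illustration, and the data is the sample from the question.

```python
# Sample data from the question, in original order: (student, article)
rows = [
    ("A", "pen"), ("B", "pencil"), ("V", "book"), ("A", "pen"),
    ("A", "inkbottle"), ("B", "pen"), ("B", "pencil"), ("B", "pencil"),
    ("V", "book"), ("Z", "marker"), ("A", "inkbottle"), ("V", "book"),
    ("V", "pen"), ("V", "book"),
]

def occurrences_by_sorting(rows):
    """Hypothetical helper mirroring the five spreadsheet steps."""
    # Steps 1-2: the row index plays the role of [Original Sort Order];
    # sort the indices by student, then by original position.
    order = sorted(range(len(rows)), key=lambda i: (rows[i][0], i))
    result = [0] * len(rows)
    prev = None  # index of the previous row in sorted order
    for i in order:
        # Step 3: same logic as =IF(A2=A1,IF(B2=B1,D1,D1+1),1)
        if prev is None or rows[i][0] != rows[prev][0]:
            result[i] = 1                 # new student: restart at 1
        elif rows[i][1] == rows[prev][1]:
            result[i] = result[prev]      # same article: carry the count
        else:
            result[i] = result[prev] + 1  # article changed: increment
        prev = i
    # Steps 4-5: writing to result[i] already puts values in original order.
    return result

print(occurrences_by_sorting(rows))
# [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 2, 1, 2, 3]
```

Sorting indices rather than the rows themselves means no separate "restore" pass is needed, which is the main difference from the manual spreadsheet procedure.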

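In R, we can detect a change in a student's choice by taking the difference between consecutive values with diff. Taking the cumulative sum (cumsum) of that logical index gives the running count of occurrences.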

In the second line, we coerce the factor variable article.bought to numeric and use ave to run the function f from the first line within each student group.

f <- function(x) cumsum(c(FALSE, diff(x) != 0)) + 1  # 1 + running count of changes
df$Occurences <- with(df, ave(as.numeric(factor(article.bought)), student, FUN = f))  # factor() also covers a character column
df
#    student article.bought Occurences
# 1        A            pen          1
# 2        B         pencil          1
# 3        V           book          1
# 4        A            pen          1
# 5        A      inkbottle          2
# 6        B            pen          2
# 7        B         pencil          3
# 8        B         pencil          3
# 9        V           book          1
# 10       Z         marker          1
# 11       A      inkbottle          2
# 12       V           book          1
# 13       V            pen          2
# 14       V           book          3
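To make the diff and cumsum idea concrete, the following is a rough Python sketch using only the standard library. The names `run_ids` and `occurrences` are hypothetical stand-ins for `f` and the `ave` call; this is an illustration of the same logic, not a translation of R's internals.

```python
from itertools import accumulate

def run_ids(values):
    """Hypothetical analogue of f: number the runs of equal values from 1."""
    # Flag each position whose value differs from its predecessor
    # (the first position is never flagged).
    changed = [i > 0 and values[i] != values[i - 1] for i in range(len(values))]
    # A running total of the flags, plus 1, gives the run number.
    return [1 + n for n in accumulate(changed)]

def occurrences(students, articles):
    """Hypothetical analogue of ave(): apply run_ids within each student."""
    groups = {}
    for i, s in enumerate(students):
        groups.setdefault(s, []).append(i)  # row positions per student
    out = [0] * len(students)
    for idx in groups.values():
        for i, r in zip(idx, run_ids([articles[j] for j in idx])):
            out[i] = r  # write back at the original row position
    return out

# Sample data from the question
students = list("ABVAABBBVZAVVV")
articles = ("pen pencil book pen inkbottle pen pencil pencil "
            "book marker inkbottle book pen book").split()

print(run_ids(["pencil", "pen", "pencil", "pencil"]))  # student B: [1, 2, 3, 3]
print(occurrences(students, articles))
# [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 2, 1, 2, 3]
```

Comparing the strings directly avoids the numeric coercion step that the R version needs for `diff`.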

An attempt using SAS:

data try00;
length student $ 8 article $ 16;
infile datalines dlm=' ';
input student $ article $;
datalines;
A pen
B pencil 
V book 
A pen 
A inkbottle 
B pen 
B pencil 
B pencil 
V book 
Z marker 
A inkbottle
V book 
V pen 
V book
;

data try01;
set try00;
pos=_n_;
run;

proc sort data=try01 out=try02; by student pos article; run;

proc sort data=try02 out=stud(keep=student) nodupkey; by student; run;

data shell;
length occurrence 8.;
set try02;
if _n_>0 then delete;
run;

%macro loopstudent();

data _null_; set stud end=eof; if eof then call symput("nstu",_n_); run;


%do i=1 %to &nstu;
data _null_; set stud; if _n_=&i then call symput("stud&i",student); run;

data thisstu;
set try02;
where student="&&stud&i";
dummyart=lag(article);
retain occurrence 0;
if dummyart ne article then occurrence=occurrence+1;
else occurrence=occurrence;
drop dummyart;
run;

proc append base=shell data=thisstu; run;

%end;

proc sort data=shell out=final; by pos; run;

%mend loopstudent; %loopstudent();

The dataset "final" contains the result.
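The lag-and-retain logic inside the macro loop can also be expressed as a single pass that keeps per-student state, with no sorting or appending. The Python sketch below is an alternative illustration under that assumption, not a rewrite of the SAS program; `occurrences_single_pass` is a made-up name.

```python
def occurrences_single_pass(rows):
    """Hypothetical helper: one pass, remembering each student's last article."""
    last = {}   # student -> most recent article (plays the role of lag)
    count = {}  # student -> current occurrence number (the retained counter)
    out = []
    for student, article in rows:
        # Increment on a student's first row or whenever the article changes.
        if student not in last or last[student] != article:
            count[student] = count.get(student, 0) + 1
        last[student] = article
        out.append(count[student])
    return out

# Sample data from the question, in original order
students = list("ABVAABBBVZAVVV")
articles = ("pen pencil book pen inkbottle pen pencil pencil "
            "book marker inkbottle book pen book").split()

print(occurrences_single_pass(list(zip(students, articles))))
# [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 2, 1, 2, 3]
```

Because rows are visited in their original order, the output lines up with the input without a final sort by position.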