Stata: remove an entire word from a string
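I have a string variable from which I want to remove certain words, but many other words may be partial matches, and I don't want those removed. I want to remove a word if and only if it matches exactly.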
clear
* Add in some example data
input index str50 words
1 "more mor morph test"
2 "ten tennis tenner tenth keeper"
3 "badder baddy bad other"
end
* Create a copy to compare before/after the strip
gen strip_words = words
* This is a list of words I want removed. In reality, this is a fairly long list
local removs "mor ten bad"
* For each of these words, remove the complete word from the string
foreach w of local removs {
replace strip_words = subinstr(strip_words, "`w'","", .)
}
list
     +---------------------------------------------------------------+
     | index                            words            strip_words |
     |---------------------------------------------------------------|
  1. |     1              more mor morph test              e ph test |
  2. |     2   ten tennis tenner tenth keeper      nis ner th keeper |
  3. |     3           badder baddy bad other           der dy other |
     +---------------------------------------------------------------+
I tried padding with spaces using replace strip_words = " " + strip_words + " ", but that also removed the spaces separating the other words. My desired output is:
     +-------------------------------------------------------------------------+
     | index                            words                      strip_words |
     |-------------------------------------------------------------------------|
  1. |     1              more mor morph test                  more morph test |
  2. |     2   ten tennis tenner tenth keeper       tennis tenner tenth keeper |
  3. |     3           badder baddy bad other               badder baddy other |
     +-------------------------------------------------------------------------+
See help string functions, in particular subinword().
clear
* Add in some example data
input index str50 words
1 "more mor morph test"
2 "ten tennis tenner tenth keeper"
3 "badder baddy bad other"
end
* Create a copy to compare before/after the strip
gen strip_words = words
* This is a list of words I want removed. In reality, this is a fairly long list
local removs "mor ten bad"
* For each of these words, remove the complete word from the string
foreach w of local removs {
replace strip_words = subinword(strip_words, "`w'","", .)
}
* Collapse the double spaces left behind; itrim() is named stritrim() in Stata 14 and later
replace strip_words = itrim(strip_words)
Using your example, but with subinword() instead of subinstr(), you get the desired output.
clear
* Add in some example data
input index str50 words
1 "more mor morph test"
2 "ten tennis tenner tenth keeper"
3 "badder baddy bad other"
end
* Create a copy to compare before/after the strip
gen strip_words = words
gen strip_words_2 = words
* This is a list of words I want removed. In reality, this is a fairly long list
local removs "mor ten bad"
* For each of these words, remove the complete word from the string
foreach w of local removs {
replace strip_words = subinstr(strip_words, "`w'","", .)
replace strip_words_2 = subinword(strip_words_2,"`w'","",.)
}
list
     +-------------------------------------------------------------------------------------------+
     | index                            words           strip_words                strip_words_2 |
     |-------------------------------------------------------------------------------------------|
  1. |     1              more mor morph test             e ph test              more morph test |
  2. |     2   ten tennis tenner tenth keeper     nis ner th keeper   tennis tenner tenth keeper |
  3. |     3           badder baddy bad other          der dy other           badder baddy other |
     +-------------------------------------------------------------------------------------------+
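This can also be handled with regular expressions. Introduction: link
Stata's Unicode-based regular expression functions support \b to indicate a word boundary.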
clear
input index str50 words
1 "more mor morph test"
2 "ten tennis tenner tenth keeper"
3 "badder baddy bad other"
end
local rmv "(mor|ten|bad)"
gen wanted = ustrregexra(words, "\b`rmv'\b", "")
list
     +----------------------------------------------------------------------+
     | index                            words                        wanted |
     |----------------------------------------------------------------------|
  1. |     1              more mor morph test               more morph test |
  2. |     2   ten tennis tenner tenth keeper    tennis tenner tenth keeper |
  3. |     3           badder baddy bad other            badder baddy other |
     +----------------------------------------------------------------------+
Judging from your example, you seem to want to keep the spaces as above. Otherwise, you can remove them with strtrim() and stritrim().
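For a long word list, typing the alternation by hand gets tedious. Below is a minimal sketch that builds the pattern from the space-separated list in the question and then cleans up the leftover blanks. It assumes Stata 14 or later (for the Unicode regex functions) and that none of the words contains regex metacharacters. Building rmv with the subinstr macro function is an illustrative addition, not something taken from the answers above.

```stata
clear
input index str50 words
1 "more mor morph test"
2 "ten tennis tenner tenth keeper"
3 "badder baddy bad other"
end

* Words to remove, separated by single spaces (same list as in the question)
local removs "mor ten bad"

* Turn "mor ten bad" into "mor|ten|bad" for use as a regex alternation
local rmv : subinstr local removs " " "|", all

* \b matches a word boundary, so only whole words are removed
gen wanted = ustrregexra(words, "\b(`rmv')\b", "")

* Collapse runs of internal blanks, then drop leading and trailing blanks
replace wanted = strtrim(stritrim(wanted))

list
```

The single regex pass avoids looping over every word, at the cost of having to escape any word that contains characters such as . or + before it goes into the pattern.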