How to generate a dummy variable in Stata based on a sub-string of an existing string variable?
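I am looking for a way to create a dummy variable that checks a variable called text against several given substrings, such as "book", "buy" and "journey".
I want to check whether an observation contains book, buy or journey. If any one of these keywords is found in the string, the dummy variable should be 1; otherwise it should be 0.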
An example:
TEXT
Book your tickets now
Swiss is making your journey easy
Buy your holiday tickets now!
A touch of Austria in your lungs.
The desired result would be:
dummy variable
1
1
1
0
I have tried strpos and regexm, but with very limited success.
Regards,
Josh
Using strpos can be tedious because you have to account for upper and lower case, so I would use a regular expression.
* Example generated by -dataex-. To install: ssc install dataex
clear
input str33 text
"Book your tickets now"
"Swiss is making your journey easy"
"Buy your holiday tickets now!"
"A touch of Austria in your lungs."
end
generate wanted = regexm(text, "[Bb]ook|[Bb]uy|[Jj]ourney")
list
Result:
. list

     +--------------------------------------------+
     |                              text   wanted |
     |--------------------------------------------|
  1. |             Book your tickets now        1 |
  2. | Swiss is making your journey easy        1 |
  3. |     Buy your holiday tickets now!        1 |
  4. | A touch of Austria in your lungs.        0 |
     +--------------------------------------------+
See also this link for more information on regular expressions.
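For comparison, the strpos() route is less tedious if the text is lower-cased first, and the case handling can also be delegated to the regex engine. The sketch below shows both on the same data. It assumes Stata 14 or newer for ustrregexm() and its case-insensitivity argument. The variable names dummy_strpos and dummy_regex are only illustrative. Both should reproduce the 1, 1, 1, 0 pattern of wanted above.

```stata
* Sketch: two case-insensitive alternatives to the regexm() answer
* (assumes Stata 14+; variable names are illustrative)
clear
input str33 text
"Book your tickets now"
"Swiss is making your journey easy"
"Buy your holiday tickets now!"
"A touch of Austria in your lungs."
end

* (a) strpos() on a lower-cased copy: strpos() returns 0 when the keyword
*     is absent, and | turns any nonzero position into 1
generate byte dummy_strpos = strpos(lower(text), "book") | ///
    strpos(lower(text), "buy") | strpos(lower(text), "journey")

* (b) ustrregexm() with a nonzero third argument requests a
*     case-insensitive match
generate byte dummy_regex = ustrregexm(text, "book|buy|journey", 1)

list
```

Like the regexm() pattern in the answer, both versions match the keywords anywhere in the string, so a word such as "notebook" would also set the dummy to 1. If that matters, you could try adding word boundaries to the ustrregexm() pattern, for example "\b(book|buy|journey)\b".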