Reducing whitespace to 1 space between words
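I have a list of Facebook posts from which I have removed the symbols. That left gaps of 2 or more spaces between words, and I would like to collapse them. How can I get rid of the extra whitespace so that there is only one space between words? Also, how can I remove all standalone capital letters from the text?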
> head(posts)
[1] "Syntel Recruitment Drive in this week for FRESHERS New Registration Link 2016 for 2013 2014 2015 Passout Graduates Qualification Any Graduate B E B Tech MCA M E M Tech Syntel Registration Link"
[2] "Dont Miss This Opportunity to be get placed in one of the best MNC companies in the world eBay freshers this week of January 2016 Qualification Any Graduate Can Apply eBay Registration Link"
[3] "Recent Pass Outs with 55 or More are eligible to Apply in Wipro Go to the Updated Link for LastDay Reference Drive Jan 2016 Apply Link for Fresher Referral Apply Link"
[4] "Robert Bosch Recruitment Drive in this week for FRESHERS New Registration Link 2016 for 2013 2014 2015 Passout Graduates Qualification Any Graduate B E B Tech MCA M E M Tech Robert Bosch Registration Link"
[5] "Mega JOB OPENINGS OF THE YEAR Mphasis Recruitment for FRESHERS January 2016 Qualification BE B Tech B Sc BCA Any Graduates MCA MBA ME M Tech Post Graduates Mphasis Registration Link"
[6] "TRIGENT Recruitment Drive in this week for FRESHERS New Registration Link 2016 for 2013 2014 2015 Passout Graduates Qualification Any Graduate B E B Tech MCA M E M Tech Trigent Registration Link"
> dput(head(posts))
c("Syntel Recruitment Drive in this week for FRESHERS New Registration Link 2016 for 2013 2014 2015 Passout Graduates Qualification Any Graduate B E B Tech MCA M E M Tech Syntel Registration Link",
"Dont Miss This Opportunity to be get placed in one of the best MNC companies in the world eBay freshers this week of January 2016 Qualification Any Graduate Can Apply eBay Registration Link",
"Recent Pass Outs with 55 or More are eligible to Apply in Wipro Go to the Updated Link for LastDay Reference Drive Jan 2016 Apply Link for Fresher Referral Apply Link",
"Robert Bosch Recruitment Drive in this week for FRESHERS New Registration Link 2016 for 2013 2014 2015 Passout Graduates Qualification Any Graduate B E B Tech MCA M E M Tech Robert Bosch Registration Link",
"Mega JOB OPENINGS OF THE YEAR Mphasis Recruitment for FRESHERS January 2016 Qualification BE B Tech B Sc BCA Any Graduates MCA MBA ME M Tech Post Graduates Mphasis Registration Link",
"TRIGENT Recruitment Drive in this week for FRESHERS New Registration Link 2016 for 2013 2014 2015 Passout Graduates Qualification Any Graduate B E B Tech MCA M E M Tech Trigent Registration Link"
)
Using gsub, you can try:
posts <- gsub(" +", " ", posts)
This replaces each run of adjacent spaces with a single space.
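As a rough sketch of how this could be extended, the snippet below applies the same gsub call to a small made-up vector, shows a variant that also handles tabs and leading/trailing whitespace, and takes one possible approach to the second part of the question (standalone capital letters). The vector `x` and the names `collapsed`, `squished`, and `no_singles` are hypothetical and only for illustration; the word-boundary pattern is an assumption about what "standalone" means here.

```r
# Hypothetical sample input; the extra spaces are made up for illustration.
x <- c("B  E   B Tech  MCA", "  Apply   Link ")

# Collapse every run of spaces into one space (same call as above).
collapsed <- gsub(" +", " ", x)

# Variant: treat tabs and newlines as whitespace too, then trim both ends.
squished <- trimws(gsub("\\s+", " ", x))

# Remove single capital letters that stand alone between word boundaries,
# then tidy up the spaces they leave behind.
no_singles <- gsub("\\b[A-Z]\\b", "", x, perl = TRUE)
no_singles <- trimws(gsub(" +", " ", no_singles))

print(collapsed)
print(squished)
print(no_singles)
```

Note that the `\\b[A-Z]\\b` pattern would also drop real one-letter words such as "I" or "A", so it may need adjusting depending on the posts.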