How can I replace the first blank space followed by a capital letter with a line break?
So I have this text (it is more than a thousand lines long):
ABO blood group antigens Carbohydrate antigens attached mainly to cell surface proteins or lipids that are present on many cell types, including red blood cells. These antigens differ among individuals, depending on inherited alleles encoding the enzymes required for synthesis of the carbohydrate antigens. The ABO antigens act as alloantigens that are responsible for blood transfusion reactions and hyperacute rejection of allografts.
Acquired immunodeficiency A deficiency in the immune system that is acquired after birth, usually because of infection (e.g., AIDS), and that is not related to a genetic defect. Synonymous with secondary immunodeficiency.
Acquired immunodeficiency syndrome (AIDS) A disease caused by human immunodeficiency virus (HIV) infection that is characterized by depletion of CD4+ T cells, leading to a profound defect in cell-mediated immunity. Clinically, AIDS includes opportunistic infections, malignant tumors, wasting, and encephalopathy.
Activation-induced cell death (AICD) Apoptosis of activated lymphocytes, generally used for T cells.
Activation-induced (cytidine) deaminase (AID) An enzyme expressed in B cells that catalyzes the conversion of cytosine into uracil in DNA, which is a step required for somatic hypermutation and affinity maturation of antibodies and for Ig class switching.
Activation protein 1 (AP-1) A family of DNA-binding transcription factors composed of dimers of two proteins that bind to one another through a shared structural motif called a leucine zipper. The best-characterized AP-1 factor is composed of the proteins Fos and Jun. AP-1 is involved in transcriptional regulation of many different genes that are important in the immune system, such as cytokine genes.
I would like it to look like this:
ABO blood group antigens
Carbohydrate antigens attached mainly to cell surface proteins or lipids that are present on many cell types, including red blood cells. These antigens differ among individuals, depending on inherited alleles encoding the enzymes required for synthesis of the carbohydrate antigens. The ABO antigens act as alloantigens that are responsible for blood transfusion reactions and hyperacute rejection of allografts.
Acquired immunodeficiency
A deficiency in the immune system that is acquired after birth, usually because of infection (e.g., AIDS), and that is not related to a genetic defect. Synonymous with secondary immunodeficiency.
Acquired immunodeficiency syndrome (AIDS)
A disease caused by human immunodeficiency virus (HIV) infection that is characterized by depletion of CD4+ T cells, leading to a profound defect in cell-mediated immunity. Clinically, AIDS includes opportunistic infections, malignant tumors, wasting, and encephalopathy.
Activation-induced cell death (AICD)
Apoptosis of activated lymphocytes, generally used for T cells.
Activation-induced (cytidine) deaminase (AID)
An enzyme expressed in B cells that catalyzes the conversion of cytosine into uracil in DNA, which is a step required for somatic hypermutation and affinity maturation of antibodies and for Ig class switching.
Activation protein 1 (AP-1)
A family of DNA-binding transcription factors composed of dimers of two proteins that bind to one another through a shared structural motif called a leucine zipper. The best-characterized AP-1 factor is composed of the proteins Fos and Jun. AP-1 is involved in transcriptional regulation of many different genes that are important in the immune system, such as cytokine genes.
Is there a way to do this? I am not a programmer. Thanks.
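I tested this locally with your text and it works. I am not a regex expert, so it may not be the most efficient solution.
Use the 'Replace' tab (Ctrl+H):
Find what: ^(.*?) ([A-Z].*$)
Replace with: \1\r\n\2
Make sure Match case and Regular expression are selected.
Explanation:
Find what: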
^ starts with
. anything
* repeated 0 or more times
? lazy match so that it stops at the capital letter (next group)
(.*?) remember that part (group 1)
followed by a space
[A-Z] match the capital letter
. anything
* repeated 0 or more times
$ ends with
([A-Z].*$) remember that part (group 2)
Replace with:
\1 group 1
\r carriage return
\n new line
\2 group 2
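For readers who would rather script this than use the Notepad++ dialog, the same pattern can be tried in Python. This is only a sketch: the function name `split_term` and the sample strings are made up for illustration, and it inserts `\n` rather than `\r\n`.

```python
import re

# Same idea as the "Find what" pattern above: lazily capture everything up to
# the first space that is followed by a capital letter, then the rest of the line.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
PATTERN = re.compile(r"^(.*?) ([A-Z].*)$", re.MULTILINE)


def split_term(text: str) -> str:
    """Put each glossary term on its own line, above its definition."""
    # \1 is the term, \2 is the definition; the space between them is dropped.
    return PATTERN.sub(r"\1\n\2", text)


sample = (
    "Acquired immunodeficiency A deficiency in the immune system.\n"
    "Activation protein 1 (AP-1) A family of transcription factors."
)
print(split_term(sample))
```

Lines that contain no space followed by a capital letter do not match and are left unchanged.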
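You need to use regular expressions for the replacement (looking for a space followed by a capital letter).
In Notepad++, use find/replace with regular expressions (be sure to check "Match Case").
Find what: ([^.]) ([A-Z])
Replace with: \1\r\n\2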
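One thing worth checking with this pattern is that it is not anchored to the start of the line, so it can match more than one space per line. The Python sketch below compares replacing every match with replacing only the first one. It assumes the replacement keeps the two captured characters as `\1` and `\2`, uses `\n` instead of `\r\n`, and the sample line is invented for illustration.

```python
import re

# The pattern from this answer: a non-period character, a space, a capital letter.
PATTERN = re.compile(r"([^.]) ([A-Z])")

line = "Activation-induced cell death (AICD) Apoptosis of activated T cells."

# Replace every match on the line.
every_match = PATTERN.sub(r"\1\n\2", line)

# Replace only the first match on the line.
first_only = PATTERN.sub(r"\1\n\2", line, count=1)

print(every_match)
print(first_only)
```

In this sample, the first call also breaks the line before "T cells", while the second call breaks only between the term and its definition.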
Yes, use a Perl script. I find this one works quite well...
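#!/usr/bin/perl
# Counts the line breaks already inserted for the current entry.
$cestbon = 0;
while (<>) {
    # Split the input line into whitespace-separated words.
    @line = split(" ", $_);
    # A blank line marks the start of a new entry: reset the counter.
    if (/^$/) {
        $cestbon = 0;
        print "\n";
    }
    foreach (@line) {
        # Break before a capitalized word such as "Carbohydrate" or "A"
        # (all-caps words like "ABO" do not match), at most twice per entry.
        if (/\b[A-Z][a-z0-9]*\b/ && $cestbon < 2) {
            print "\n$_ ";
            $cestbon++;
        } else {
            print "$_ ";
        }
    }
}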
To run it (this is on an MBP running OS X, aka UNIX):
cat sample.txt | ./sample.pl
ABO blood group antigens
Carbohydrate antigens attached mainly to cell surface proteins or lipids that are present on many cell types, including red blood cells.
These antigens differ among individuals, depending on inherited alleles encoding the enzymes required for synthesis of the carbohydrate antigens. The ABO antigens act as alloantigens that are responsible for blood transfusion reactions and hyperacute rejection of allografts.
Acquired immunodeficiency
A deficiency in the immune system that is acquired after birth, usually because of infection (e.g., AIDS), and that is not related to a genetic defect. Synonymous with secondary immunodeficiency.
Acquired immunodeficiency syndrome (AIDS)
A disease caused by human immunodeficiency virus (HIV) infection that is characterized by depletion of CD4+ T cells, leading to a profound defect in cell-mediated immunity. Clinically, AIDS includes opportunistic infections, malignant tumors, wasting, and encephalopathy.
Activation-induced cell death (AICD)
Apoptosis of activated lymphocytes, generally used for T cells.
Activation-induced (cytidine) deaminase (AID)
An enzyme expressed in B cells that catalyzes the conversion of cytosine into uracil in DNA, which is a step required for somatic hypermutation and affinity maturation of antibodies and for Ig class switching.
Activation protein 1 (AP-1)
A family of DNA-binding transcription factors composed of dimers of two proteins that bind to one another through a shared structural motif called a leucine zipper. The best-characterized AP-1 factor is composed of the proteins Fos and Jun. AP-1 is involved in transcriptional regulation of many different genes that are important in the immune system, such as cytokine genes.
It may not be perfect, but I wrote it in 10 minutes, so give me a break :)