Replace white space between strings

I have a file like this:
>TCONS_00000066 +1
PPAAARTDLSPPQHVLHVYKRYGPPRQRRRPCPQTWWWQLPHRAAATHPRGEGPRASNPTRQQHFILVYNFSSFLSSWLSLSLLSSPFCYLYICDCHGNTEDEGPLMY*LVSSSLGAFVCKDFHLIDLLDLLFWIEAGYLHAVLHTILQSGRSDR*SRPKYRLTELSVCISVRTSSVINSKC*HN
>TCONS_00000066 +2
RRLLRAPTCHHPSTSSTYTSATVHRGSVDVLVRKHGGGSFLIEQQQLILEGKGPELLILHGNNTLYLCIISLRF*VHGYLCLSYLLPFAISIFVIAMEIQKTRGR*CIDL*VLVWGLSFARIFI*LIFLICYFGSKLATFMPCCIPYFSLVGQTDDRDRSID*PNFRFVYL*GQVLSSIQNVNII
>TCONS_00000066 +3
AGCCAHRLVTTPARPPRIQALRSTAAASTSLSANMVVAASSSSSSNSSSRGRAQSF*SYTATTLYTCV*FLFVSEFMAIFVSLIFSLLLSLYL*LPWKYRRRGAADVLTCEF*FGGFRLQGFSFD*SS*FVILDRSWLPSCRVAYHTSVWSVRPMIETEVSINRTFGLYICEDKFCHQFKMLT*
>TCONS_00000066 -1
YYVNILN**QNLSSQIYKPKVRLIDTSVSIIGLTDQTEVWYATRHEGSQLRSKITNQEDQSNENPCKRKPPN*NSQVNTSAAPRLLYFHGNHKYRDSKREKIRETKIAMNSETKRNYTQV*SVVAV*D*KLWALPLEDELLLLDEEAATTMFADKDVDAAAVDRSACIRGGRAGVVTSRCAQQPA
>TCONS_00000066 -2
IMLTF*IDDRTCPHRYTNRKFG*SILRSRSSV*PTRLKYGMQHGMKVASFDPK*QIKKINQMKILANESPQTRTHKSIHQRPLVFCISMAITNIEIAKGRR*ERQR*P*TQKRREIIHKYKVLLPCRIRSSGPFPSRMSCCCSMRKLPPPCLRTRTSTLPRWTVALVYVEDVLGW*QVGARSSRR
>TCONS_00000066 -3
LC*HFELMTELVLTDIQTESSVNRYFGLDHRSDRPD*SMVCNTA*R*PASIQNNKSRRSIK*KSLQTKAPKLELTSQYISGPSSSVFPWQSQI*R*QKGEDKRDKDSHELRNEEKLYTSIKCCCRVGLEALGPSPRG*VAAAR*GSCHHHVCGQGRRRCRGGP*RLYTWRTCWGGDKSVRAAAG
>TCONS_00000130 +1
LPARPRLQGALQRHRGGKPINQSINQWW*LGQLKTKKERSN*SSC*IVKWYAGEGGDSGSGGGGRGDGGGDGEPARRHHARRRPPRQELPLQVDEPVRANEEGWVQGSWHQAARHGTGRFLQRRAHPNRDHQFARTTA*NPLPNVHPSAGRAMEKKIKGKEEKMKSPCITN*FVMMQAAVRVRSSLIGSIR*ICFTKGATDRLSWLAVWVHIHTTQTQILTI*PFAKNIFTNEQLPKLISNLTLLLNAKSCGAEFRHLSAK*YGAECTLAR*LSLPSAVARHSAPADVALRCLSSAPHDLALSKKVRFEISFGSGSFVKLVFTKG*IVKICATQTHSQEDMNIK*SREGHGFSPGFVPFGCTCTEMIYVVGLTDTKEHM***MIFVLLCQSFTLVFLTCFLSSTVVLRIQ*PQLMRLKWILAN*AYSLIFWLMVIL
>TCONS_00000130 +2
FQLALAFRELCNGIAEVNQSTNQSINGGSWVNSKQRKKEAINHLVEL*NGMQAKVEIVVREGEVGETVVATVNQLAATTLVVGLHDKSFLYRSTNPYERMRRVGCRVLGIRQHATARDGSFNAELTQIETINLHVPPPKIPFPMFTLPLGVLWRKRSKAKKRK*SHHASQINL**CRLQCELGAH*LDQSDEFVLPKEQLTD*AG*LSGYIYTRHKHKF*QFNPLQKIFLQMNSYQNLFQI*PFCLTPNRVALNLDTSAPNSMALNVRWHADLVSHPPWHGIQRQLTWR*GV*VPRHMI*R*AKRSDLK*VLAAVHL*N*FLQRVKLSKFVRHKHTHKKT*TSSEAGRGTVSHLDLCHLVVLVQR*SMLLD*QTPRNTCSSK*FLFYFVKVLHLYS*PVSCLAQ*C*EFSNLS**D*NGYWPIKLIASSFGLWLYL
>TCONS_00000130 +3
SSSPSPSGSSATASRR*TNQPINQSMVVVGSTQNKERKKQLIILLNCEMVCRRRWR*WFGRGRSGRRWWRR*TSSPPPRSSSASTTRASSTGRRTRTSE*GGLGAGFLASGSTPRHGTVPSTPSSPKSRPSICTYHRLKSPSQCSPFRWACYGEKDQRQRRENEVTMHHKLICDDAGCSAS*ELTDWINPMNLFYQRSN*QIELASCLGTYTHDTNTNFDNLTLCKKYFYK*TATKTYFKSDPFA*RQIVWR*I*TPQRQIVWR*MYVGTLT*SPIRRGTAFSAS*RGAEVSKFRAT*FSVEQKGQI*NKFWQRFICKISFYKGLNCQNLCDTNTLTRRHEHQVKQGGARFLTWICAIWLYLYRDDLCCWIDRHQGTHVVVNDFCFTLSKFYTCIPDLFLV*HSSVKNSVTSVDEIKMDIGQLSL*PHLLAYGYTY
>TCONS_00000130 -1
ISITISQKMRL*A*LANIHFNLIN*GY*ILNTTVLDKKQVRNTSVKL*QSKTKIIYYYMCSLVSVNPTT*IISVQVQPNGTNPGEKPCPSLLHLMFMSSCECVCVAQILTI*PFVKTNFTNEPLPKLISNLTFLLNAKSCGAELRHLSATSAGAECRATADGRLSQRANVHSAPYYLALRCLNSAPHDLALSKRVRFEISFGSCSFVKIFFAKG*IVKICVCVVCICTQTASQLNLSVAPLVKQIHRIDPISELLTRTAACIITN*FVMHGDFIFSSLPLIFFSIARPAEG*TLGRGF*AVVRAN*WSRFG*ARR*RNRPVPWRAA*CQEPCTQPSSFARTGSSTCRGSSCRGGRRRAWWRRAGSPSPPPSPRPPPPEPLSPPSPAYHFTIQQDD*LLLSFFVLS*PNYHH*LIDWLIGLPPRCRCRAP*RRGRAG
>TCONS_00000130 -2

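I want to get rid of the space between the strings in the ID lines (replacing it with an underscore).

The new file should look like this: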

>TCONS_00000066_+1
PPAAARTDLSPPQHVLHVYKRYGPPRQRRRPCPQTWWWQLPHRAAATHPRGEGPRASNPTRQQHFILVYNFSSFLSSWLSLSLLSSPFCYLYICDCHGNTEDEGPLMY*LVSSSLGAFVCKDFHLIDLLDLLFWIEAGYLHAVLHTILQSGRSDR*SRPKYRLTELSVCISVRTSSVINSKC*HN
>TCONS_00000066_+2
RRLLRAPTCHHPSTSSTYTSATVHRGSVDVLVRKHGGGSFLIEQQQLILEGKGPELLILHGNNTLYLCIISLRF*VHGYLCLSYLLPFAISIFVIAMEIQKTRGR*CIDL*VLVWGLSFARIFI*LIFLICYFGSKLATFMPCCIPYFSLVGQTDDRDRSID*PNFRFVYL*GQVLSSIQNVNII
>TCONS_00000066_+3
AGCCAHRLVTTPARPPRIQALRSTAAASTSLSANMVVAASSSSSSNSSSRGRAQSF*SYTATTLYTCV*FLFVSEFMAIFVSLIFSLLLSLYL*LPWKYRRRGAADVLTCEF*FGGFRLQGFSFD*SS*FVILDRSWLPSCRVAYHTSVWSVRPMIETEVSINRTFGLYICEDKFCHQFKMLT*
>TCONS_00000066_-1
YYVNILN**QNLSSQIYKPKVRLIDTSVSIIGLTDQTEVWYATRHEGSQLRSKITNQEDQSNENPCKRKPPN*NSQVNTSAAPRLLYFHGNHKYRDSKREKIRETKIAMNSETKRNYTQV*SVVAV*D*KLWALPLEDELLLLDEEAATTMFADKDVDAAAVDRSACIRGGRAGVVTSRCAQQPA
>TCONS_00000066_-2
IMLTF*IDDRTCPHRYTNRKFG*SILRSRSSV*PTRLKYGMQHGMKVASFDPK*QIKKINQMKILANESPQTRTHKSIHQRPLVFCISMAITNIEIAKGRR*ERQR*P*TQKRREIIHKYKVLLPCRIRSSGPFPSRMSCCCSMRKLPPPCLRTRTSTLPRWTVALVYVEDVLGW*QVGARSSRR
>TCONS_00000066_-3
LC*HFELMTELVLTDIQTESSVNRYFGLDHRSDRPD*SMVCNTA*R*PASIQNNKSRRSIK*KSLQTKAPKLELTSQYISGPSSSVFPWQSQI*R*QKGEDKRDKDSHELRNEEKLYTSIKCCCRVGLEALGPSPRG*VAAAR*GSCHHHVCGQGRRRCRGGP*RLYTWRTCWGGDKSVRAAAG
>TCONS_00000130_+1
LPARPRLQGALQRHRGGKPINQSINQWW*LGQLKTKKERSN*SSC*IVKWYAGEGGDSGSGGGGRGDGGGDGEPARRHHARRRPPRQELPLQVDEPVRANEEGWVQGSWHQAARHGTGRFLQRRAHPNRDHQFARTTA*NPLPNVHPSAGRAMEKKIKGKEEKMKSPCITN*FVMMQAAVRVRSSLIGSIR*ICFTKGATDRLSWLAVWVHIHTTQTQILTI*PFAKNIFTNEQLPKLISNLTLLLNAKSCGAEFRHLSAK*YGAECTLAR*LSLPSAVARHSAPADVALRCLSSAPHDLALSKKVRFEISFGSGSFVKLVFTKG*IVKICATQTHSQEDMNIK*SREGHGFSPGFVPFGCTCTEMIYVVGLTDTKEHM***MIFVLLCQSFTLVFLTCFLSSTVVLRIQ*PQLMRLKWILAN*AYSLIFWLMVIL
>TCONS_00000130_+2
FQLALAFRELCNGIAEVNQSTNQSINGGSWVNSKQRKKEAINHLVEL*NGMQAKVEIVVREGEVGETVVATVNQLAATTLVVGLHDKSFLYRSTNPYERMRRVGCRVLGIRQHATARDGSFNAELTQIETINLHVPPPKIPFPMFTLPLGVLWRKRSKAKKRK*SHHASQINL**CRLQCELGAH*LDQSDEFVLPKEQLTD*AG*LSGYIYTRHKHKF*QFNPLQKIFLQMNSYQNLFQI*PFCLTPNRVALNLDTSAPNSMALNVRWHADLVSHPPWHGIQRQLTWR*GV*VPRHMI*R*AKRSDLK*VLAAVHL*N*FLQRVKLSKFVRHKHTHKKT*TSSEAGRGTVSHLDLCHLVVLVQR*SMLLD*QTPRNTCSSK*FLFYFVKVLHLYS*PVSCLAQ*C*EFSNLS**D*NGYWPIKLIASSFGLWLYL
>TCONS_00000130_+3
SSSPSPSGSSATASRR*TNQPINQSMVVVGSTQNKERKKQLIILLNCEMVCRRRWR*WFGRGRSGRRWWRR*TSSPPPRSSSASTTRASSTGRRTRTSE*GGLGAGFLASGSTPRHGTVPSTPSSPKSRPSICTYHRLKSPSQCSPFRWACYGEKDQRQRRENEVTMHHKLICDDAGCSAS*ELTDWINPMNLFYQRSN*QIELASCLGTYTHDTNTNFDNLTLCKKYFYK*TATKTYFKSDPFA*RQIVWR*I*TPQRQIVWR*MYVGTLT*SPIRRGTAFSAS*RGAEVSKFRAT*FSVEQKGQI*NKFWQRFICKISFYKGLNCQNLCDTNTLTRRHEHQVKQGGARFLTWICAIWLYLYRDDLCCWIDRHQGTHVVVNDFCFTLSKFYTCIPDLFLV*HSSVKNSVTSVDEIKMDIGQLSL*PHLLAYGYTY
>TCONS_00000130_-1
ISITISQKMRL*A*LANIHFNLIN*GY*ILNTTVLDKKQVRNTSVKL*QSKTKIIYYYMCSLVSVNPTT*IISVQVQPNGTNPGEKPCPSLLHLMFMSSCECVCVAQILTI*PFVKTNFTNEPLPKLISNLTFLLNAKSCGAELRHLSATSAGAECRATADGRLSQRANVHSAPYYLALRCLNSAPHDLALSKRVRFEISFGSCSFVKIFFAKG*IVKICVCVVCICTQTASQLNLSVAPLVKQIHRIDPISELLTRTAACIITN*FVMHGDFIFSSLPLIFFSIARPAEG*TLGRGF*AVVRAN*WSRFG*ARR*RNRPVPWRAA*CQEPCTQPSSFARTGSSTCRGSSCRGGRRRAWWRRAGSPSPPPSPRPPPPEPLSPPSPAYHFTIQQDD*LLLSFFVLS*PNYHH*LIDWLIGLPPRCRCRAP*RRGRAG
>TCONS_00000130_-2

I tried sed and tr, but didn't get the desired output.

It looks like you're trying to replace the space with an underscore. If so, you could try this:

sed 's/[[:blank:]]\+/_/g' file

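Or, to match only right after the ID:

sed 's/\(TCONS_[0-9]\{8\}\)[[:blank:]]\+/\1_/g' file

You need to capture the characters you want to keep. Here that is TCONS_ followed by 8 digits, so put that pattern inside a capturing group, \(...\), and refer back to it as \1 in the replacement. The one or more blanks that follow are matched by [[:blank:]]\+. You have to escape the + so that it repeats the previous token one or more times; otherwise it matches a literal + symbol, because sed uses BRE (Basic Regular Expressions) by default.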
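As a rough sketch of the same idea, the substitution can also be limited to the header lines with an address (`/^>/`) instead of a capture group. The function name `join_header_fields` and the shortened sample sequences are made up for illustration. It uses `\{1,\}` rather than `\+`, since `\+` is a GNU extension to BRE and may not work in every sed.

```shell
# Hypothetical helper: replace each run of blanks with "_" on FASTA header
# lines only (lines starting with ">"); sequence lines pass through untouched.
join_header_fields() {
  sed '/^>/ s/[[:blank:]]\{1,\}/_/g' "$@"
}

# Small sample in the same shape as the file in the question.
printf '>TCONS_00000066 +1\nPPAAARTDLS\n>TCONS_00000066 -2\nIMLTF\n' |
  join_header_fields
# Prints:
# >TCONS_00000066_+1
# PPAAARTDLS
# >TCONS_00000066_-2
# IMLTF
```

With a file name as argument (`join_header_fields file`), it reads that file instead of standard input.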

Use tr, whose exact purpose is to replace characters with other characters.

tr ' ' '_' < file

As a bonus, you can use the -s option to squeeze repeated occurrences into a single one, like this:

tr -s ' ' '_' < file

For example:

$ cat a
hello     world           this
is     a      sample       file
$ tr -s ' ' '_' < a
hello_world_this
is_a_sample_file

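Of course, to save the changes to the original file, you have to write the output to a new file first and then move it back over the original.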
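A minimal sketch of that write-then-move step, assuming it is fine to create a scratch file next to the original. The function name `squeeze_spaces_in_place` and the `.tmp` suffix are hypothetical choices, not anything tr provides.

```shell
# Hypothetical helper: run tr -s on a file and replace the original with
# the result, going through a temporary file in the same directory.
squeeze_spaces_in_place() {
  target=$1
  tmp="$target.tmp"   # assumed scratch name; must not already exist
  # Redirecting straight back into "$target" would truncate it before tr
  # reads it, so write to the scratch file and move it only on success.
  if tr -s ' ' '_' < "$target" > "$tmp"; then
    mv "$tmp" "$target"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Calling `squeeze_spaces_in_place a` on the sample file `a` from above would leave `hello_world_this` and `is_a_sample_file` in it.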