Need to replace 13 blank spaces in a very long single-line text file
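I have a file (1.8 MB) that contains a single (very long) line of text. The values on that line are usually separated by 13 spaces. What I want to do is replace these 13-space delimiters with a pipe | so that I can process this text file with SSIS.
So far I have had no success processing this file programmatically with a batch file.
I tried the following code, which I got from another SO post.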
@echo off
REM create empty file:
break>R1.txt
setlocal enabledelayedexpansion
REM prevent empty lines by adding line numbers (find /v /n "")
REM parse the file, taking the second token (*, %%b) with delimiters
REM ] (to eliminate line numbers) and space (to eliminate leading spaces)
for /f "tokens=1,* delims=] " %%a in ('find /v /n "" ^<PXZP_SND_XZ01_GFT10553.dat') do (
call :sub1 "%%b"
REM write the string without quotes:
REM removing the quotes from the string would make the special chars poisonous again
>>PXZP_SND_XZ01_GFT10553.dat echo(!s:"=!
)
REM Show the written file:
type PXZP_SND_XZ01_GFT10553.dat
goto :eof
:sub1
set S=%*
REM do 13 times (adapt to your Needs):
for /l %%i in (1,1,13) do (
REM replace "space quote" with "quote" (= removing the last space
set S=!S: "=|!
)
goto :eof
Can anyone help me? Here is a sample of my text file:
96859471/971 AAAA HAWAII 96860471/971 BBBB HAWAII 96861471/971 CCCC HAWAII 96863471/971 DDDD HAWAII
Use the right tool.
Set Inp = wscript.Stdin
Set Outp = wscript.Stdout
' Space(13) builds the 13-space delimiter explicitly
Outp.Write Replace(Inp.ReadAll, Space(13), "|")
To use it:
cscript //nologo "C:\Replace13Spaces.vbs" < "c:\folder\inputfile.txt" > "C:\Folder\Outputfile.txt"
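Use a regular expression to replace 2 or more spaces with a bar (note that \s matches any whitespace character, not only spaces).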
Set Inp = wscript.Stdin
Set Outp = wscript.Stdout
Set regEx = New RegExp
regEx.Pattern = "\s{2,}"
regEx.IgnoreCase = True
regEx.Global = True
Outp.Write regEx.Replace(Inp.ReadAll, "|")
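For comparison, here is a minimal Python sketch (not part of the original answers) of the same two substitutions: a literal 13-space replace and a regular expression for runs of 2 or more whitespace characters. The function names are made up for illustration, and the spacing in sample is an assumption, since the sample in the question does not show where the 13-space gaps fall.

```python
import re

def replace_exact(text, width=13, sep="|"):
    """Replace each non-overlapping occurrence of `width` spaces with `sep`."""
    return text.replace(" " * width, sep)

def replace_runs(text, sep="|"):
    """Replace every run of 2 or more whitespace characters with `sep`."""
    return re.sub(r"\s{2,}", sep, text)

# Assumed layout: every value is separated by exactly 13 spaces.
gap = " " * 13
sample = gap.join(["96859471/971", "AAAA", "HAWAII",
                   "96860471/971", "BBBB", "HAWAII"])

print(replace_exact(sample))
print(replace_runs(sample))
```

With exactly 13 spaces between values both functions give the same result. They differ when the gaps vary: the literal replace leaves leftover spaces from a 14-space gap, while the regular expression also swallows a trailing CRLF, because \s matches line-break characters too.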
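There are two other ways to handle this as well.
- Like the first approach, use Replace multiple times, going from the longest to the shortest predefined number of spaces, e.g. 13, 10, 8, or 5 spaces.
- Split the string on 2 spaces, Filter the array to exclude blank array elements, then Join the array with | as the delimiter.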
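Both alternatives can be sketched in Python as follows. This is only an illustration under assumptions, not the answerer's code: the function names are hypothetical, and the strip() call is an addition, needed because splitting an odd-length gap on 2 spaces leaves a single leading space on the next piece.

```python
def replace_longest_first(text, widths=(13, 10, 8, 5), sep="|"):
    """Replace each occurrence of `width` spaces, trying the longest width first."""
    for width in sorted(widths, reverse=True):
        text = text.replace(" " * width, sep)
    return text

def split_filter_join(text, sep="|"):
    """Split on 2 spaces, drop blank pieces, and join the rest with `sep`."""
    pieces = text.split("  ")
    # strip() is an added step: it removes the stray space left by odd-length gaps
    kept = [piece.strip() for piece in pieces if piece.strip()]
    return sep.join(kept)

line = "AAAA" + " " * 13 + "NEW YORK" + " " * 5 + "CCCC"
print(replace_longest_first(line))
print(split_filter_join(line))
```

Single spaces inside a value (such as NEW YORK here) survive both approaches. The first approach only handles gap widths that appear in the predefined list; a gap of 12 spaces, for example, would leave 2 spaces behind after the 10-space pass.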
A for /F loop cannot handle lines longer than about 8190 characters. However, there is a way to read files with longer lines: using set /P in a loop, together with input redirection <. set /P reads at most 1023 characters, unless a line-break or the end of the file is encountered; executing it multiple times against the same open (input-redirected) file handle allows a very long line to be read in portions of 1023 characters, because set /P does not reset the file pointer.
Another challenge is to return (echo) the very long line, which is again not possible with the echo command because of the line limitation of about 8190 characters (which applies to command lines and variable contents). Block-wise processing helps here too: first, get an end-of-file character (EOF, ASCII 0x1A); then take a text/string portion, append an EOF and write the result to a temporary file using echo (which appends a line-break), together with output redirection >; next, copy the file onto itself using copy, but read it in ASCII text mode to discard the EOF and everything after it (hence the line-break previously appended by echo) and write it in binary mode to get an exact copy of the resulting data; lastly, type out the file content using type.
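The block-wise idea described above (read a fixed-size portion, process what is complete, carry the rest over to the next read) can be illustrated outside of batch. The following Python sketch is an analogy under assumptions, not a port of the script: replace_streaming and its parameters are hypothetical names, Python's read() keeps line-breaks whereas set /P stops at them, and no left-trimming is done.

```python
import io

def replace_streaming(src, dst, search, repl, chunk_size=1023):
    """Copy text from src to dst, replacing search with repl,
    while holding only about one chunk in memory at a time."""
    if not search:
        raise ValueError("search string must not be empty")
    keep = len(search) - 1  # longest partial match that can end a buffer
    buffer = ""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        pos = 0
        while True:
            idx = buffer.find(search, pos)
            if idx < 0:
                break
            dst.write(buffer[pos:idx])  # text before the match
            dst.write(repl)             # the replacement itself
            pos = idx + len(search)
        # Hold back a tail that could be the start of a match which the
        # next chunk completes; everything before it is safe to emit.
        safe_end = max(pos, len(buffer) - keep)
        dst.write(buffer[pos:safe_end])
        buffer = buffer[safe_end:]
    dst.write(buffer)  # whatever is left at the end of the input

text = ("96859471/971" + " " * 13 + "AAAA" + " " * 13 + "HAWAII" + " " * 13) * 3
out = io.StringIO()
replace_streaming(io.StringIO(text), out, " " * 13, "|", chunk_size=7)
print(out.getvalue())
```

Holding back len(search) - 1 characters is what makes a delimiter that straddles two chunks still match; the batch script achieves the same effect by jumping back to :READ when no search string is found in the current buffer.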
The following script makes use of these techniques (see all the explanatory rem remarks in the code):
@echo off
setlocal EnableExtensions DisableDelayedexpansion
rem // Define constants here:
set "_INPUT=.\PXZP_SND_XZ01_GFT10553.dat" & rem // (this is the input file)
set "_OUTPUT=.\R1.txt" & rem // (set to `con` to display the result on the console)
set "_TEMPF=%TEMP%\%~n0_%RANDOM%.tmp" & rem // (specifies a temporary file)
set "_SEARCH=     " & rem // (this is the string to be found; 5 spaces here)
set "_REPLAC=|" & rem // (this is the replacement string)
set "_LTRIM=#" & rem // (set to something to left-trim sub-strings)
(set _LF=^
%= blank line =%
) & rem // (this block stores a new-line character in a variable)
rem // This stores an end-of-file character in a variable:
for /F %%E in ('forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0x1A"') do set "_EOF=%%E"
rem /* The input file is going to be processed in a sub-routine,
rem which accesses the file content via input redirection `<`: */
< "%_INPUT%" > "%_OUTPUT%" call :PROCESS
endlocal
exit /B
:PROCESS
rem // Reset variables that store a partial string to be processed and a separator:
set "PART=" & set "SEP="
setlocal EnableDelayedExpansion
:READ
rem /* At this point 1023 characters are read from the input file at most, until
rem a line-break or the end of the file is encountered:*/
set "NEW=" & set /P NEW=""
rem // The read characters are appended to a string buffer that will be processed:
set "PART=!PART!!NEW!"
rem /* Skip processing when the string buffer is empty, which is the case when the end
rem of the file has already been reached: */
:LOOP
if defined PART (
rem /* Make the search string accessible as a `for` meta-variable reference in
rem order not to have to use normal (immediate) `%`-expansion, which could cause
rem trouble with some special characters under some circumstances: */
for /F delims^=^ eol^= %%K in ("!_SEARCH!") do (
rem /* Try to split the string buffer at the first search string and store the
rem portion at the right, using sub-string substitution: */
set "RIGHT=!PART:*%%K=!"
rem /* Check whether the split was successful, hence whether a search string
rem even occurred in the string buffer; if not, jump back and read more
rem characters; otherwise (when the end of the file was reached) clear the
rem right portion and continue processing: */
if "!RIGHT!"=="!PART!" if not defined NEW (set "RIGHT=") else goto :READ
rem /* Clear the variable that will receive the portion left to the first
rem occurrence of the search string in the string buffer; then replace each
rem occurrence in the string buffer by a new-line character: */
set "LEFT=" & set ^"PART=!PART:%%K=^%_LF%%_LF%!^"
rem /* Iterate over all lines of the altered string buffer, which is now a
rem multi-line string, then get the first line, which constitutes the
rem portion at the left of the first search string; the (first) line is
rem preceded by an `_` just for it not to appear blank, because `for /F`
rem skips over empty lines; this character is removed later: */
for /F delims^=^ eol^= %%L in (^"_!PART!^") do (
rem // Execute the loop body only for the first iteration:
if not defined LEFT (
rem /* Store the (augmented) left portion with delayed expansion
rem disabled in order not to get trouble with `!` in the string: */
setlocal DisableDelayedExpansion & set "LEFT=%%L"
rem // Enable delayed expansion to be able to safely echo the string:
setlocal EnableDelayedExpansion
rem /* Write to a temporary file the output string, which consists of
rem a replacement string (except for the very first time), the left
rem portion with the preceding `_` removed and an end-of-file
rem character; a line-break is automatically appended by `echo`: */
> "!_TEMPF!" echo(!SEP!!LEFT:~1!%_EOF%
rem /* Copy the temporary file onto itself, but remove the end-of-file
rem character and everything after, then type the file content;
rem this is a safe way of echoing a string without a line-break: */
> nul copy /Y /A "!_TEMPF!" + nul "!_TEMPF!" /B & type "!_TEMPF!"
rem /* Restore the environment present at the beginning of the loop
rem body, then ensure the left portion not to appear empty: */
endlocal & endlocal & set "LEFT=_"
)
)
rem // If specified, left-trim the right portion, so remove leading spaces:
if defined _LTRIM (
for /F "tokens=* eol= delims= " %%T in ("!RIGHT!_") do (
for /F delims^=^ eol^= %%S in (^""!NEW!"^") do (
endlocal & set "NEW=%%~S" & set "RIGHT=%%T"
)
setlocal EnableDelayedExpansion & set "RIGHT=!RIGHT:~,-1!"
)
)
rem // Set the replacement string now to skip it only for the first output:
set "SEP=!_REPLAC!"
rem /* Move the right portion into the string buffer; if there is still some
rem amount of text left, jump back to find more occurrences of the search
rem string; if not, jump back and read more characters, unless the end of
rem the file has already been reached: */
set "PART=!RIGHT!" & if defined PART (
if defined NEW if "!PART:~1024!"=="" goto :READ
goto :LOOP
) else if defined NEW goto :READ
)
)
endlocal
rem // Clean up the temporary file:
del "%_TEMPF%"
exit /B
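The following limitations apply:
- the string portion between two adjacent search strings (= 5 × SPACE in the approach above) must be shorter than about 8190 characters;
- the search string must not be empty, must not begin with !, * or ~, and must not contain =;
- the replacement string must not contain !.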
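Before running the script on a large file, the limitations listed above can be checked up front. The following Python sketch is one hypothetical way to do that; check_limits and MAX_LEN are made-up names, and 8190 is the approximate figure quoted above rather than an exact bound.

```python
MAX_LEN = 8190  # approximate limit quoted above, not an exact figure

def check_limits(text, search, repl):
    """Return a list of limitation violations; the list is empty if none were found."""
    problems = []
    if not search:
        problems.append("search string is empty")
    else:
        if search[0] in "!*~":
            problems.append("search string begins with !, * or ~")
        if "=" in search:
            problems.append("search string contains =")
        # Longest stretch of text between two adjacent search strings
        longest = max(len(part) for part in text.split(search))
        if longest >= MAX_LEN:
            problems.append(f"a portion of {longest} characters is too long")
    if "!" in repl:
        problems.append("replacement string contains !")
    return problems

print(check_limits("AAAA" + " " * 5 + "HAWAII", " " * 5, "|"))
```

An empty list means none of the listed limitations is violated for that input; it does not guarantee that the batch script handles every special character in the data.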