Problem changing text file encoding using cmd
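I recently had a task to change the encoding of a .txt file to Windows-1251, OEM866 and UTF-8 using cmd. I tried:
- chcp 866
- cmd /u /c /d type 1.txt > 866.txt
But the resulting text file is encoded as UTF-16, even though it looks like OEM866 text.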
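If you wish to stick with cmd, you will probably need the old 2004 iconv transcoding tool, so here is a download.cmd that fetches iconv.exe and its support files. However, read "Force encode from US-ASCII to UTF-8 (iconv)" for related advice, because it is easy to transcode using the wrong input encoding.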
@echo off & Title Get-iConv
Rem Download libiconv-1.9.1 and support files on Windows 10 optionally include gettext-tools
set "iconv-dir=c:\text-iconv"
if not exist "%iconv-dir%" md "%iconv-dir%"
cd /d "%iconv-dir%"
if not exist gt-runtime.woe32.zip curl -o gt-runtime.woe32.zip http://ftp.gnu.org/gnu/gettext/gettext-runtime-0.13.1.bin.woe32.zip
tar -xf gt-runtime.woe32.zip bin
tar -xf gt-runtime.woe32.zip share/doc
Rem if not exist gt-tools.woe32.zip curl -o gt-tools.woe32.zip http://ftp.gnu.org/gnu/gettext/gettext-tools-0.13.1.bin.woe32.zip
Rem tar -xf gt-tools.woe32.zip bin
Rem tar -xf gt-tools.woe32.zip share/doc
if not exist libiconv.woe32.zip curl -o libiconv.woe32.zip http://ftp.gnu.org/gnu/libiconv/libiconv-1.9.1.bin.woe32.zip
tar -xf libiconv.woe32.zip bin
tar -xf libiconv.woe32.zip share/doc
cd bin
start "" cmd /k "%iconv-dir%\bin\iconv.exe" -h
start "" "%iconv-dir%\share\doc\libiconv\iconv.1.html"
start ""
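I would say the task (converting a given file from one encoding to another, as the iconv tool does) can be solved using cmd alone. First, create two auxiliary binary files, bomUtf16le.bin and bomUtf8.bin, as follows: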
REM do not run as a batch file; copy&paste the code into an open cmd window
:: create a testing folder and change the current directory
2>NUL md .\SO595742
pushd .\SO595742
:: create file bomUtf16le.bin (BOM, encoding utf16LE)
>NUL chcp 1252
<nul set /p x=ÿþ>bomUtf16le.bin
:: create file bomUtf8.bin (BOM, encoding utf8)
>NUL chcp 1252
<nul set /p x=ï»¿>bomUtf8.bin
:: create file a1200.txt (a Cyrillic text, encoding utf16LEbom)
>NUL copy /Y /B bomUtf16le.bin a1200.txt
cmd /U /D /C "(echo русский текст&echo кирилловский шрифт)>>a1200.txt"
popd
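Important: do not run the snippet above from a batch file; copy and paste the code into an open cmd window!
The code creates the initial test file a1200.txt (encoding utf16LEbom). We could start from any file in one of the supported encodings, 1251 or 866 or 65001 (== Utf8bom), because the conversions below are designed to work in a cycle (proven by binary comparison using the fc command, and confirmed manually by opening all the files in Notepad++). The following snippet assumes the initial test file is encoded as utf16LEbom.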
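As a cross-check outside cmd, the Python sketch below shows why typing ÿþ and ï»¿ under code page 1252 produces the two BOM files: in Windows-1252 those characters are stored as the single bytes FF FE and EF BB BF. Python is not part of the original answer and is used here only to inspect bytes; the variable names are illustrative, and the CR LF line endings are an assumption about what echo writes.

```python
import codecs

# Under code page 1252 each typed character is stored as exactly one byte.
bom_utf16le = "\u00ff\u00fe".encode("cp1252")      # the characters ÿ þ
bom_utf8 = "\u00ef\u00bb\u00bf".encode("cp1252")   # the characters ï » ¿

print(bom_utf16le.hex(" "))
print(bom_utf8.hex(" "))

# Compare with the BOM constants shipped in the standard library.
assert bom_utf16le == codecs.BOM_UTF16_LE
assert bom_utf8 == codecs.BOM_UTF8

# Expected content of a1200.txt: BOM followed by the UTF-16LE text
# (assumption: echo terminates each line with CR LF).
text = "русский текст\r\nкирилловский шрифт\r\n"
a1200 = bom_utf16le + text.encode("utf-16-le")
print(len(a1200))
```

The byte count of a1200 can be compared with the size that dir reports for a1200.txt.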
Then run the following (as a batch file, or copy and paste the code into an open cmd window):
@ECHO OFF
SETLOCAL EnableExtensions
:: run as a batch file, or copy&paste the code into an open cmd window
2>NUL md .\SO595742
pushd .\SO595742
:: convert file a1200.txt to cp1251
>NUL chcp 1251
type a1200.txt>x1251.txt
:: convert file a1200.txt to cp866
>NUL chcp 866
type a1200.txt>x866.txt
:: convert file a1200.txt to utf-8 BOM
>NUL copy /Y /B bomUtf8.bin x65001bom.txt
>NUL chcp 65001
type a1200.txt>>x65001Bom.txt
:: convert file x866.txt to file x1200.txt (encoding utf16LEbom)
>NUL copy /Y /B bomUtf16le.bin x1200.txt
>NUL chcp 866
cmd /U /D /C "type x866.txt>>x1200.txt"
:: Perform a binary comparison (FC: no differences encountered)
fc /B x1200.txt a1200.txt
:: convert file x1251.txt to file y1200.txt (encoding utf16LEbom)
:: analogous to: x866.txt to file x1200.txt
>NUL copy /Y /B bomUtf16le.bin y1200.txt
>NUL chcp 1251
cmd /U /D /C "type x1251.txt>>y1200.txt"
:: Perform a binary comparison (FC: no differences encountered)
fc /B y1200.txt a1200.txt
:: convert file x65001bom.txt to file z1200.txt (encoding utf16LEbom)
>NUL chcp 65001
cmd /U /D /C "type x65001bom.txt>z1200.txt"
:: Perform a binary comparison (FC: no differences encountered)
fc /B z1200.txt a1200.txt
:: convert file a1200.txt to x65001noBom.txt (utf-8 no BOM, merely for completeness)
>NUL chcp 65001
type a1200.txt>x65001noBom.txt
dir *.txt | findstr /I "\.txt$"
popd
goto :eof
Result: .\SO595742.bat
Comparing files x1200.txt and A1200.TXT
FC: no differences encountered
Comparing files y1200.txt and A1200.TXT
FC: no differences encountered
Comparing files z1200.txt and A1200.TXT
FC: no differences encountered
17/10/2021 19:24 72 a1200.txt
17/10/2021 21:49 72 x1200.txt
17/10/2021 21:49 35 x1251.txt
17/10/2021 21:49 67 x65001Bom.txt
17/10/2021 21:49 64 x65001noBom.txt
17/10/2021 21:49 35 x866.txt
17/10/2021 21:49 72 y1200.txt
17/10/2021 21:49 72 z1200.txt
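The file sizes in the listing can be reproduced independently. The following Python sketch is a sanity check rather than part of the cmd solution: it encodes the same two lines of Cyrillic text with Python's cp1251, cp866, utf-8 and utf-16-le codecs and checks that each conversion round-trips to the UTF-16LE original. It assumes CR LF line endings and that type drops the BOM when converting to a single-byte code page; the dictionary keys simply mirror the file names used above.

```python
# The test text written by the two echo commands (assumed CR LF line ends).
text = "русский текст\r\nкирилловский шрифт\r\n"

BOM16 = b"\xff\xfe"        # UTF-16LE byte order mark
BOM8 = b"\xef\xbb\xbf"     # UTF-8 byte order mark

files = {
    "a1200.txt": BOM16 + text.encode("utf-16-le"),
    "x1251.txt": text.encode("cp1251"),
    "x866.txt": text.encode("cp866"),
    "x65001Bom.txt": BOM8 + text.encode("utf-8"),
    "x65001noBom.txt": text.encode("utf-8"),
}

sizes = {name: len(data) for name, data in files.items()}
for name, size in sizes.items():
    print(size, name)

# Round trips back to UTF-16LE with BOM, mirroring the fc /B comparisons.
x1200 = BOM16 + files["x866.txt"].decode("cp866").encode("utf-16-le")
y1200 = BOM16 + files["x1251.txt"].decode("cp1251").encode("utf-16-le")
z1200 = BOM16 + files["x65001Bom.txt"].decode("utf-8-sig").encode("utf-16-le")

for candidate in (x1200, y1200, z1200):
    assert candidate == files["a1200.txt"]
```

Each Cyrillic letter takes one byte in cp1251 and cp866 but two bytes in UTF-8 and UTF-16LE, which accounts for the difference between the 35-byte files and the larger ones.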
Summary (incomplete): file conversions (⇆ reversible)
Direct:
- utf-16-le-bom ⇆ cp866
- utf-16-le-bom ⇆ cp1251
- utf-16-le-bom ⇆ utf-8-bom
- utf-16-le-bom → utf-8-noBom
Possible (via an auxiliary file):
- cp866 ⇆ utf-16-le-bom ⇆ cp1251
- cp866 ⇆ utf-16-le-bom ⇆ utf-8-bom
- utf-8-bom ⇆ utf-16-le-bom ⇆ cp1251
Also possible is utf-8-noBom → utf-8-bom, as follows:
copy /B bomUtf8.bin + fileutf-8-noBom.txt fileutf-8-bom.txt
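For comparison, the same BOM-prepending step can be sketched in Python. This is an illustration under assumptions, not the answer's method: the helper name add_utf8_bom and the temporary file names are hypothetical. Unlike copy /B, this version skips the BOM when the input already starts with one.

```python
import codecs
import tempfile
from pathlib import Path


def add_utf8_bom(src: Path, dst: Path) -> None:
    """Write dst as a copy of src with a UTF-8 BOM in front.

    If src already starts with a BOM, the bytes are copied unchanged.
    """
    data = src.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    dst.write_bytes(data)


# Small demonstration in a temporary folder (file names are illustrative).
with tempfile.TemporaryDirectory() as folder:
    src = Path(folder) / "no_bom.txt"
    dst = Path(folder) / "with_bom.txt"
    src.write_bytes("русский текст\r\n".encode("utf-8"))
    add_utf8_bom(src, dst)
    demo_result = dst.read_bytes()
    print(demo_result[:3].hex(" "))
```

The output file is exactly three bytes longer than the input, matching the difference between x65001Bom.txt and x65001noBom.txt in the listing.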
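Tested on Windows 10 with the following administrative language settings; not tested with the Beta checkbox unchecked:
[Screenshot: administrative language settings]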