Problem changing text file encoding using cmd

I recently had a task to change the encoding of a .txt file to Windows-1251, OEM866 and UTF-8 using cmd. I tried:

  1. chcp 866
  2. cmd /u /c /d type 1.txt > 866.txt

But the resulting text file is UTF-16 encoded, although it looks like OEM866 text.

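If you wish to stick with cmd, you will probably need the old 2004 iconv transcoder tool, so here is a download.cmd that fetches iconv.exe and its support files. However, do read "Force encode from US-ASCII to UTF-8 (iconv)" for related advice, because it is easy to transcode from the wrong input encoding.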

@echo off & Title Get-iConv

Rem Download libiconv-1.9.1 and support files on Windows 10 optionally include gettext-tools

set "iconv-dir=c:\text-iconv"
if not exist "%iconv-dir%" md "%iconv-dir%"
cd /d "%iconv-dir%"

if not exist gt-runtime.woe32.zip curl -o gt-runtime.woe32.zip http://ftp.gnu.org/gnu/gettext/gettext-runtime-0.13.1.bin.woe32.zip
tar -xf  gt-runtime.woe32.zip bin 
tar -xf  gt-runtime.woe32.zip share/doc

Rem if not exist gt-tools.woe32.zip curl -o gt-tools.woe32.zip http://ftp.gnu.org/gnu/gettext/gettext-tools-0.13.1.bin.woe32.zip
Rem tar -xf  gt-tools.woe32.zip bin
Rem tar -xf  gt-tools.woe32.zip share/doc

if not exist libiconv.woe32.zip curl -o libiconv.woe32.zip http://ftp.gnu.org/gnu/libiconv/libiconv-1.9.1.bin.woe32.zip
tar -xf  libiconv.woe32.zip bin
tar -xf  libiconv.woe32.zip share/doc

cd bin
start "" cmd /k "%iconv-dir%\bin\iconv.exe" -h
start "" "%iconv-dir%\share\doc\libiconv\iconv.1.html"
start "" 
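The caution about transcoding from the wrong input encoding can be illustrated outside cmd. The following is a minimal Python sketch, not part of the original answer; the sample string is an assumption chosen only for illustration. It suggests that decoding Windows-1251 bytes as if they were OEM866 raises no error, yet silently yields different characters, which is why the source encoding handed to a converter has to be known rather than guessed.

```python
# Hypothetical sample text; any Cyrillic string would do.
text = "русский текст"

# Bytes as they would be stored in a Windows-1251 file.
cp1251_bytes = text.encode("cp1251")

# Correct source encoding: the text survives the round trip.
right = cp1251_bytes.decode("cp1251")

# Wrong source encoding (bytes treated as OEM866): no exception is raised,
# but the decoded characters are not the original ones.
wrong = cp1251_bytes.decode("cp866")

print("correct source encoding preserved the text:", right == text)
print("wrong source encoding preserved the text:  ", wrong == text)
print("what the wrong decoding looks like:", wrong)
```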

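I would say the task (converting a given file from one encoding to another, as the iconv tool does) can be solved using cmd alone. First, create two auxiliary binary files, bomUtf16le.bin and bomUtf8.bin, as follows: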

REM do not run as a batch file; copy&paste the code into an open cmd window

:: create a testing folder and change the current directory
2>NUL md .\SO595742
pushd    .\SO595742

:: create file bomUtf16le.bin (BOM, encoding utf16LE)
>NUL chcp 1252
<nul set /p x=ÿþ>bomUtf16le.bin
:: create file bomUtf8.bin    (BOM, encoding utf8)
>NUL chcp 1252
<nul set /p x=ï»¿>bomUtf8.bin

:: create file a1200.txt (a Cyrillic text, encoding utf16LEbom)
>NUL copy /Y /B bomUtf16le.bin a1200.txt 
cmd /U /D /C "(echo русский текст&echo кирилловский шрифт)>>a1200.txt"

popd

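Important: do not run the above code snippet from a batch file; copy and paste the code into an open cmd window!
The code creates the initial test file a1200.txt (encoding utf16LEbom). We could start from a file in any of the supported encodings 1251, 866 or 65001 (== Utf8bom), because the conversions below are designed to work in a cycle (proven by binary comparison with the fc command and confirmed manually by opening all the files in notepad++). The conversion script below assumes that the initial test file is encoded as utf16LEbom.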
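As a cross-check of the two BOM helper files: under code page 1252 the characters ÿþ correspond to the bytes FF FE (the UTF-16LE BOM), and ï»¿ corresponds to EF BB BF (the UTF-8 BOM). The Python sketch below is not part of the original answer; it only confirms that mapping against the BOM constants in the standard `codecs` module.

```python
import codecs

# Characters typed after "set /p x=" while code page 1252 is active.
utf16le_bom_chars = "ÿþ"
utf8_bom_chars = "ï»¿"

# In cp1252 each of these characters is stored as exactly one byte.
utf16le_bom = utf16le_bom_chars.encode("cp1252")
utf8_bom = utf8_bom_chars.encode("cp1252")

print(utf16le_bom.hex())  # fffe
print(utf8_bom.hex())     # efbbbf

# Compare with the BOM constants shipped with Python.
print(utf16le_bom == codecs.BOM_UTF16_LE)  # True
print(utf8_bom == codecs.BOM_UTF8)         # True
```

If a hex dump of bomUtf16le.bin or bomUtf8.bin shows anything other than these two or three bytes, the active code page was probably not 1252 when the file was created.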

Then run the following (as a batch file, or copy and paste the code into an open cmd window):

@ECHO OFF
SETLOCAL EnableExtensions

:: run as a batch file, or copy&paste the code into an open cmd window

2>NUL md .\SO595742
pushd    .\SO595742

:: convert file a1200.txt to cp1251
>NUL chcp 1251
type a1200.txt>x1251.txt

:: convert file a1200.txt to cp866
>NUL chcp 866
type a1200.txt>x866.txt

:: convert file a1200.txt to utf-8 BOM
>NUL copy /Y /B bomUtf8.bin x65001bom.txt
>NUL chcp 65001
type a1200.txt>>x65001Bom.txt

:: convert file x866.txt to file x1200.txt (encoding utf16LEbom)
>NUL copy /Y /B bomUtf16le.bin x1200.txt
>NUL chcp 866
cmd /U /D /C "type x866.txt>>x1200.txt"

:: Perform a binary comparison (FC: no differences encountered)
fc /B x1200.txt a1200.txt

:: convert file x1251.txt to file y1200.txt (encoding utf16LEbom)
:: analogous to: x866.txt to file x1200.txt
>NUL copy /Y /B bomUtf16le.bin y1200.txt
>NUL chcp 1251
cmd /U /D /C "type x1251.txt>>y1200.txt"

:: Perform a binary comparison (FC: no differences encountered)
fc /B y1200.txt a1200.txt

:: convert file x65001bom.txt to file z1200.txt (encoding utf16LEbom)
>NUL chcp 65001
cmd /U /D /C "type x65001bom.txt>z1200.txt"

:: Perform a binary comparison (FC: no differences encountered)
fc /B z1200.txt a1200.txt

:: convert file a1200.txt to x65001noBom.txt (utf-8 no BOM, merely for completeness)
>NUL chcp 65001
type a1200.txt>x65001noBom.txt

dir *.txt | findstr /I "\.txt$"

popd

goto :eof

Result of running .\SO595742.bat:

Comparing files x1200.txt and A1200.TXT
FC: no differences encountered

Comparing files y1200.txt and A1200.TXT
FC: no differences encountered

Comparing files z1200.txt and A1200.TXT
FC: no differences encountered

17/10/2021  19:24                72 a1200.txt
17/10/2021  21:49                72 x1200.txt
17/10/2021  21:49                35 x1251.txt
17/10/2021  21:49                67 x65001Bom.txt
17/10/2021  21:49                64 x65001noBom.txt
17/10/2021  21:49                35 x866.txt
17/10/2021  21:49                72 y1200.txt
17/10/2021  21:49                72 z1200.txt
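The file sizes in the listing can be reproduced by hand. The Python sketch below is not part of the original answer; it assumes that each `echo` writes its text followed by CR LF, and it recomputes the expected byte counts for the two Cyrillic test lines in each encoding.

```python
import codecs

# The two lines written by the echo commands, each assumed to end in CR LF.
lines = ["русский текст", "кирилловский шрифт"]
text = "".join(line + "\r\n" for line in lines)

# Expected size in bytes of each test file.
sizes = {
    "a1200.txt": len(codecs.BOM_UTF16_LE + text.encode("utf-16-le")),
    "x1251.txt": len(text.encode("cp1251")),
    "x866.txt": len(text.encode("cp866")),
    "x65001Bom.txt": len(codecs.BOM_UTF8 + text.encode("utf-8")),
    "x65001noBom.txt": len(text.encode("utf-8")),
}

for name, size in sizes.items():
    print(f"{size:3d} {name}")
```

The text has 35 characters: one byte each in cp1251 and cp866 (35), two bytes each in UTF-16LE plus a 2-byte BOM (72), and in UTF-8 two bytes per Cyrillic letter and one per space, CR or LF (64, or 67 with the 3-byte BOM).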

Summary (incomplete): file conversions (⇆ reversible)

Direct:

  • utf-16-le-bom ⇆ cp866
  • utf-16-le-bom ⇆ cp1251
  • utf-16-le-bom ⇆ utf-8-bom
  • utf-16-le-bom → utf-8-noBom

Possible (via an auxiliary file):

  • cp866 ⇆ utf-16-le-bom ⇆ cp1251
  • cp866 ⇆ utf-16-le-bom ⇆ utf-8-bom
  • utf-8-bom ⇆ utf-16-le-bom ⇆ cp1251
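The idea of routing every conversion through a UTF-16LE file with BOM can be modelled in a few lines. This is a Python sketch of the same round trip, not the cmd method itself; the helper names `to_hub` and `from_hub` are hypothetical and exist only for this illustration.

```python
import codecs


def to_hub(data: bytes, src: str) -> bytes:
    """Decode bytes from the source encoding and store them as UTF-16LE with BOM."""
    return codecs.BOM_UTF16_LE + data.decode(src).encode("utf-16-le")


def from_hub(hub: bytes, dst: str) -> bytes:
    """Read UTF-16 (the 'utf-16' codec consumes the BOM) and encode to the target."""
    return hub.decode("utf-16").encode(dst)


text = "русский текст\r\nкирилловский шрифт\r\n"
cp866_bytes = text.encode("cp866")

# cp866 -> utf-16-le-bom -> cp1251
hub = to_hub(cp866_bytes, "cp866")
cp1251_bytes = from_hub(hub, "cp1251")

# cp1251 -> utf-16-le-bom -> cp866 (the reverse direction)
back = from_hub(to_hub(cp1251_bytes, "cp1251"), "cp866")

# "utf-8-sig" is Python's name for UTF-8 with a BOM.
utf8_bom_bytes = from_hub(hub, "utf-8-sig")

print(back == cp866_bytes)                        # True
print(to_hub(utf8_bom_bytes, "utf-8-sig") == hub)  # True
```

The round trips are lossless here because every character of the test text exists in all of the code pages involved; text containing characters missing from cp866 or cp1251 would not survive.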

Converting utf-8-noBom → utf-8-bom is possible as follows:

copy /B bomUtf8.bin + fileutf-8-noBom.txt fileutf-8-bom.txt
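The binary `copy /B a + b c` form simply concatenates the BOM file and the text file. The Python sketch below shows the equivalent byte operation; the function name `add_utf8_bom` is hypothetical, and the check against a BOM that is already present is an addition of this sketch that the `copy` command does not perform.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless the data already starts with one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


no_bom = "русский текст\r\n".encode("utf-8")
with_bom = add_utf8_bom(no_bom)

print(with_bom[:3].hex())                  # efbbbf
print(len(with_bom) - len(no_bom))         # 3
print(add_utf8_bom(with_bom) == with_bom)  # True
```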

Tested on Windows 10 with the Beta checkbox in the administrative language settings left unchecked; not tested with that checkbox ticked.