Reading a text file in Python that was the output of a PowerShell script produces unexpected results?

Question

I have a PowerShell script that does a number of things and finally writes a variable holding an integer value to a text file. Here is a simplified example:

$theValue = 531231245
$theValue | Out-File .\Test.txt

I have also tried adding the ToString() method:

$theValue = 531231245
$theValue.ToString() | Out-File .\Test.txt

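This produces a text file, and when I double-click it there are no surprises. In both cases I see the contents of theValue in the file, apparently as a plain number.

However, when I then try to read it in Python, I get a strange result: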

with open("Test.txt", 'r') as FID: 
    theText = FID.read()
print(theText)

The output is:

Output : ÿþ5 3 1 2 3 1 2 4 5

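This is actually the least strange output I got; other attempts gave me odd strings that looked like encoded bytes. I tried decode, readlines, and many other approaches.

I don't understand why I can't read a simple string from the text file correctly. Any ideas?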

Answer 1

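ÿþ is the byte order mark (BOM), Unicode character 65279 (U+FEFF), displayed as its two raw bytes 0xFF 0xFE. You can strip the non-ASCII characters like this: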

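with open("Test.txt", 'r') as FID:
    theText = FID.read()
    # Drop the non-ASCII BOM characters ("ÿþ")
    string_encode = theText.encode("ascii", "ignore")
    # The NUL characters between the digits are valid ASCII, so remove them explicitly
    string_decode = string_encode.decode().replace("\x00", "").strip()

    # output (with the cp1252 default seen in the question): 531231245
    print(string_decode)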
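
To see where the ÿþ and the gaps between the digits come from, here is a minimal sketch that rebuilds the file's bytes in memory instead of reading Test.txt. It assumes the file was written as UTF-16LE with a BOM and a CRLF line ending, and that Python's default encoding on the machine is cp1252; both are assumptions based on the output shown in the question.

```python
import codecs

# Assumed file contents: a UTF-16LE byte order mark, then the digits and
# a CRLF newline, each character stored as two bytes.
raw = codecs.BOM_UTF16_LE + "531231245\r\n".encode("utf-16-le")

# Decoding as cp1252 (assumed locale default) turns the BOM bytes into "ÿþ"
# and leaves a NUL character after every digit.
mojibake = raw.decode("cp1252")
print(repr(mojibake))

# Decoding as UTF-16 consumes the BOM and recovers the real text.
text = raw.decode("utf-16")
print(repr(text))

# With the text decoded correctly, converting to an integer is straightforward.
value = int(text.strip())
print(value)
```

The NUL characters are what the console shows as spaces between the digits, which is why stripping only the non-ASCII characters still leaves them in place.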

Answer 2

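In Windows PowerShell, the Out-File cmdlet produces UTF-16LE ("Unicode") files by default, as does its effective alias, >.
- PowerShell (Core) 7+, by contrast, fortunately now consistently defaults to BOM-less UTF-8.
Therefore, you have two options:
- Use the -Encoding parameter of Out-File / Set-Content to produce a file in a character encoding that Python recognizes by default (for a plain number, -Encoding ascii is enough).
- Use the encoding parameter of the open() function to match the encoding produced by PowerShell; for Windows PowerShell: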
```
# 'utf-16' detects and strips the BOM; 'utf-16le' would leave '\ufeff' at the start of the text
with open("t.txt", 'r', encoding='utf-16') as FID:
  theText = FID.read()
print(theText)
```
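
If the same Python code has to cope with files from both Windows PowerShell and PowerShell 7+, one possible approach is to check for a BOM before decoding. The sketch below is only an illustration: read_int_from_ps_file is a hypothetical helper name, it only considers UTF-16 and UTF-8, and the two temporary files stand in for real Out-File output.

```python
import codecs
import os
import tempfile


def read_int_from_ps_file(path):
    """Hypothetical helper: read an integer from a file written by Out-File."""
    with open(path, "rb") as f:
        raw = f.read()
    # A UTF-16 BOM suggests Windows PowerShell's default output.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        encoding = "utf-16"  # consumes the BOM and picks the byte order
    else:
        # Assumption: anything else is UTF-8, with or without a BOM.
        encoding = "utf-8-sig"
    return int(raw.decode(encoding).strip())


with tempfile.TemporaryDirectory() as tmp:
    ps5_path = os.path.join(tmp, "ps5.txt")
    ps7_path = os.path.join(tmp, "ps7.txt")
    # Simulated Windows PowerShell output: UTF-16LE with BOM.
    with open(ps5_path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + "531231245\r\n".encode("utf-16-le"))
    # Simulated PowerShell 7+ output: BOM-less UTF-8.
    with open(ps7_path, "wb") as f:
        f.write("531231245\r\n".encode("utf-8"))
    ps5_value = read_int_from_ps_file(ps5_path)
    ps7_value = read_int_from_ps_file(ps7_path)

print(ps5_value, ps7_value)
```

Reading in binary mode and decoding explicitly keeps the result independent of the locale's default encoding.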

python

powershell