How are Short File Names generated in Windows?

I am currently using the following P/Invoke signature to get the short file name of a regular Windows file:

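// Requires: using System.Runtime.InteropServices; using System.Text;
// Returns 0 on failure; otherwise the length of the short path
// (or the required buffer size if shortPath is too small).
[DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
public static extern int GetShortPathName([MarshalAs(UnmanagedType.LPTStr)] string path,
                                          [MarshalAs(UnmanagedType.LPTStr)] StringBuilder shortPath,
                                          int shortPathLength);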

So far it works without any problems, but I noticed something very odd.
As far as I know, Windows uses the following short file name convention:

Cut the name to 6 characters (without extension)
Append the tilde (~)
Append an unsigned integer number which indicates the match index (starting with 1)
Append the original file extension

Thus, the file name C:\abcdefghijklmn.txt should be accessible under the short name C:\abcdef~1.txt. (Works fine.)
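
The four steps above can be written out as a minimal sketch. The class and method names (NaiveShortName, Build) are hypothetical, and the code only mirrors the convention as the question assumes it; it does not query the file system and, as the rest of this thread shows, it does not cover everything Windows actually does.

```csharp
using System;
using System.IO;

public static class NaiveShortName
{
    // Builds "<first 6 characters>~<index><original extension>",
    // i.e. the convention assumed in the question (hypothetical helper).
    public static string Build(string fileName, uint index)
    {
        string baseName = Path.GetFileNameWithoutExtension(fileName);
        string extension = Path.GetExtension(fileName); // includes the dot, or "" if none
        string prefix = baseName.Length > 6 ? baseName.Substring(0, 6) : baseName;
        return prefix + "~" + index + extension;
    }

    public static void Main()
    {
        Console.WriteLine(Build("abcdefghijklmn.txt", 1)); // abcdef~1.txt
    }
}
```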

Now the odd part: I recently ran a small search in my music directory to find a specific audio file. This is the result:

.\Rammstein & Tatu - Moscow.mp3
.\Rammstein - Asche zu Asche.mp3
.\Rammstein - Der Meister.mp3
.\Rammstein - Du Hast.mp3
.\Rammstein - Eifersucht.mp3
.\Rammstein - Feuer Frei.mp3
.\Rammstein - Führe Mich.mp3
.\Rammstein - Haifisch.mp3
...

The same search with short names:

.\RA8E17~1.MP3
.\RA23A6~1.MP3
.\RAMMST~1.MP3
.\RA0CAE~1.MP3
.\RAMMST~2.MP3
.\RAMMST~3.MP3
.\RAMMST~4.MP3
.\RA6BAA~1.MP3
...

My question is: why does Windows generate such "random" prefixes before the tilde (like RA23A6 or RA0CAE)?

Microsoft has not documented this, but Wikipedia has:

8.3 filename:

Although there is no compulsory algorithm for creating the 8.3 name from an LFN, Windows uses the following convention:

1. If the LFN is 8.3 uppercase, no LFN will be stored on disk at all.

  • Example: TEXTFILE.TXT

2. If the LFN is 8.3 mixed case, the LFN will store the mixed-case name, while the 8.3 name will be an uppercased version of it.

  • Example: TextFile.Txt becomes TEXTFILE.TXT.

3. If the filename contains characters not allowed in an 8.3 name (including space which was disallowed by convention though not by the APIs) or either part is too long, the name is stripped of invalid characters such as spaces and extra periods. Other characters such as + are changed to the underscore _, and uppercased. The stripped name is then truncated to the first 6 letters of its basename, followed by a tilde, followed by a single digit, followed by a period ., followed by the first 3 characters of the extension.

  • Example: TextFile1.Mine.txt becomes TEXTFI~1.TXT (or TEXTFI~2.TXT, should TEXTFI~1.TXT already exist). ver +1.2.text becomes VER_12~1.TEX.

4. Beginning with Windows 2000, if at least 4 files or folders already exist with the same initial 6 characters in their short names, the stripped LFN is instead truncated to the first 2 letters of the basename (or 1 if the basename has only 1 letter), followed by 4 hexadecimal digits derived from an undocumented hash of the filename, followed by a tilde, followed by a single digit, followed by a period ., followed by the first 3 characters of the extension.

  • Example: TextFile.Mine.txt becomes TE021F~1.TXT.
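
Rules 1 to 3 of the quoted convention can be sketched roughly as follows. This is an illustration under assumptions, not Windows' actual implementation: the names BasisName, Clean and Build are hypothetical, and the set of characters mapped to an underscore (+ , ; = [ ]) is an assumption, since the quoted text only names +. Rule 4 is left out here because it depends on the undocumented hash.

```csharp
using System;
using System.Text;

public static class BasisName
{
    // Assumption: characters replaced by '_'. The quoted text only names '+'.
    private const string ReplacedChars = "+,;=[]";

    // Drops spaces and periods, maps ReplacedChars to '_', upper-cases the rest.
    private static string Clean(string part)
    {
        var sb = new StringBuilder();
        foreach (char c in part)
        {
            if (c == ' ' || c == '.') continue;
            sb.Append(ReplacedChars.IndexOf(c) >= 0 ? '_' : char.ToUpperInvariant(c));
        }
        return sb.ToString();
    }

    // index is the number placed after the tilde (1, 2, ...).
    public static string Build(string longName, int index)
    {
        // Only the last period separates the base name from the extension.
        int lastDot = longName.LastIndexOf('.');
        string rawBase = lastDot > 0 ? longName.Substring(0, lastDot) : longName;
        string rawExt = lastDot > 0 ? longName.Substring(lastDot + 1) : "";
        string cleanBase = Clean(rawBase);
        string cleanExt = Clean(rawExt);

        // Rules 1 and 2: the name already fits 8.3, so only the case changes.
        bool fits = cleanBase == rawBase.ToUpperInvariant()
                 && cleanExt == rawExt.ToUpperInvariant()
                 && cleanBase.Length >= 1 && cleanBase.Length <= 8
                 && cleanExt.Length <= 3;
        if (fits)
            return cleanExt.Length > 0 ? cleanBase + "." + cleanExt : cleanBase;

        // Rule 3: 6 characters, tilde, index, then up to 3 extension characters.
        string prefix = cleanBase.Length > 6 ? cleanBase.Substring(0, 6) : cleanBase;
        string ext = cleanExt.Length > 3 ? cleanExt.Substring(0, 3) : cleanExt;
        return prefix + "~" + index + (ext.Length > 0 ? "." + ext : "");
    }

    public static void Main()
    {
        Console.WriteLine(Build("TextFile.Txt", 1));       // TEXTFILE.TXT
        Console.WriteLine(Build("TextFile1.Mine.txt", 1)); // TEXTFI~1.TXT
        Console.WriteLine(Build("ver +1.2.text", 1));      // VER_12~1.TEX
    }
}
```

The three calls in Main reproduce the examples given in the quoted rules.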

As Joey mentioned, the undocumented hash of the filename has been reverse engineered.

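This is because the very primitive scheme of a prefix plus a counter can only handle a limited number of files. As the number of files grows, Windows switches to a shorter prefix and a hash. Someone actually reverse-engineered the hash and provides some explanation as well: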

In case you aren’t aware of how 8.3 file names work, here’s a quick run-down.

  • All periods other than the one separating the filename from the extension are dropped - a.testing.file.bat turns into atestingfile.bat.
  • Certain special characters like + are turned into underscores, and others are dropped. The file name is upper-cased. 1+2+3 Hello World.exe turns into 1_2_3HELLOWORLD.EXE.
  • The file extension is truncated to 3 characters, and (if longer than 8 characters) the file name is truncated to 6 characters followed by ~1. SomeStuff.aspx turns into SOMEST~1.ASP.
  • If these would cause a collision, ~2 is used instead, followed by ~3 and ~4.
  • Instead of going to ~5, the file name is truncated down to 2 characters, with the rest replaced by a hexadecimal checksum of the long filename - SomeStuff.aspx turns into SOBC84~1.ASP, where BC84 is the result of the (previously-)undocumented checksum function.
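
The switch from the counter to the checksum described in the last two bullets could look roughly like the sketch below. The names ShortNamePicker and Pick are hypothetical, and the checksum itself is not reproduced: it is passed in as a delegate, and the example stubs it with the BC84 value quoted above. The inputs are assumed to be already stripped, upper-cased and truncated as in the earlier bullets. Trying ~1 to ~9 after the hashed prefix is an assumption of this sketch; the quoted text does not say what happens beyond that.

```csharp
using System;
using System.Collections.Generic;

public static class ShortNamePicker
{
    // cleanBase / ext: already stripped and upper-cased, ext cut to 3 characters.
    // existing: short names already taken in the directory.
    // checksum: stand-in for the undocumented hash (hypothetical parameter).
    public static string Pick(string cleanBase, string ext, string longName,
                              ISet<string> existing, Func<string, ushort> checksum)
    {
        string suffix = ext.Length > 0 ? "." + ext : "";
        string prefix6 = cleanBase.Length > 6 ? cleanBase.Substring(0, 6) : cleanBase;

        // First try the 6-character prefix with ~1 through ~4.
        for (int i = 1; i <= 4; i++)
        {
            string candidate = prefix6 + "~" + i + suffix;
            if (!existing.Contains(candidate)) return candidate;
        }

        // Then fall back to 2 characters plus 4 hexadecimal checksum digits.
        string prefix2 = cleanBase.Length > 2 ? cleanBase.Substring(0, 2) : cleanBase;
        string hashed = prefix2 + checksum(longName).ToString("X4");
        for (int i = 1; i <= 9; i++)
        {
            string candidate = hashed + "~" + i + suffix;
            if (!existing.Contains(candidate)) return candidate;
        }
        throw new InvalidOperationException("No free short name found in this sketch.");
    }

    public static void Main()
    {
        var taken = new HashSet<string>
        {
            "SOMEST~1.ASP", "SOMEST~2.ASP", "SOMEST~3.ASP", "SOMEST~4.ASP"
        };
        // The checksum is stubbed with the value quoted in the text above.
        Console.WriteLine(Pick("SOMESTUFF", "ASP", "SomeStuff.aspx",
                               taken, _ => (ushort)0xBC84)); // SOBC84~1.ASP
    }
}
```

Passing the checksum as a delegate keeps the fallback logic separate from the hash, so a real implementation of the reverse-engineered function could be plugged in later.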