How are Short File Names generated in Windows?
I am currently using the following P/Invoke signature to get the short file name of regular Windows files:
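```csharp
using System.Runtime.InteropServices;
using System.Text;

// Declared inside a class; CharSet.Auto resolves to GetShortPathNameW on NT-based Windows.
[DllImport("kernel32.dll", CharSet = CharSet.Auto)]
public static extern int GetShortPathName([MarshalAs(UnmanagedType.LPTStr)] string path,
    [MarshalAs(UnmanagedType.LPTStr)] StringBuilder shortPath,
    int shortPathLength);
```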
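For reference, a minimal sketch of how this signature is usually called, assuming a runtime that provides RuntimeInformation (.NET Framework 4.7.1+ or .NET Core): call it once with a null buffer and length 0 to get the required size (including the terminating null), then again with a buffer of that size. The ShortPath class and its Get method are hypothetical names, and falling back to the original path when the call fails is a choice of this sketch, not behavior of GetShortPathName itself.

```csharp
#nullable enable
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical wrapper around the signature from the question.
public static class ShortPath
{
    [DllImport("kernel32.dll", CharSet = CharSet.Auto)]
    private static extern int GetShortPathName(
        [MarshalAs(UnmanagedType.LPTStr)] string path,
        [MarshalAs(UnmanagedType.LPTStr)] StringBuilder? shortPath,
        int shortPathLength);

    // Returns the short form of an existing path, or the input unchanged when
    // the call fails (e.g. the path does not exist) or when not on Windows.
    public static string Get(string path)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            return path;

        // Null buffer + length 0: the return value is the required buffer
        // size in characters, including the terminating null.
        int needed = GetShortPathName(path, null, 0);
        if (needed == 0)
            return path;

        var buffer = new StringBuilder(needed);
        // On success: number of characters copied, excluding the null.
        int written = GetShortPathName(path, buffer, buffer.Capacity);
        if (written == 0 || written >= buffer.Capacity)
            return path;
        return buffer.ToString();
    }
}
```

GetShortPathName only works on paths that exist, and a file only has a distinct short name if 8.3 name generation was enabled on its volume when it was created; otherwise the long path comes back.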
So far it works without any problems, but I have noticed something strange.
As far as I know, Windows uses the following short file name convention:
1. Cut the name to 6 characters (without extension).
2. Append the tilde (~).
3. Append an unsigned integer number which indicates the match index (starting with 1).
4. Append the original file extension.
So the file name C:\abcdefghijklmn.txt should be accessible under the short name C:\abcdef~1.txt (which works fine).
Now the strange part: I recently ran a small search in my music directory to find a specific audio file. These were the results:
.\Rammstein & Tatu - Moscow.mp3
.\Rammstein - Asche zu Asche.mp3
.\Rammstein - Der Meister.mp3
.\Rammstein - Du Hast.mp3
.\Rammstein - Eifersucht.mp3
.\Rammstein - Feuer Frei.mp3
.\Rammstein - Führe Mich.mp3
.\Rammstein - Haifisch.mp3
...
The same search, showing the short names:
.\RA8E17~1.MP3
.\RA23A6~1.MP3
.\RAMMST~1.MP3
.\RA0CAE~1.MP3
.\RAMMST~2.MP3
.\RAMMST~3.MP3
.\RAMMST~4.MP3
.\RA6BAA~1.MP3
...
My question is: why does Windows generate such "random" prefixes before the tilde (like RA23A6 or RA0CAE)?
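The two shapes in the listing can be told apart mechanically. A minimal sketch, assuming the "random" form is 1 or 2 stem characters followed by exactly 4 uppercase hex digits, a tilde and a single digit (as in the Wikipedia description quoted below); ShortNameShape and LooksHashed are hypothetical names. This is only a heuristic: a counter-form name whose characters 3 to 6 happen to be hex digits (e.g. CAFE12~1.TXT) would be misclassified.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: distinguishes counter-form from hash-form short names.
public static class ShortNameShape
{
    // Hash form: 1-2 stem characters, 4 hex digits, "~", one digit,
    // optional "." plus up to 3 extension characters.
    private static readonly Regex Hashed =
        new Regex(@"^[^~.]{1,2}[0-9A-F]{4}~[1-9](\.[^.]{1,3})?$");

    public static bool LooksHashed(string shortName) => Hashed.IsMatch(shortName);
}
```

Applied to the listing, RA8E17~1.MP3, RA23A6~1.MP3, RA0CAE~1.MP3 and RA6BAA~1.MP3 match, while RAMMST~1.MP3 through RAMMST~4.MP3 do not.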
Microsoft doesn't document this, but Wikipedia does:
Although there is no compulsory algorithm for creating the 8.3 name from an LFN, Windows uses the following convention:
1. If the LFN is 8.3 uppercase, no LFN will be stored on disk at all.
   - Example: TEXTFILE.TXT
2. If the LFN is 8.3 mixed case, the LFN will store the mixed-case name, while the 8.3 name will be an uppercased version of it.
   - Example: TextFile.Txt becomes TEXTFILE.TXT.
3. If the filename contains characters not allowed in an 8.3 name (including space, which was disallowed by convention though not by the APIs) or either part is too long, the name is stripped of invalid characters such as spaces and extra periods. Other characters such as + are changed to the underscore _, and uppercased. The stripped name is then truncated to the first 6 letters of its basename, followed by a tilde, followed by a single digit, followed by a period (.), followed by the first 3 characters of the extension.
   - Example: TextFile1.Mine.txt becomes TEXTFI~1.TXT (or TEXTFI~2.TXT, should TEXTFI~1.TXT already exist).
   - Example: ver +1.2.text becomes VER_12~1.TEX.
4. Beginning with Windows 2000, if at least 4 files or folders already exist with the same initial 6 characters in their short names, the stripped LFN is instead truncated to the first 2 letters of the basename (or 1 if the basename has only 1 letter), followed by 4 hexadecimal digits derived from an undocumented hash of the filename, followed by a tilde, followed by a single digit, followed by a period (.), followed by the first 3 characters of the extension.
   - Example: TextFile.Mine.txt becomes TE021F~1.TXT.
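Rules 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not Windows' actual implementation: the set of characters mapped to _ (+ , ; = [ ]) is an assumption, non-ASCII characters (such as the ü in "Führe Mich") are passed through unchanged rather than converted to the OEM code page, and EightDotThree, FitsEightDotThree, Strip and CounterName are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helpers sketching rules 1-3 of the Wikipedia description.
public static class EightDotThree
{
    // Assumption: characters legal in a long name but not in an 8.3 name that
    // are replaced by "_". Spaces and periods are handled separately (dropped).
    private const string ToUnderscore = "+,;=[]";

    // Splits at the LAST period: everything after it is the extension.
    private static (string Stem, string Ext) Split(string longName)
    {
        int dot = longName.LastIndexOf('.');
        return dot < 0
            ? (longName, "")
            : (longName.Substring(0, dot), longName.Substring(dot + 1));
    }

    // Rules 1-2: the name already fits 8.3 (1-8 character stem, at most 3
    // character extension, a single period, no spaces or mapped characters),
    // so its 8.3 name is simply the uppercased long name.
    public static bool FitsEightDotThree(string longName)
    {
        var (stem, ext) = Split(longName);
        return stem.Length >= 1 && stem.Length <= 8
            && ext.Length <= 3
            && stem.IndexOf('.') < 0
            && !longName.Any(c => c == ' ' || ToUnderscore.IndexOf(c) >= 0);
    }

    // Rule 3, stripping: drop spaces and every period except the separator,
    // map the characters above to "_", and uppercase the rest.
    public static (string Stem, string Ext) Strip(string longName)
    {
        var (stem, ext) = Split(longName);
        return (Clean(stem), Clean(ext));
    }

    private static string Clean(string part)
    {
        var sb = new StringBuilder();
        foreach (char c in part)
        {
            if (c == ' ' || c == '.') continue;
            sb.Append(ToUnderscore.IndexOf(c) >= 0 ? '_' : char.ToUpperInvariant(c));
        }
        return sb.ToString();
    }

    // Rule 3, truncation: up to 6 stem characters, "~", a single digit, then
    // "." and up to 3 extension characters when there is an extension.
    public static string CounterName(string longName, int index)
    {
        if (index < 1 || index > 9) throw new ArgumentOutOfRangeException(nameof(index));
        var (stem, ext) = Strip(longName);
        string name = stem.Substring(0, Math.Min(6, stem.Length)) + "~" + index;
        return ext.Length == 0 ? name : name + "." + ext.Substring(0, Math.Min(3, ext.Length));
    }
}
```

With these helpers, CounterName("ver +1.2.text", 1) yields VER_12~1.TEX and CounterName("Rammstein - Der Meister.mp3", 1) yields RAMMST~1.MP3, matching the Wikipedia example and the listing in the question.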
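As Joey mentioned, the undocumented hash of the filename has been reverse engineered.
This happens because the very primitive scheme of a prefix plus a counter can only handle a few files that share the same prefix (~1 to ~4). Once that limit is reached, Windows switches to a shorter prefix and a hash. Someone actually reverse-engineered the hash, along with some explanation: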
In case you aren’t aware of how 8.3 file names work, here’s a quick run-down.
- All periods other than the one separating the filename from the extension are dropped - a.testing.file.bat turns into atestingfile.bat.
- Certain special characters like + are turned into underscores, and others are dropped. The file name is upper-cased. 1+2+3 Hello World.exe turns into 1_2_3HELLOWORLD.EXE.
- The file extension is truncated to 3 characters, and (if longer than 8 characters) the file name is truncated to 6 characters followed by ~1. SomeStuff.aspx turns into SOMEST~1.ASP.
- If these would cause a collision, ~2 is used instead, followed by ~3 and ~4.
- Instead of going to ~5, the file name is truncated down to 2 characters, with the rest replaced by a hexadecimal checksum of the long filename - SomeStuff.aspx turns into SOBC84~1.ASP, where BC84 is the result of the (previously-)undocumented checksum function.
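The counter-then-hash policy in the quote can be sketched as an allocator that takes the already stripped, uppercased stem and extension (stripping as described in rule 3) plus the set of short names already present in the directory. The checksum is deliberately left as a parameter: the reverse-engineered function is not reproduced here, so any Func<string, ushort> passed in is a stand-in. Moving on to ~2, ~3 and so on when a hashed name itself collides is an assumption of this sketch; ShortNameAllocator and Allocate are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the counter-then-hash policy described in the quote.
public static class ShortNameAllocator
{
    // longName: the original long file name (input to the checksum).
    // stem, ext: already stripped and uppercased parts, e.g. "RAMMSTEIN-DERMEISTER", "MP3".
    // existing: short names already in the directory (use a case-insensitive set).
    // checksum: stand-in for the reverse-engineered hash; NOT the real function.
    public static string Allocate(string longName, string stem, string ext,
                                  ISet<string> existing, Func<string, ushort> checksum)
    {
        string suffix = ext.Length == 0 ? "" : "." + ext.Substring(0, Math.Min(3, ext.Length));

        // Counter form: 6-character stem with ~1 through ~4.
        string six = stem.Substring(0, Math.Min(6, stem.Length));
        for (int i = 1; i <= 4; i++)
        {
            string candidate = six + "~" + i + suffix;
            if (!existing.Contains(candidate)) return candidate;
        }

        // Hashed form: 1-2 stem characters plus 4 uppercase hex digits.
        // Assumption: further clashes on the hashed name move on to ~2, ~3, ...
        string two = stem.Substring(0, Math.Min(2, stem.Length));
        string hex = checksum(longName).ToString("X4");
        for (int i = 1; i <= 9; i++)
        {
            string candidate = two + hex + "~" + i + suffix;
            if (!existing.Contains(candidate)) return candidate;
        }
        throw new InvalidOperationException("No free short name in this sketch.");
    }
}
```

With a stand-in checksum that returns 0x23A6, a fifth file whose stem starts with RAMMST gets RA23A6~1.MP3 once RAMMST~1.MP3 to RAMMST~4.MP3 are taken, which is the pattern visible in the question's listing.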