SQL Server HASHBYTES and Extended ASCII

Question

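I'm doing ETL between Oracle and SQL Server (no primary keys -> no transactional replication), and I'm using MD5 hashes to detect differences between the source and target databases.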

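This works for records whose data falls within the 7-bit ASCII range (codes 0 to 127). But when there are any 'extended ascii'* characters, such as ½, ° or ©, SQL Server's HASHBYTES function hashes them in a non-standard way (i.e. differently from Oracle's DBMS_CRYPTO.Hash, the .NET cryptography library, etc.).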

So when I run this in Oracle:

select rawtohex(
DBMS_CRYPTO.Hash (
    UTL_I18N.STRING_TO_RAW ('°', 'AL32UTF8'),
    2)  -- 2 = DBMS_CRYPTO.HASH_MD5
) from dual;

I get: 4723EB5AA8B0CD28C7E09433839B8FAE

When I run this in SQL Server:

SELECT HASHBYTES('md5', '°');

I get: EC655B6DA8B9264A7C7C5E1A70642FA7

When I run this C# code:

using System;
using System.Security.Cryptography;
using System.Text;

string password = "°";

// byte array representation of that string
byte[] encodedPassword = new UTF8Encoding().GetBytes(password);

// need MD5 to calculate the hash
byte[] hash = ((HashAlgorithm) CryptoConfig.CreateFromName("MD5")).ComputeHash(encodedPassword);

// string representation (similar to UNIX format)
string encoded = BitConverter.ToString(hash)
   // without dashes
   .Replace("-", string.Empty)
   // make lowercase
   .ToLower();

I get 4723EB5AA8B0CD28C7E09433839B8FAE (in lowercase), i.e. the same as Oracle and every online tool I've tried.
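MD5 works on bytes, not characters, so '°' only produces the same hash on both sides if both sides feed MD5 the same byte sequence. The following is a minimal C# sketch using C# 9+ top-level statements (the helper names `Md5Hex` and `Hex` are illustrative). It prints the bytes and MD5 of '°' under three encodings:

* UTF-8, which is what the Oracle query and the C# code above hash.
* Single-byte Latin-1, which for this character is the same byte (0xB0) as Windows-1252. This is presumably what a `varchar` literal like `'°'` holds if the database uses a code page 1252 collation such as Latin1_General. That is an assumption about the server, not something stated in the question.
* UTF-16LE, the storage format of `nvarchar`.

The UTF-8 line should reproduce 4723EB5AA8B0CD28C7E09433839B8FAE. Under the code page 1252 assumption, the Latin-1 line shows the input HASHBYTES was actually given.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Uppercase hex MD5 digest without dashes (same shape as the HASHBYTES output above).
static string Md5Hex(byte[] bytes)
{
    using (MD5 md5 = MD5.Create())
    {
        return BitConverter.ToString(md5.ComputeHash(bytes)).Replace("-", string.Empty);
    }
}

// Raw bytes as dashed hex, e.g. "C2-B0", for inspection.
static string Hex(byte[] bytes) => BitConverter.ToString(bytes);

// Degree sign written as an escape so the source file's own encoding cannot interfere.
string degree = "\u00B0";

// UTF-8: what UTL_I18N.STRING_TO_RAW(..., 'AL32UTF8') and UTF8Encoding produce (C2-B0).
byte[] utf8 = Encoding.UTF8.GetBytes(degree);

// Latin-1 (code page 28591, built into .NET): one byte, B0, identical to Windows-1252 here.
// Assumption: this is what a varchar '°' holds under a code page 1252 collation.
byte[] latin1 = Encoding.GetEncoding(28591).GetBytes(degree);

// UTF-16LE: the nvarchar storage format, two bytes (B0-00).
byte[] utf16 = Encoding.Unicode.GetBytes(degree);

Console.WriteLine($"UTF-8     {Hex(utf8),-6} MD5 {Md5Hex(utf8)}");
Console.WriteLine($"Latin-1   {Hex(latin1),-6} MD5 {Md5Hex(latin1)}");
Console.WriteLine($"UTF-16LE  {Hex(utf16),-6} MD5 {Md5Hex(utf16)}");
```

Three different byte sequences give three different digests. No hash function is behaving non-standardly. The inputs simply differ.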

Is there any SQL-based solution to this, or do I need to create a CLR stored procedure and hash the data there?

*I realise this term is somewhat contentious.

Answer 1

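At the time of writing, MS SQL Server does not support UTF-8 (UTF-8 collations arrived later, in SQL Server 2019). Your hashes will therefore keep differing for non-ASCII characters until you convert the source strings to the lowest common denominator, which in this case is (probably) UTF-16.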
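The following is a minimal sketch of the UTF-16 route on the .NET side. It assumes SQL Server hashes `nvarchar` values (e.g. `HASHBYTES('md5', N'°')`), whose bytes are UTF-16LE, so hashing `Encoding.Unicode` bytes in .NET should line up with that. On the Oracle side, the analogous step would presumably be `UTL_I18N.STRING_TO_RAW(..., 'AL16UTF16LE')`, but that is untested here. `Md5Utf8` and `Md5Utf16` are hypothetical helper names.

The sketch also shows the cost of this approach. Every character becomes at least two bytes, so the hash of every row changes, including plain ASCII rows. Both sides therefore have to switch at the same time.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// MD5 over an arbitrary byte sequence, rendered as uppercase hex without dashes.
static string Md5Hex(byte[] bytes)
{
    using (MD5 md5 = MD5.Create())
    {
        return BitConverter.ToString(md5.ComputeHash(bytes)).Replace("-", string.Empty);
    }
}

// Hypothetical helpers: hash the same string under two different encodings.
static string Md5Utf8(string s) => Md5Hex(Encoding.UTF8.GetBytes(s));
// Encoding.Unicode is UTF-16 little-endian, the byte layout of SQL Server nvarchar.
static string Md5Utf16(string s) => Md5Hex(Encoding.Unicode.GetBytes(s));

// "abc" is plain ASCII; "\u00B0" is the degree sign from the question.
foreach (string value in new[] { "abc", "\u00B0" })
{
    Console.WriteLine($"U+{(int)value[0]:X4}...: UTF-8 {Md5Utf8(value)}  UTF-16LE {Md5Utf16(value)}");
}
```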

Answer 2

I decided to work around SQL Server's handling of extended ASCII by implementing a CLR function that uses the .NET cryptography library:

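using System;
using System.Security.Cryptography;
using System.Text;
using Microsoft.SqlServer.Server;

public class Functions
{
  // UTF-8 encode the input, then MD5 it, so the result matches
  // Oracle's DBMS_CRYPTO.Hash over AL32UTF8 bytes.
  [SqlFunction]
  public static string GetMD5Hash (string input)
  {
    // Propagate SQL NULL instead of letting UTF8Encoding.GetBytes throw.
    if (input == null)
      return null;

    var encodedPassword = new UTF8Encoding().GetBytes(input);

    // Dispose the hash algorithm once the digest is computed.
    using (var md5 = (HashAlgorithm)CryptoConfig.CreateFromName("MD5"))
    {
      var hash = md5.ComputeHash(encodedPassword);

      // Uppercase hex without dashes, e.g. 4723EB5AA8B0CD28C7E09433839B8FAE
      return BitConverter.ToString(hash).Replace("-", string.Empty);
    }
  }
}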
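The hashing logic can be checked outside SQL Server before the assembly is deployed. The following is a minimal sketch for a plain console app (C# 9+ top-level statements). The hypothetical local function `GetMD5HashLocal` follows the same steps as `GetMD5Hash` (UTF-8 encode, MD5, uppercase hex without dashes). It omits the `[SqlFunction]` attribute, so it needs no `Microsoft.SqlServer.Server` reference. It also uses `MD5.Create()` in place of `CryptoConfig.CreateFromName("MD5")`, since both yield an MD5 implementation. The first two outputs are the RFC 1321 test vectors. For '°', the output should match the Oracle value from the question.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical local copy of the CLR function's logic, minus the SQL CLR attribute.
static string GetMD5HashLocal(string input)
{
    byte[] encoded = new UTF8Encoding().GetBytes(input);
    using (MD5 md5 = MD5.Create())
    {
        return BitConverter.ToString(md5.ComputeHash(encoded)).Replace("-", string.Empty);
    }
}

// RFC 1321 test vectors.
Console.WriteLine(GetMD5HashLocal(""));     // D41D8CD98F00B204E9800998ECF8427E
Console.WriteLine(GetMD5HashLocal("abc"));  // 900150983CD24FB0D6963F7D28E17F72

// Degree sign from the question, written as an escape to avoid source-encoding issues.
Console.WriteLine(GetMD5HashLocal("\u00B0"));
```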
