Find all nvarchar fields in a database and do a replace(<field>, CHAR(10), '') on them

Question

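I get my data from an XML file, using a third-party component to read it. (Zapsys; I am not affiliated with them, but maybe someone knows their product.) The data in the XML looks like this: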

<customer>
"Johnny"
</customer>

What I end up with in the table (customers) is an nvarchar column (surname) with the following content:

CHAR(10)JohnnyCHAR(10)

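This happens in every nvarchar field read from the XML. The component really does extract exactly what it reads, but these characters mess up a lot of statements.

select * from customers where surname = 'Johnny' returns no rows.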

select * from customers where surname like '%Johnny%'

or

select * from customers where replace(surname,char(10),'') = 'Johnny' does.

Not very pretty.

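One way to solve this is to use views with a lot of replace statements. But wouldn't it be nice if I could run a procedure that wipes these CHAR(10) characters from every nvarchar field?

Surely it must be possible to write an update statement that finds all nvarchar fields and does a replace(<field>, CHAR(10), '') on them?

To be a bit clearer: I know how an update statement works. I am looking for a way to avoid writing an update statement for every field in the database that is of type (n)varchar.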

Update:

Based on @matt's suggestion I came up with this code (see the answer marked as the solution):

-- Collect one UPDATE statement per nvarchar column of the user tables in the schema
declare @temptable table (id int identity(1,1), sql nvarchar(4000))

insert into @temptable(sql)
SELECT 'UPDATE '+quotename(i.TABLE_SCHEMA)+'.'+quotename(i.TABLE_NAME)
     +' SET '+quotename(i.COLUMN_NAME)+' = REPLACE('+quotename(i.COLUMN_NAME)+', CHAR(10),'''')'
FROM INFORMATION_SCHEMA.COLUMNS i
inner join sys.tables t on i.TABLE_NAME = t.name
                       and i.TABLE_SCHEMA = schema_name(t.schema_id) -- match the schema too, not just the name
WHERE i.DATA_TYPE = 'NVARCHAR'
and t.type = 'U'
and i.TABLE_SCHEMA = 'myschema'

declare @i as int = 1
declare @sql as nvarchar(max)
declare @max as int = (select max(id) from @temptable)

-- Execute the collected statements one at a time
while @i <= @max
BEGIN
    set @sql = (select [sql] from @temptable where id = @i)
    exec sp_executesql @sql
    --print cast(@i as varchar(5)) + '/'+cast(@max as varchar(5)) + ' done, ' +cast(@max-@i as varchar(5)) + ' to go...'
    set @sql = ''
    set @i = @i+1
END

Answer 1

Sure, you can run an update on that surname field as part of your import process. Something like this would work for you:

UPDATE customers
SET surname = replace(surname,char(10),'')

Or you could use some dynamic SQL like this to generate the update statements, which you could quickly change so that it also executes them:

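-- The UPDATE target is the table, so the third name part must be TABLE_NAME, not COLUMN_NAME
SELECT 'UPDATE '+QUOTENAME(TABLE_CATALOG)+'.'+QUOTENAME(TABLE_SCHEMA)+'.'+QUOTENAME(TABLE_NAME)
     +' SET '+QUOTENAME(COLUMN_NAME)+' = REPLACE('+QUOTENAME(COLUMN_NAME)+', CHAR(10),'''')'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE = 'NVARCHAR'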
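
As a possible refinement, here is a minimal sketch that generates statements touching only rows that actually contain a line feed, covers varchar as well as nvarchar, and skips views. The join to INFORMATION_SCHEMA.TABLES and the UpdateStatement column alias are my own illustrative choices, not part of the answer above. Computed columns are not filtered out, so review the generated statements before running any of them.

```sql
-- Sketch: one UPDATE per (n)varchar column in base tables, limited to rows
-- that contain CHAR(10). This only returns the statements; it does not run them.
SELECT 'UPDATE ' + QUOTENAME(c.TABLE_SCHEMA) + '.' + QUOTENAME(c.TABLE_NAME)
     + ' SET ' + QUOTENAME(c.COLUMN_NAME)
     + ' = REPLACE(' + QUOTENAME(c.COLUMN_NAME) + ', CHAR(10), '''')'
     + ' WHERE ' + QUOTENAME(c.COLUMN_NAME) + ' LIKE ''%'' + CHAR(10) + ''%'';' AS UpdateStatement
FROM   INFORMATION_SCHEMA.COLUMNS AS c
JOIN   INFORMATION_SCHEMA.TABLES  AS t
  ON   t.TABLE_SCHEMA = c.TABLE_SCHEMA
 AND   t.TABLE_NAME   = c.TABLE_NAME
WHERE  t.TABLE_TYPE = 'BASE TABLE'              -- skip views
  AND  c.DATA_TYPE IN ('varchar', 'nvarchar');
```

The WHERE clause in each generated statement keeps the update from rewriting rows that are already clean, which can matter on large tables.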

Answer 2

This should give you a list of columns to build a cursor around:

select COLUMN_NAME
from INFORMATION_SCHEMA.COLUMNS
where DATA_TYPE in ('varchar','nvarchar')
    and TABLE_NAME = 'your table name' -- placeholder: put your table name here as a string literal

That should work more smoothly.
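
For illustration, a cursor built around that column list might look like the sketch below. The cursor name col_cursor, the variable names, and the dbo schema are assumptions of mine; customers is the table from the question. Try it on a copy of the data first.

```sql
-- Sketch: loop over the (n)varchar columns of one table and strip CHAR(10) from each.
DECLARE @schema sysname = N'dbo',        -- assumption: schema of the target table
        @table  sysname = N'customers',  -- table name taken from the question
        @column sysname,
        @sql    nvarchar(max);

DECLARE col_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT COLUMN_NAME
    FROM   INFORMATION_SCHEMA.COLUMNS
    WHERE  DATA_TYPE IN ('varchar', 'nvarchar')
      AND  TABLE_SCHEMA = @schema
      AND  TABLE_NAME   = @table;

OPEN col_cursor;
FETCH NEXT FROM col_cursor INTO @column;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build and run one UPDATE for the current column, touching only affected rows
    SET @sql = N'UPDATE ' + QUOTENAME(@schema) + N'.' + QUOTENAME(@table)
             + N' SET ' + QUOTENAME(@column)
             + N' = REPLACE(' + QUOTENAME(@column) + N', CHAR(10), N'''')'
             + N' WHERE ' + QUOTENAME(@column) + N' LIKE N''%'' + CHAR(10) + N''%'';';

    EXEC sys.sp_executesql @sql;

    FETCH NEXT FROM col_cursor INTO @column;
END;

CLOSE col_cursor;
DEALLOCATE col_cursor;
```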

Answer 3

First you need a good N-Grams function such as the one covered here. The version I am including below is the NVARCHAR(4000) version (kudos to Larnu for his contribution). I used NGramsN4K to build an NVARCHAR(4000) PatReplace function. I use a different schema for my functions, but dbo will work just fine.

Note that:

SELECT pr.NewString 
FROM   samd.patReplaceN4K('ൈൈƐABCƐƐ123ˬˬˬˬXYZˤˤ','[^0-9a-zA-Z]','') AS pr;

Returns: ABC123XYZ

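Every character that matches the pattern [^0-9a-zA-Z], that is, anything that is not a digit or a letter, has been removed. Now let's use the function against records that contain bad characters, remove those characters, and then join the records to a table with good values. Note my comments.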

-- Sample data
DECLARE @Customers  TABLE (CustomerId INT IDENTITY, Surname NVARCHAR(100));
DECLARE @GoodValues TABLE (Surname NVARCHAR(100));

INSERT @Customers  (Surname) VALUES (CHAR(10)+'Johnny'+CHAR(10)),('Smith'),('Jones'+CHAR(160));
INSERT @goodvalues (Surname) VALUES('Johnny'),('Smith'),('Jones'),('James');

-- Fail:
SELECT c.CustomerId, g.Surname
FROM   @Customers  AS c
JOIN   @GoodValues AS g 
  ON   c.Surname = g.Surname;

-- Success:
SELECT c.CustomerId, g.Surname
FROM        @Customers  AS c
CROSS APPLY samd.patreplaceN4K(c.Surname,'[^0-9a-zA-Z ]','') AS pr
JOIN        @GoodValues AS g 
  ON        pr.newString = g.Surname;
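
If the unwanted characters are known in advance, a lighter alternative that needs no helper functions is to nest REPLACE calls. This is only a sketch against the same sample data; the alias cleaned is hypothetical, and it assumes that CHAR(10) and CHAR(160) are the only characters to strip, whereas the pattern-based function above handles arbitrary ones.

```sql
-- Sample data (same as above)
DECLARE @Customers  TABLE (CustomerId INT IDENTITY, Surname NVARCHAR(100));
DECLARE @GoodValues TABLE (Surname NVARCHAR(100));

INSERT @Customers  (Surname) VALUES (CHAR(10)+'Johnny'+CHAR(10)),('Smith'),('Jones'+CHAR(160));
INSERT @GoodValues (Surname) VALUES ('Johnny'),('Smith'),('Jones'),('James');

-- Strip the two known characters inline, then join on the cleaned value
SELECT c.CustomerId, g.Surname
FROM   @Customers AS c
CROSS APPLY (VALUES (REPLACE(REPLACE(c.Surname, CHAR(10), ''), CHAR(160), ''))) AS cleaned(Surname)
JOIN   @GoodValues AS g
  ON   cleaned.Surname = g.Surname;
```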

samd.NGramsN4K

CREATE FUNCTION samd.NGramsN4K
(
  @string NVARCHAR(4000), -- Input string
  @N      INT             -- requested token size
)
/*****************************************************************************************
[Purpose]:
 A character-level N-Grams function that outputs a contiguous stream of @N-sized tokens 
 based on an input string (@string). Accepts strings up to 4000 NVARCHAR characters long.
 For more information about N-Grams see: http://en.wikipedia.org/wiki/N-gram. 

[Author]:
  Alan Burstein

[Compatibility]:
 SQL Server 2008+, Azure SQL Database

[Syntax]:
--===== Autonomous
 SELECT ng.position, ng.token
 FROM   samd.NGramsN4K(@string,@N) AS ng;

--===== Against a table using APPLY
 SELECT      s.SomeID, ng.position, ng.token
 FROM        dbo.SomeTable                  AS s
 CROSS APPLY samd.NGramsN4K(s.SomeValue,@N) AS ng;

[Parameters]:
 @string  = The input string to split into tokens.
 @N       = The size of each token returned.

[Returns]:
 Position = bigint; the position of the token in the input string
 token    = NVARCHAR(4000); a @N-sized character-level N-Gram token

[Dependencies]:
 1. core.rangeAB (iTVF)

[Developer Notes]:
 1. NGramsN4K is not case sensitive

 2. Many functions that use NGramsN4K will see a huge performance gain when the optimizer
    creates a parallel execution plan. One way to get a parallel query plan (if the 
    optimizer does not choose one) is to use make_parallel by Adam Machanic which can be 
    found here:
 sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx

 3. When @N is less than 1 or greater than the datalength of the input string then no 
    tokens (rows) are returned. If either @string or @N are NULL no rows are returned.
    This is a debatable topic but the thinking behind this decision is that: because you
    can't split 'xxx' into 4-grams, you can't split a NULL value into unigrams and you 
    can't turn anything into NULL-grams, no rows should be returned.

    For people who would prefer that a NULL input forces the function to return a single
    NULL output you could add this code to the end of the function:

    UNION ALL 
    SELECT 1, NULL
    WHERE NOT(@N > 0 AND @N <= DATALENGTH(@string)) OR (@N IS NULL OR @string IS NULL);

 4. NGramsN4K is deterministic. For more about deterministic functions see:
    https://msdn.microsoft.com/en-us/library/ms178091.aspx

[Examples]:
--===== 1. Turn the string, 'ɰɰXɰɰ' into unigrams, bigrams and trigrams
 DECLARE @string NVARCHAR(4000) = N'ɰɰXɰɰ';
 BEGIN
   SELECT ng.Position, ng.Token FROM samd.NGramsN4K(@string,1) AS ng; -- unigrams (@N=1)
   SELECT ng.Position, ng.Token FROM samd.NGramsN4K(@string,2) AS ng; -- bigrams  (@N=2)
   SELECT ng.Position, ng.Token FROM samd.NGramsN4K(@string,3) AS ng; -- trigrams (@N=3)
   SELECT ng.Position, ng.Token FROM samd.NGramsN4K(@string,4) AS ng; -- 4-grams  (@N=4)
 END

--===== 2. Scenarios where the function would not return rows
 SELECT ng.Position, ng.Token FROM samd.NGramsN4K('abcd',5)   AS ng; -- 5-grams  (@N=5)
 SELECT ng.Position, ng.Token FROM samd.NGramsN4K(N'x', 0)    AS ng;
 SELECT ng.Position, ng.Token FROM samd.NGramsN4K(N'x', NULL) AS ng;

 This will fail:
 --SELECT ng.Position, ng.Token FROM samd.NGramsN4K(N'x',-1)    AS ng;

--===== 3. How many times the substring "ƒƓ" appears in each record
 BEGIN
   DECLARE @table TABLE(stringID int identity primary key, string NVARCHAR(100));
   INSERT @table(string)
   VALUES (N'ƒƓ123ƒƓ'),(N'123ƒƓƒƓƒƓ'),(N'!ƒƓ!ƒƓ!'),(N'ƒƓ-ƒƓ-ƒƓ-ƒƓ-ƒƓ');

   SELECT t.String, Occurances = COUNT(*) 
   FROM @table                            AS t
   CROSS APPLY samd.NGramsN4K(t.string,2) AS ng
   WHERE       ng.token = N'ƒƓ'
   GROUP BY    t.string;
 END;
-----------------------------------------------------------------------------------------
[Revision History]:
 Rev 00 - 20170324 - Initial Development - Alan Burstein
 Rev 01 - 20180829 - Changed TOP logic and startup-predicate logic in the WHERE clause
                   - Alan Burstein
 Rev 02 - 20191129 - Redesigned to leverage rangeAB - Alan Burstein
 Rev 03 - 20200416 - changed the cast from NCHAR(4000) to NVARCHAR(4000)
                   - Removed: WHERE @N BETWEEN 1 AND s.Ln; this must now be handled
                     manually moving forward. - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
  Position = r.RN,                                              -- Token Position
  Token    = CAST(SUBSTRING(@string,r.RN,@N) AS NVARCHAR(4000)) -- @N-Sized Token
FROM        (VALUES(DATALENGTH(ISNULL(NULLIF(@string,N''),N'X'))/2)) AS s(Ln)
CROSS APPLY core.rangeAB(1,s.Ln-(ISNULL(@N,1)-1),1,1)                AS r
GO

samd.patReplaceN4K

CREATE FUNCTION samd.patReplaceN4K
(
  @string  NVARCHAR(4000), -- Input String
  @pattern NVARCHAR(50),   -- Pattern to match/replace
  @replace NVARCHAR(20)    -- What to replace the matched pattern with
)
/*****************************************************************************************
[Purpose]:
 Given a string (@string), a pattern (@pattern), and a replacement character (@replace)
 patReplaceN4K will replace any character in @string that matches the @Pattern parameter 
 with the character, @replace.

[Author]:
 Alan Burstein

[Compatibility]:
  SQL Server 2008+

[Syntax]:
--===== Basic Syntax Example
 SELECT pr.NewString
 FROM   samd.patReplaceN4K(@String,@Pattern,@Replace) AS pr;

[Parameters]:
 @string  = NVARCHAR(4000); The input string to manipulate
 @pattern = NVARCHAR(50);   The pattern to match/replace
 @replace = NVARCHAR(20);   What to replace the matched pattern with

[Returns]:
 Inline Table Valued Function returns:
 NewString = NVARCHAR(4000); The new string with all instances of @Pattern replaced with
             The value of @Replace.

[Dependencies]:
 samd.NGramsN4K (iTVF)

[Developer Notes]:
 1. @Pattern IS case sensitive but can be easily modified to make it case insensitive
 2. There is no need to include the "%" before and/or after your pattern since we 
    are evaluating each character individually
 3. Certain special characters, such as "$" and "%" need to be escaped with a "/"
    like so: [/$/%]
 4. As is the case with functions which leverage samd.ngrams or samd.ngramsN4k, 
    samd.patReplaceN4K is almost always dramatically faster with a parallel execution
    plan. One way to get a parallel query plan (if the optimizer does not choose one) is
    to use make_parallel by Adam Machanic found here:
  sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx

    On my PC (8 logical CPU, 64GB RAM, SQL 2019) samd.patReplaceN4K is about 4X
    faster when executed using all 8 of my logical CPUs. 
 5. samd.patReplaceN4K is deterministic. For more about deterministic functions see:
    https://msdn.microsoft.com/en-us/library/ms178091.aspx

[Examples]:
--===== 1. Remove non alphanumeric characters
 SELECT pr.NewString 
 FROM   samd.patReplaceN4K('ൈൈƐABCƐƐ123ˬˬˬˬXYZˤˤ','[^0-9a-zA-Z]','') AS pr;

--===== 2. Replace numeric characters with a "*"
 SELECT pr.NewString
 FROM  samd.patReplaceN4K('My phone number is 555-2211','[0-9]','*') AS pr;

--===== 3. Using against a table
 DECLARE @table TABLE(OldString varchar(60));
 INSERT  @table VALUES ('Call me at 555-222-6666'), ('phone number: (312)555-2323'),
                       ('He can be reached at 444.665.4466 on Monday.');

 SELECT      t.OldString, pr.NewString
 FROM        @table                                     AS t
 CROSS APPLY samd.patReplaceN4K(t.oldstring,'[0-9]','*') AS pr;

[Revision History]:
-----------------------------------------------------------------------------------------
Rev 01  - 20200422 - Created - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newString = 
(
  SELECT CASE WHEN @string = a.Blank THEN a.Blank ELSE
           CASE WHEN PATINDEX(@pattern,a.Token)&0x01=0 THEN ng.token ELSE @replace END END
  FROM        samd.NGramsN4K(@string,1) AS ng
  CROSS APPLY (VALUES(CAST('' AS NVARCHAR(4000)),
                      ng.token COLLATE Latin1_General_BIN)) AS a(Blank,Token)
  ORDER BY ng.position
  FOR XML PATH(''),TYPE
).value('text()[1]', 'NVARCHAR(4000)');
GO

tsql

ssis

azure-sql-database

azure-data-factory-2