T-SQL, cut string into chunks of 35 characters without cutting words

OK, I've been looking everywhere for the right syntax. Say I have a string of unknown length, e.g. 'The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.'

I need to split it into chunks of at most 35 characters, for example: 'The quick brown fox jumps over the'. The space is the delimiter.

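The rules are:

1 - Each chunk is at most 35 characters long.

2 - Do not split words.

2.1 - If the combined length is greater than 35, go back to the last space that keeps the chunk within 35 characters and cut there.

3 - The result set must return a table with 5 values (the string chunks) plus a row number, so that a result can span multiple records if needed (see the table below).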

That is, if a string splits into five 35-character chunks, one record is returned; any excess overflows into further rows, in groups of 5.

________________________________________________________________________________________________________________________________________________________________________________|
|level  |   Val1                            |Val2                           |   Val3                        |   Val4                            |   Val5                        |
________________________________________________________________________________________________________________________________________________________________________________|
|   1   |The quick brown fox jumps over the | lazy dog. The quick brown fox | jumps over the lazy dog. The  | quick brown fox jumps over the    | lazy dog. The quick brown fox |
|   2   | jumps over the lazy dog. The      | quick brown fox jumps over the| lazy dog.                     |NULL                               |   NULL                        |       
________________________________________________________________________________________________________________________________________________________________________________|

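I found some code here, but I can't limit the results to 35-character chunks.

What it does: it counts the delimiters (spaces) in the string, then splits all the words into a table using a CTE, then concatenates them again. However, in the "Splitvalues" CTE, if I divide MainLevel into groups of 5 it works, but the words are joined in groups of 5 rather than by length. I am also still lost on how to transform the result into the 6 columns described.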

DECLARE     @ColumnLen             INT           = 35
       ,@BNotAllowNullinValue1 BIT           = 1
       ,@Delim                 VARCHAR(5)    = SPACE(1)
       ,@DelimCount            INT
       ,@OriginalStr           NVARCHAR(MAX) = 'The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.'
       ,@ReturnColumnCount     INT           = 5

SET @OriginalStr = @OriginalStr + @Delim
SET @DelimCount = ((LEN(@OriginalStr + '|')-1) - (LEN(REPLACE(@OriginalStr + '|',@Delim,''))-1)) / LEN(LTRIM(RTRIM(@Delim)) + '|')

---- test data
;WITH Splitvalues(SplitValue ,MainLevel ,ProcessLevel,LastPos,Original)
AS (SELECT TOP 1 LTRIM(RTRIM(SUBSTRING(@OriginalStr,1,ABS(CHARINDEX(@Delim,@OriginalStr,1)))))
                ,1 as MainLevel
                ,1 as ProcessLevel
                ,CHARINDEX(@Delim,@OriginalStr,1 + 1) AS LastPos
                ,@OriginalStr
    UNION ALL
    SELECT LTRIM(RTRIM(SUBSTRING(@OriginalStr,LastPos + 1,ABS((CHARINDEX(@Delim,@OriginalStr,LastPos + 1) - LastPos)))))
          ,CASE (ProcessLevel % 5) WHEN  0 THEN MainLevel +1 ELSE MainLevel END  as MainLevel
          ,ProcessLevel + 1 as ProcessLevel
         ,CHARINDEX(@Delim,@OriginalStr,LastPos + 1) AS LastPos
          ,@OriginalStr
    FROM Splitvalues
    WHERE ProcessLevel <= @DelimCount
          AND ISNULL(LTRIM(RTRIM(SUBSTRING(@OriginalStr,LastPos + 1,ABS((CHARINDEX(@Delim,@OriginalStr,LastPos + 1) - LastPos))))),'') <> '')
---- actual query;
,cte(MainLevel,ProcessLevel,combined,rn)
     AS (SELECT MainLevel,ProcessLevel,Splitvalue ,rn = ROW_NUMBER() OVER(PARTITION BY MainLevel ORDER BY MainLevel,ProcessLevel)FROM Splitvalues)
,cte2(MainLevel,ProcessLevel ,finalstatus ,rn)
     AS (SELECT MainLevel,cte.ProcessLevel ,CONVERT(VARCHAR(MAX),combined)  ,1 FROM cte WHERE rn = 1
         UNION ALL
         SELECT cte2.MainLevel,cte2.ProcessLevel  +1 ,CONVERT(VARCHAR(MAX),cte2.finalstatus + @Delim + cte.combined+ @Delim )
               ,cte2.rn + 1
         FROM cte2
         INNER JOIN cte ON cte.MainLevel = cte2.MainLevel  AND cte.rn = cte2.rn + 1
        )
     SELECT MainLevel,MAX(finalstatus),LEN(MAX(finalstatus)+'|')
     FROM cte2
     GROUP BY MainLevel

Thanks, everyone, for the help.

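To find where to cut, I suggest taking the first 36 characters of the string (a full 35-character chunk plus the character after it, which may be the space that ends the chunk) and then finding the last space in them. This can be done with REVERSE and CHARINDEX.

The number of characters before that rightmost space, i.e. the length of the chunk, is: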

36-CHARINDEX(' ', REVERSE(LEFT(@txt, 36)))
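
As a quick check of that expression, the sketch below evaluates each step on the start of the sample string. The column aliases (window36, k, n, chunk) are only illustrative names and are not part of the original query.

```sql
-- Illustrative only: step-by-step evaluation of the cut-position expression.
DECLARE @txt varchar(max) =
    'The quick brown fox jumps over the lazy dog. The quick brown fox';

SELECT
    LEFT(@txt, 36)                               AS window36, -- 'The quick brown fox jumps over the l'
    CHARINDEX(' ', REVERSE(LEFT(@txt, 36)))      AS k,        -- 2: the space is the 2nd character from the right
    36 - CHARINDEX(' ', REVERSE(LEFT(@txt, 36))) AS n,        -- 34 characters come before that space
    LEFT(@txt, 36 - CHARINDEX(' ', REVERSE(LEFT(@txt, 36)))) AS chunk; -- 'The quick brown fox jumps over the'
```

Looking at 36 characters rather than 35 matters when a chunk is exactly 35 characters long: the space that ends it is then the 36th character, so k is 1 and n is 35.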

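Now I would use a recursive CTE. Each level cuts off the next piece of the string until nothing is left.

I have included the query below. It uses the position finder above in the anchor member of the CTE, and again in the recursive member.

The recursion ends when there are no characters left.

Finally, to lay the chunks out as 5 columns per row, I number each chunk 0...n.

I then use the fact that n%5 (modulo) always falls in (0,1,2,3,4), which selects the column, and that n/5 as integer division gives the number of the row in which the chunk should be output.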
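
The mapping from chunk number to column and row can be seen in isolation with a few hard-coded chunk numbers. This is a small sketch; column_index, row_index and the derived table v are illustrative names only.

```sql
-- Illustrative only: map a 0-based chunk number to a column and a row.
SELECT
    grpn,
    grpn % 5     AS column_index, -- 0..4, i.e. Val1..Val5
    grpn / 5     AS row_index,    -- integer division: 0 for chunks 0-4, 1 for chunks 5-9
    grpn / 5 + 1 AS [level]       -- 1-based row number
FROM (VALUES (0), (1), (2), (3), (4), (5), (6)) AS v(grpn);
```

Chunks 0 to 4 land in the first row and chunks 5 and 6 start the second, which is why grouping by grpn/5 produces one output row per set of five chunks.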

declare @txt varchar(max)=N'The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.'
;
-- TRIM requires SQL Server 2017 or later; use LTRIM(RTRIM(...)) on older versions
WITH cte
AS
(-- anchor member: cut the first chunk at the last space within the first 36 characters
 SELECT
        TRIM(LEFT(@txt, n)) grp
       ,0 grpn
       ,TRIM(SUBSTRING(@txt, n + 1, LEN(@txt))) remainder
    FROM (SELECT
            -- the appended space lets a final piece of 35 characters or fewer be taken whole
            36 - CHARINDEX(' ', REVERSE(LEFT(@txt + ' ', 36))) n) a
    UNION ALL
    -- recursive member: cut the next chunk from what is left
    SELECT
        TRIM(LEFT(remainder, (n))) grp
       ,grpn + 1
       ,TRIM(SUBSTRING(remainder, (n) + 1, LEN(remainder))) remainder
    FROM cte
    OUTER APPLY (SELECT
            36 - CHARINDEX(' ', REVERSE(LEFT(remainder + ' ', 36))) n) a
    WHERE LEN(remainder) > 0)
SELECT
    grpn/5 + 1 [level] -- row number asked for in the question
    ,max(iif(grpn%5=0,grp,null)) Val1
    ,max(iif(grpn%5=1,grp,null)) Val2
    ,max(iif(grpn%5=2,grp,null)) Val3
    ,max(iif(grpn%5=3,grp,null)) Val4
    ,max(iif(grpn%5=4,grp,null)) Val5
FROM cte
group by grpn/5
order by grpn/5

If your MS SQL Server version is 2017 or later, you can use STRING_SPLIT and STRING_AGG for this.

Example:

declare @OriginalStr nvarchar(max);
set @OriginalStr = N'The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog';

declare @ColumnLen int = 35;

declare @parts table 
(
  part_id int identity(1,1) primary key,
  part_content nvarchar(max)
);

declare @lines table 
(
  line_id int primary key,
  line_content nvarchar(max)
);

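-- split the string on the spaces
-- note: STRING_SPLIT does not document an output order, so part_id is not
-- guaranteed to follow word order (SQL Server 2022 adds an enable_ordinal argument)
insert into @parts (part_content)
select value
from string_split(@OriginalStr, ' ') spl;

-- glue the parts back together, in part_id order
insert into @lines (line_id, line_content)
select 
 lineNr,
 string_agg(part_content, ' ') within group (order by part_id) as line
from
(
  select part_id, part_content
  , floor(1.0*(sum(len(part_content)+1) 
               over (order by part_id))/(@ColumnLen-1))+1 as lineNr
  from @parts 
) q
group by lineNr;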

-- pivoting the lines
select
ceiling((line_id-0.1)/5) as [Level],
max(case when line_id%5 = 1 then line_content end) as Val1,
max(case when line_id%5 = 2 then line_content end) as Val2,
max(case when line_id%5 = 3 then line_content end) as Val3,
max(case when line_id%5 = 4 then line_content end) as Val4,
max(case when line_id%5 = 0 then line_content end) as Val5
from @lines l
group by ceiling((line_id-0.1)/5)
order by [Level];

GO
Level | Val1                              | Val2                              | Val3                               | Val4                               | Val5                          
:---- | :-------------------------------- | :-------------------------------- | :--------------------------------- | :--------------------------------- | :-----------------------------
1     | The quick brown fox jumps over    | the lazy dog. The quick brown fox | jumps over the lazy dog. The quick | brown fox jumps over the lazy dog. | The quick brown fox jumps over
2     | the lazy dog. The quick brown fox | jumps over the lazy dog           | null                               | null                               | null                          
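
To see how the line numbers in that query come about, the sketch below applies the same running-total expression to the first nine words of the sample, hard-coded in a VALUES list (the derived table p and the alias running_len are illustrative only). Note that this is an approximation based on cumulative length, not a greedy fill, so with other inputs a line is not guaranteed to stay within @ColumnLen.

```sql
-- Illustrative only: how the running total assigns words to lines.
DECLARE @ColumnLen int = 35;

SELECT
    part_id,
    part_content,
    SUM(LEN(part_content) + 1) OVER (ORDER BY part_id) AS running_len,
    FLOOR(1.0 * SUM(LEN(part_content) + 1) OVER (ORDER BY part_id)
          / (@ColumnLen - 1)) + 1 AS lineNr
FROM (VALUES (1, N'The'), (2, N'quick'), (3, N'brown'), (4, N'fox'), (5, N'jumps'),
             (6, N'over'), (7, N'the'), (8, N'lazy'), (9, N'dog.')
     ) AS p(part_id, part_content);
-- running_len reaches 31 at 'over' (lineNr 1) and 35 at 'the' (lineNr 2),
-- so the first line ends after 'over'.
```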

On older versions of SQL Server, the following should work to populate the @lines table variable.

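with rcte as
(
   -- anchor member: length of the first line, up to and including the last
   -- space found in the first @ColumnLen characters
   select 
   1 as lineNr,
   1 as strPos,
   @ColumnLen + 1 - cast(
     charindex(N' ',
       reverse(
         -- the appended space lets a final piece shorter than @ColumnLen be taken whole
         substring(@OriginalStr + N' ', 1, @ColumnLen)
       ) 
     ) as int) as lineLen
   
   union all
   
   -- recursive member: the next line starts right after the previous one
   select
   lineNr + 1,
   strPos + lineLen,
   @ColumnLen + 1 - cast(
     charindex(N' ',
       reverse(
         substring(@OriginalStr + N' ', strPos+lineLen, @ColumnLen)
       )
     ) as int)
   from rcte
   where strPos+lineLen < len(@OriginalStr)
)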
insert into @lines (line_id, line_content)
select 
 lineNr,
 line = rtrim(substring(@OriginalStr, strPos, lineLen))
from rcte;