How to get column-level dependencies in a view

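I have done some research on this but have not found a solution yet. What I want is the column-level dependencies of a view. So, let's say we have a table like this: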

create table TEST(
    first_name varchar(10),
    last_name varchar(10),
    street varchar(10),
    number int
)

and a view like this:

create view vTEST
as
    select
        first_name + ' ' + last_name as [name],
        street + ' ' + cast(number as varchar(max)) as [address]
    from dbo.TEST

I would like to get a result like this:

column_name depends_on_column_name depends_on_table_name
----------- --------------------- --------------------
name        first_name            dbo.TEST
name        last_name             dbo.TEST
address     street                dbo.TEST
address     number                dbo.TEST

I have tried the sys.dm_sql_referenced_entities function, but referencing_minor_id is always 0 for views.

select
    referencing_minor_id,
    referenced_schema_name + '.' + referenced_entity_name as depends_on_table_name,
    referenced_minor_name as depends_on_column_name
from sys.dm_sql_referenced_entities('dbo.vTEST', 'OBJECT')

referencing_minor_id depends_on_table_name depends_on_column_name
-------------------- --------------------- ----------------------
0                    dbo.TEST              NULL
0                    dbo.TEST              first_name
0                    dbo.TEST              last_name
0                    dbo.TEST              street
0                    dbo.TEST              number

The same is true for sys.sql_expression_dependencies and for the obsolete sys.sql_dependencies.

So am I missing something, or is this impossible to do?

There are some related questions (Find the real column name of an alias used in a view?), but as I said, I have not found a working solution yet.

EDIT 1: I have tried using the DAC to check whether this information is stored somewhere in the System Base Tables, but have not found it.

Everything you need is contained in the definition of the view.

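So we can extract this information with the following steps:

  1. Assign the view definition to a string variable.

  2. Split it on the comma (,).

  3. Split each alias expression on the plus (+) operator, using CROSS APPLY with XML.

  4. Use the system tables to get accurate information, such as the original table.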
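Step 3 can be tried in isolation. The following is only a minimal sketch: the variable @expr and its contents are made up for illustration and stand in for one piece of the view definition. It wraps every operand in an <M> element and shreds the resulting XML into one row per operand.

```sql
-- Hypothetical input: the expression text behind the [name] column of vTEST.
DECLARE @expr nvarchar(200) = N'first_name + '' '' + last_name';

-- Turn "a + b + c" into "<M>a </M><M> b </M><M> c</M>" and read back one row per <M> node.
SELECT LTRIM(RTRIM(Split.a.value('.', 'nvarchar(100)'))) AS part
FROM (
    SELECT CAST(N'<M>' + REPLACE(@expr, N'+', N'</M><M>') + N'</M>' AS xml) AS Data
) AS A
CROSS APPLY A.Data.nodes('/M') AS Split(a);
```

Note that the cast to XML fails as soon as the expression contains characters such as < or &, so this trick is only suitable for simple concatenation expressions like the ones in the example view.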

Demo:

Create PROC psp_GetLevelDependsView (@sViewName varchar(200))
AS
BEGIN

    Declare @stringToSplit nvarchar(max), -- max, so that long view definitions are not truncated
            @name NVARCHAR(255),
            @dependsTableName NVARCHAR(50),
            @pos INT

    Declare @returnList TABLE ([Name] [nvarchar] (500))

    SELECT TOP 1 @dependsTableName= table_schema + '.'+  TABLE_NAME
    FROM    INFORMATION_SCHEMA.VIEW_COLUMN_USAGE
    WHERE   VIEW_NAME = @sViewName -- only look at the requested view

    select @stringToSplit = definition
    from sys.objects     o
    join sys.sql_modules m on m.object_id = o.object_id
    where o.object_id = object_id( @sViewName)
     and o.type = 'V'

     WHILE CHARINDEX(',', @stringToSplit) > 0
     BEGIN
        SELECT @pos  = CHARINDEX(',', @stringToSplit)  
        SELECT @name = SUBSTRING(@stringToSplit, 1, @pos-1)

        INSERT INTO @returnList 
        SELECT @name

        SELECT @stringToSplit = SUBSTRING(@stringToSplit, @pos+1, LEN(@stringToSplit)-@pos)
     END

     INSERT INTO @returnList
     SELECT @stringToSplit

    select COLUMN_NAME  ,  b.Name as Expression
    Into #Temp
    FROM INFORMATION_SCHEMA.COLUMNS a , @returnList b
    WHERE TABLE_NAME= @sViewName
    And (b.Name) like '%' + ( COLUMN_NAME) + '%'

    SELECT A.COLUMN_NAME as column_name,  
         Split.a.value('.', 'VARCHAR(100)') AS depends_on_column_name ,   @dependsTableName as depends_on_table_name
         Into #temp2
     FROM  
     (
         SELECT COLUMN_NAME,  
             CAST ('<M>' + REPLACE(Expression, '+', '</M><M>') + '</M>' AS XML) AS Data  
         FROM  #Temp
     ) AS A CROSS APPLY Data.nodes ('/M') AS Split(a); 

    SELECT b.column_name , a.COLUMN_NAME as depends_on_column_name , b.depends_on_table_name
    FROM INFORMATION_SCHEMA.VIEW_COLUMN_USAGE a , #temp2 b
    WHERE VIEW_NAME= @sViewName
    and b.depends_on_column_name  like '%' + a.COLUMN_NAME + '%'

     drop table #Temp
     drop table #Temp2

 END

Test:

exec psp_GetLevelDependsView 'vTest'

Result:

column_name depends_on_column_name depends_on_table_name
----------- --------------------- --------------------
name        first_name            dbo.TEST
name        last_name             dbo.TEST
address     street                dbo.TEST
address     number                dbo.TEST

This solution answers your question only partially. It does not work for columns that are expressions.

You can use sys.dm_exec_describe_first_result_set to get the column information:

@include_browse_information

If set to 1, each query is analyzed as if it has a FOR BROWSE option on the query. Additional key columns and source table information are returned.

CREATE TABLE txu(id INT, first_name VARCHAR(10), last_name VARCHAR(10));
CREATE TABLE txd(id INT, id_fk INT, address VARCHAR(100));

CREATE VIEW v_txu
AS
SELECT t.id AS PK_id,
       t.first_name  AS name,
       d.address,
       t.first_name + t.last_name AS name_full
FROM txu t
JOIN txd d
  ON t.id = d.id_fk

Main query:

SELECT name, source_database, source_schema,
      source_table, source_column 
FROM sys.dm_exec_describe_first_result_set(N'SELECT * FROM v_txu', null, 1) ;  

Output:

+-----------+--------------------+---------------+--------------+---------------+
|   name    |   source_database  | source_schema | source_table | source_column |
+-----------+--------------------+---------------+--------------+---------------+
| PK_id     | fiddle_0f9d47226c4 | dbo           | txu          | id            |
| name      | fiddle_0f9d47226c4 | dbo           | txu          | first_name    |
| address   | fiddle_0f9d47226c4 | dbo           | txd          | address       |
| name_full | null               | null          | null         | null          |
+-----------+--------------------+---------------+--------------+---------------+

DBFiddleDemo
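To see the limitation on the view from the question, the same call can be pointed at dbo.vTEST. This is only a sketch and assumes dbo.TEST and dbo.vTEST exist in the current database.

```sql
-- Assumes dbo.TEST and dbo.vTEST from the question exist in the current database.
-- The third argument (1) turns on the browse information that fills the source_* columns.
SELECT name, source_schema, source_table, source_column
FROM sys.dm_exec_describe_first_result_set(N'SELECT * FROM dbo.vTEST', NULL, 1);
```

Both columns of vTEST are expressions, so, judging by the name_full row above, both rows would be expected to come back with NULL in the source_* columns.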

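This is a solution based on the query plan. It has some advantages:

  • works with almost any select query
  • no schema binding required

Disadvantages:

  • has not been properly tested
  • may break suddenly if Microsoft changes the XML query plan format.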

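The core idea is that every column expression in the XML query plan is defined in a "DefinedValue" node. The first subnode of "DefinedValue" is a reference to the output column and the second one is the expression. The expression is computed from input columns and constant values. As mentioned above, this is based only on empirical observation and needs proper testing.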
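The "DefinedValue" idea can be illustrated on a tiny document. The XML below is a hand-written, heavily simplified fragment (an assumption made for illustration, not real SQL Server output), and the variable and alias names are made up as well. The query pairs the column defined by each "DefinedValue" node with every column referenced inside its expression.

```sql
-- Hand-written, simplified fragment; a real plan contains many more nodes and attributes.
declare @plan_fragment xml = N'
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <DefinedValue>
    <ColumnReference Column="Expr1006" />
    <ScalarOperator>
      <Arithmetic Operation="ADD">
        <ScalarOperator>
          <Identifier><ColumnReference Table="[TEST]" Column="first_name" /></Identifier>
        </ScalarOperator>
        <ScalarOperator>
          <Identifier><ColumnReference Table="[TEST]" Column="last_name" /></Identifier>
        </ScalarOperator>
      </Arithmetic>
    </ScalarOperator>
  </DefinedValue>
</ShowPlanXML>';

with xmlnamespaces(default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select
    dv.x.value('(./ColumnReference/@Column)[1]', 'nvarchar(128)') as defined_column -- first child: the output column
  , src.x.value('(./@Table)[1]'  , 'nvarchar(128)')               as source_table
  , src.x.value('(./@Column)[1]' , 'nvarchar(128)')               as source_column
from @plan_fragment.nodes('//DefinedValue') as dv(x)
  cross apply dv.x.nodes('./ScalarOperator//ColumnReference') as src(x) -- second child: the expression
;
```

This returns one row per column referenced in the expression. The full solution has to apply the same lookup recursively, because an expression can itself refer to another internal column such as Expr1006.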

Example call:

exec dbo.GetColumnDependencies 'select * from dbo.vTEST'

target_column_name | source_column_name        | const_value
---------------------------------------------------
address            | Expr1007                  | NULL
name               | Expr1006                  | NULL
Expr1006           | NULL                      | ' '
Expr1006           | [testdb].[dbo].first_name | NULL
Expr1006           | [testdb].[dbo].last_name  | NULL
Expr1007           | NULL                      | ' '
Expr1007           | [testdb].[dbo].number     | NULL
Expr1007           | [testdb].[dbo].street     | NULL

Here is the code. First of all, get the XML query plan.

declare @select_query as varchar(4000) = 'select * from dbo.vTEST' -- IT'S YOUR QUERY HERE.
declare @select_into_query    as varchar(4000) = 'select top (1) * into #foo from (' + @select_query + ') as src'
      , @xml_plan             as xml           = null
      , @xml_generation_tries as tinyint       = 10
;
while (@xml_plan is null and @xml_generation_tries > 0) -- There is no guarantee that the plan will be cached.
begin 
  execute (@select_into_query);
  select @xml_plan = pln.query_plan
    from sys.dm_exec_query_stats as qry
      cross apply sys.dm_exec_sql_text(qry.sql_handle) as txt
      cross apply sys.dm_exec_query_plan(qry.plan_handle) as pln
    where txt.text = @select_into_query
  ;
  set @xml_generation_tries -= 1; -- Count the attempt so the loop cannot run forever.
end
if (@xml_plan is null
) begin
    raiserror(N'Can''t extract XML query plan from cache.' ,15 ,0);
    return;
  end
;

Next comes the main query. Its biggest part is a recursive common table expression used for column extraction.

with xmlnamespaces(default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
                  ,'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as shp -- Used in .query() for predictive namespace using. 
)
    , cte_column_dependencies as
    (

The seed of the recursion is a query that extracts the columns of the #foo table, which stores 1 row of the select query of interest.

select
    (select foo_col.info.query('./ColumnReference') for xml raw('shp:root') ,type) -- Because .value() can't extract an attribute from the root node.
      as target_column_info
  , (select foo_col.info.query('./ScalarOperator/Identifier/ColumnReference') for xml raw('shp:root') ,type)
      as source_column_info
  , cast(null as xml) as const_info
  , 1 as iteration_no
from @xml_plan.nodes('//Update/SetPredicate/ScalarOperator/ScalarExpressionList/ScalarOperator/MultipleAssign/Assign')
        as foo_col(info)
where foo_col.info.exist('./ColumnReference[@Table="[#foo]"]') = 1

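The recursive part searches for "DefinedValue" nodes that define a dependent column and extracts all "ColumnReference" and "Const" subnodes used in the column expression. It is overcomplicated by the XML-to-SQL conversions.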

union all    
select
    (select internal_col.info.query('.') for xml raw('shp:root') ,type)
  , source_info.column_info
  , source_info.const_info
  , prev_dependencies.iteration_no + 1
from @xml_plan.nodes('//DefinedValue/ColumnReference') as internal_col(info)
  inner join cte_column_dependencies as prev_dependencies -- Filters by depended columns.
        on prev_dependencies.source_column_info.value('(//ColumnReference/@Column)[1]' ,'nvarchar(4000)') = internal_col.info.value('(./@Column)[1]' ,'nvarchar(4000)')
        and exists (select prev_dependencies.source_column_info.value('(.//@Schema)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Schema)[1]'   ,'nvarchar(4000)'))
        and exists (select prev_dependencies.source_column_info.value('(.//@Database)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Database)[1]' ,'nvarchar(4000)'))
        and exists (select prev_dependencies.source_column_info.value('(.//@Server)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Server)[1]'   ,'nvarchar(4000)'))
  cross apply ( -- Because either a column or a constant, but not both, can be placed in a result row.
            select (select source_col.info.query('.') for xml raw('shp:root') ,type) as column_info
                 , null                                                              as const_info
              from internal_col.info.nodes('..//ColumnReference') as source_col(info)
            union all
            select null                                                         as column_info
                 , (select const.info.query('.') for xml raw('shp:root') ,type) as const_info
              from internal_col.info.nodes('..//Const') as const(info)
        ) as source_info
where source_info.column_info is null
    or (
        -- Exclude the node itself, which '..//ColumnReference' also selects among its sources. I could not find a simple way to check this in XQuery.
            source_info.column_info.value('(//@Column)[1]' ,'nvarchar(4000)') <> internal_col.info.value('(./@Column)[1]' ,'nvarchar(4000)')
        and (select source_info.column_info.value('(//@Schema)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Schema)[1]'   ,'nvarchar(4000)')) is null
        and (select source_info.column_info.value('(//@Database)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Database)[1]' ,'nvarchar(4000)')) is null
        and (select source_info.column_info.value('(//@Server)[1]'   ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Server)[1]'   ,'nvarchar(4000)')) is null
      )
)

Finally, a select statement converts the XML into human-readable text.

select
  --  col_dep.target_column_info
  --, col_dep.source_column_info
  --, col_dep.const_info
    coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Server)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Database)[1]' ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Schema)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + col_dep.target_column_info.value('(.//shp:ColumnReference/@Column)[1]' ,'nvarchar(4000)')
    as target_column_name
  , coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Server)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Database)[1]' ,'nvarchar(4000)') + '.' ,'')
  + coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Schema)[1]'   ,'nvarchar(4000)') + '.' ,'')
  + col_dep.source_column_info.value('(.//shp:ColumnReference/@Column)[1]' ,'nvarchar(4000)')
    as source_column_name
  , col_dep.const_info.value('(/shp:root/shp:Const/@ConstValue)[1]' ,'nvarchar(4000)')
    as const_value
from cte_column_dependencies as col_dep
order by col_dep.iteration_no ,target_column_name ,source_column_name
option (maxrecursion 512) -- A safeguard against an infinite loop.

I have been playing around with this but did not have time to go any further. Maybe this will help:

-- Returns all columns of the tables referenced by the view, and the objects they come from

SELECT
     v.[name] AS ViewName
    ,d.[referencing_id] AS ViewObjectID 
    ,c.[name] AS ColumnNames
    ,OBJECT_NAME(d.referenced_id) AS ReferencedTableName
    ,d.referenced_id AS TableObjectIDsReferenced
FROM 
sys.views v 
INNER JOIN sys.sql_expression_dependencies d ON d.referencing_id = v.[object_id]
INNER JOIN sys.objects o ON d.referencing_id = o.[object_id]
INNER JOIN sys.columns c ON d.referenced_id = c.[object_id]
WHERE v.[name] = 'vTEST'

-- Returns all output columns in the view

SELECT 
     OBJECT_NAME([object_id]) AS ViewName
    ,[object_id] AS ViewObjectID
    ,[name] AS OutputColumnName
FROM sys.columns
WHERE OBJECT_ID('vTEST') = [object_id]

-- Get the view definition

SELECT 
    VIEW_DEFINITION
FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_NAME = 'vTEST'

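Unfortunately, SQL Server does not explicitly store the mapping between source table columns and view columns. I suspect the main reason is simply the potential complexity of views (expression columns, functions called on those columns, nested queries and so on).

The only ways I can think of to determine the mapping between view columns and source columns are to parse the query associated with the view or to parse the execution plan of the view.

The approach I outline here focuses on the second option and relies on the fact that SQL Server avoids generating output lists for columns that a query does not need.

The first step is to get the list of dependent tables and their associated columns required by the view. This can be done through the standard system tables in SQL Server.

Next, we enumerate all of the view's columns via a cursor.

For each view column, we create a temporary wrapper stored procedure that selects only the single column in question from the view. Because only a single column is requested, SQL Server retrieves only the information needed to output that single view column.

The newly created procedure runs the query in format-only mode and therefore does not cause any actual I/O operations on the database, but it does generate an estimated execution plan when executed. After the query plan is generated, we query the output lists from the execution plan. Since we know which view column was selected, we can now associate the output list with the view column in question. We can further refine the association by only associating columns that form part of our original dependency list, which eliminates expression outputs from the result set.

Note that with this approach, if the view needs to join different tables together to generate its output, all columns required to generate the output are returned, even those that are not used directly in the column expression, because they are still required to produce the column.

The following stored procedure demonstrates the approach described above: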

CREATE PROCEDURE ViewGetColumnDependencies
(
    @viewName   NVARCHAR(50)
)
AS
BEGIN

    CREATE TABLE #_suppress_output
    (
        result NVARCHAR(500) NULL
    );


    DECLARE @viewTableColumnMapping TABLE
    (
        [ViewName]                  NVARCHAR(50),
        [SourceObject]              NVARCHAR(50),
        [SourceObjectColumnName]    NVARCHAR(50),
        [ViewAliasColumn]           NVARCHAR(50)
    )


    -- Get list of dependent tables and their associated columns required for the view.
    INSERT INTO @viewTableColumnMapping
    (
        [ViewName]                  
        ,[SourceObject]             
        ,[SourceObjectColumnName]               
    )
    SELECT          v.[name] AS [ViewName]
                    ,'[' + OBJECT_NAME(d.referenced_major_id) + ']' AS [SourceObject]
                    ,c.[name] AS [SourceObjectColumnName]
    FROM            sys.views v
    LEFT OUTER JOIN sys.sql_dependencies d ON d.object_id = v.object_id
    LEFT OUTER JOIN sys.columns c ON c.object_id = d.referenced_major_id AND c.column_id = d.referenced_minor_id
    WHERE           v.[name] = @viewName;


    DECLARE @aliasColumn NVARCHAR(50);

    -- Next, we enumerate all of the view's columns via a cursor. 
    DECLARE ViewColumnNameCursor CURSOR FOR
    SELECT              aliases.name AS [AliasName]
    FROM                sys.views v
    LEFT OUTER JOIN     sys.columns AS aliases  on v.object_id = aliases.object_id -- c.column_id=aliases.column_id AND aliases.object_id = object_id('vTEST')
    WHERE   v.name = @viewName;

    OPEN ViewColumnNameCursor  

    FETCH NEXT FROM ViewColumnNameCursor   
    INTO @aliasColumn  

    DECLARE @tql_create_proc NVARCHAR(MAX);
    DECLARE @queryPlan XML;

    WHILE @@FETCH_STATUS = 0  
    BEGIN 

        /*
        For each view column, we create a temporary wrapper stored procedure that 
        only selects the single column in question from view. The stored procedure 
        will run the query in format only mode and will therefore not cause any 
        actual I/O operations on the database, but it will generate an estimated 
        execution plan when executed.
        */
         SET @tql_create_proc = 'CREATE PROCEDURE ___WrapView
                                AS
                                    SET FMTONLY ON;
                                    SELECT CONVERT(NVARCHAR(MAX), [' + @aliasColumn + ']) FROM [' + @viewName + '];
                                    SET FMTONLY OFF;';

        EXEC (@tql_create_proc);

        -- Execute the procedure to generate a query plan. The insert into the temp table is only done to
        -- suppress the empty result set from being displayed as part of the output.
        INSERT INTO #_suppress_output
        EXEC ___WrapView;

        -- Get the query plan for the wrapper procedure that was just executed.
        SELECT  @queryPlan =   [qp].[query_plan]  
        FROM    [sys].[dm_exec_procedure_stats] AS [ps]
                JOIN [sys].[dm_exec_query_stats] AS [qs] ON [ps].[plan_handle] = [qs].[plan_handle]
                CROSS APPLY [sys].[dm_exec_query_plan]([qs].[plan_handle]) AS [qp]
        WHERE   [ps].[database_id] = DB_ID() AND  OBJECT_NAME([ps].[object_id], [ps].[database_id])  = '___WrapView'

        -- Drop the wrapper procedure
        DROP PROCEDURE ___WrapView

        /*
        After the query plan is generated, we query the output lists from the execution plan. 
        Since we know which view column was selected, we can now associate the output list with 
        the view column in question. We can further refine the association by only associating 
        columns that form part of our original dependency list, which eliminates expression 
        outputs from the result set. 
        */
        ;WITH QueryPlanOutputList AS
        (
          SELECT    T.X.value('local-name(.)', 'NVARCHAR(max)') as Structure,
                    T.X.value('./@Table[1]', 'NVARCHAR(50)') as [SourceTable],
                    T.X.value('./@Column[1]', 'NVARCHAR(50)') as [SourceColumnName],
                    T.X.query('*') as SubNodes

          FROM @queryPlan.nodes('*') as T(X)
          UNION ALL 
          SELECT QueryPlanOutputList.structure + N'/' + T.X.value('local-name(.)', 'nvarchar(max)'),
                 T.X.value('./@Table[1]', 'NVARCHAR(50)') as [SourceTable],
                 T.X.value('./@Column[1]', 'NVARCHAR(50)') as [SourceColumnName],
                 T.X.query('*')
          FROM QueryPlanOutputList
          CROSS APPLY QueryPlanOutputList.SubNodes.nodes('*') as T(X)
        )
        UPDATE @viewTableColumnMapping
        SET     ViewAliasColumn = @aliasColumn
        FROM    @viewTableColumnMapping CM
        INNER JOIN  
                (
                    SELECT DISTINCT  QueryPlanOutputList.Structure
                                    ,QueryPlanOutputList.[SourceTable]
                                    ,QueryPlanOutputList.[SourceColumnName]
                    FROM    QueryPlanOutputList
                    WHERE   QueryPlanOutputList.Structure like '%/OutputList/ColumnReference'
                ) SourceColumns ON CM.[SourceObject] = SourceColumns.[SourceTable] AND CM.SourceObjectColumnName = SourceColumns.SourceColumnName

        FETCH NEXT FROM ViewColumnNameCursor   
        INTO @aliasColumn 
    END

    CLOSE ViewColumnNameCursor;
    DEALLOCATE ViewColumnNameCursor; 

    DROP TABLE #_suppress_output

    SELECT *
    FROM    @viewTableColumnMapping
    ORDER BY [ViewAliasColumn]

END

The stored procedure can now be executed as follows:

EXEC dbo.ViewGetColumnDependencies @viewName = 'vTEST'