Regex messy addresses with Snowflake SQL

Hi, I need to extract the street address from each of the following strings:

  1. 143 Evergreen Forest Court (this one is fine)
  2. 326 Hambrick Park Fayetteville, GA 30215
  3. RE: Owner's Policy - 112 Shagbark Ln Mooresville, NC 28115
  4. RE: Owner's Policy - 540 Clearbrook Dr Covington, GA 30016
  5. Closed 9/1/21 4421 Home Stakes Dr Parkton, NC 28371
  6. RP 9/16- 352 Hampton St Elloree, SC 29047
  7. RP: 9/15- 124 Lake Grove Rd Simpsonville, SC 29681
  8. FHA 3/2/22- 6083 Holiday Blvd Forest Park, GA 30297
  9. RD 10/1/21 Roxanne Sellers- 311 Woodbrook Ln Marietta, GA 30068
  10. 4104 Flat Trl- Ricardo Reeder
  11. 6621 Lake Valley DrMemphis, TN 38141

Example of the ideal output:

Before: 6621 Lake Valley DrMemphis, TN 38141

After: 6621 Lake Valley Dr

How can I do this with Snowflake SQL? I assume regexp_replace is the way to go? Can anyone help?

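This works for most of the examples in the list (the greedy match can overshoot when the city name itself begins with a road type, as in Parkton):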

with data as (
    select value --regexp_substr(replace(value, '-', ','), '([0-9]+ [A-Z].*)', 1) value
    from table(split_to_table(
$$143 Evergreen Forest Court
326 Hambrick Park Fayetteville, GA 30215
RE: Owner's Policy - 112 Shagbark Ln Mooresville, NC 28115
RE: Owner's Policy - 540 Clearbrook Dr Covington, GA 30016
Closed 9/1/21 4421 Home Stakes Dr Parkton, NC 28371
RP 9/16- 352 Hampton St Elloree, SC 29047
RP: 9/15- 124 Lake Grove Rd Simpsonville, SC 29681
FHA 3/2/22- 6083 Holiday Blvd Forest Park, GA 30297
RD 10/1/21 Roxanne Sellers- 311 Woodbrook Ln Marietta, GA 30068
4104 Flat Trl- Ricardo Reeder$$, '\n')
))

-- house number, then non-digit text, ending at a space followed by a known road type
select regexp_substr(value, '[0-9]+ [^0-9]* (Court|Park|Ln|Dr|St|Rd|Trl)')
from data
;

This solution relies on the address ending in one of a finite set of possible road types.
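If the road-type approach needs tightening, here is one possible variant, offered as an untested sketch rather than a definitive fix. It makes three changes that are my assumptions, not part of the original answer: the middle part uses a lazy quantifier (`*?`) so the match stops at the first road type instead of the last, `Blvd` is added to the list, and a trailing guard `([^a-z]|$)` requires that the road type is not followed by a lowercase letter (so `St` does not match inside `Stakes`, nor `Park` inside `Parkton`, while `DrMemphis` still passes). It assumes Snowflake's regex engine accepts the lazy quantifier, and it uses the `'e'` parameter of regexp_substr to return only the first capture group.

```sql
-- Sketch only: three of the trickier rows from the question, inlined so the query is self-contained.
with data as (
    select column1 as value
    from values
        ('Closed 9/1/21 4421 Home Stakes Dr Parkton, NC 28371'),
        ('FHA 3/2/22- 6083 Holiday Blvd Forest Park, GA 30297'),
        ('6621 Lake Valley DrMemphis, TN 38141')
)
select value,
       -- group 1 = house number + street name + road type;
       -- the final group only checks what follows the road type and is not returned
       regexp_substr(
           value,
           '([0-9]+ [^0-9]*? (Court|Park|Ln|Dr|St|Rd|Trl|Blvd))([^a-z]|$)',
           1, 1, 'e', 1
       ) as street
from data;
```

The guard is still a heuristic: a street whose name contains a standalone word from the road-type list would be cut short at that word.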

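As Simeon said in the comments, I expect that the more examples you find, the more complicated this will get. At some point you will have to move from collecting examples and asking for ever more complex regexes to defining the rules yourself.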