What is the best way to convert Roman numerals to Arabic numerals in BigQuery?

As per the title. The best I have come up with is a basic REPLACE function, but it falls over on anything more complex than I, V, X, etc.

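The Roman numerals in my data are located at the end of the string, so, using the well-known Fast & Furious movies as an example, my data looks like this: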

movie
Fast & Furious I
Fast & Furious II
Fast & Furious III
Fast & Furious IV
Fast & Furious V
Fast & Furious VI
Fast & Furious VII

Consider the approach below (with some extra dummy data in addition to yours):

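create temp function deromanize (number STRING) returns STRING 
language js as '''
  // Returns the Arabic value as a string, or null for NULL, empty,
  // or otherwise invalid Roman numerals.
  if (!number) return null;
  var roman = number.toUpperCase(),
  // validator accepts only well-formed numerals (e.g. rejects IIII, VX)
  validator = /^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/,
  // token splits the numeral into symbols and subtractive pairs (CM, XC, IV, ...)
  token = /[MDLV]|C[MD]?|X[CL]?|I[XV]?/g,
  key = {M:1000,CM:900,D:500,CD:400,C:100,XC:90,L:50,XL:40,X:10,IX:9,V:5,IV:4,I:1},
  num = 0, m;
  if (!validator.test(roman)) return null;
  while (m = token.exec(roman)) num += key[m[0]];
  // The function is declared as returning STRING, so convert explicitly.
  return String(num);
''';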
with your_table as (
  select 'Fast & Furious I' movie union all
  select 'Fast & Furious II' union all
  select 'Fast & Furious III' union all
  select 'Fast & Furious IV' union all
  select 'Fast & Furious V' union all
  select 'Fast & Furious VI' union all
  select 'Fast & Furious VII' union all
  select 'Fast & Furious XXXIX' union all
  select 'Fast & Furious LXXI' union all
  select 'Fast & Furious MDCCLXXIV' 
)
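-- Take the last space-separated word of each title as the candidate numeral
-- and swap it for its Arabic value; if deromanize yields NULL, keep the word.
-- Note: replace() substitutes every occurrence of roman_number in the title.
select movie, 
  replace(movie, roman_number, ifnull(deromanize(roman_number), roman_number)) converted_title
from your_table, 
unnest([struct(array_reverse(split(movie, ' '))[offset(0)] as roman_number)])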

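The output has the Roman numeral at the end of each title replaced by its Arabic value; for example, 'Fast & Furious VII' becomes 'Fast & Furious 7' and 'Fast & Furious MDCCLXXIV' becomes 'Fast & Furious 1774'.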
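
If it helps to check the conversion logic outside BigQuery, here is a minimal standalone sketch of the same idea in plain JavaScript (for example, under Node.js). It reuses the validator and token patterns from the UDF above. The helper name convertTitle and the choice to return null for invalid input are assumptions made for this illustration, not part of BigQuery or of the query above.

```javascript
// Sketch only: mirrors the UDF body so it can be exercised locally.
// Returns a number for a valid Roman numeral, or null otherwise (an assumption
// made for this example).
function deromanize(input) {
  if (!input) return null; // covers null, undefined, and the empty string
  const roman = input.toUpperCase();
  const validator = /^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/;
  const token = /[MDLV]|C[MD]?|X[CL]?|I[XV]?/g;
  const key = {
    M: 1000, CM: 900, D: 500, CD: 400, C: 100, XC: 90,
    L: 50, XL: 40, X: 10, IX: 9, V: 5, IV: 4, I: 1
  };
  if (!validator.test(roman)) return null;
  let num = 0;
  // Each match is a single symbol or a subtractive pair such as IV or CM.
  for (const m of roman.match(token)) num += key[m];
  return num;
}

// Hypothetical helper: converts only the last space-separated word of a title
// and leaves the title unchanged when that word is not a valid numeral.
function convertTitle(title) {
  const words = title.split(" ");
  const value = deromanize(words[words.length - 1]);
  if (value === null) return title;
  words[words.length - 1] = String(value);
  return words.join(" ");
}

console.log(convertTitle("Fast & Furious IV"));
console.log(convertTitle("Fast & Furious MDCCLXXIV"));
```

One caveat with any approach of this kind: a trailing word that happens to be a valid numeral (such as "I" or "MIX") is converted as well, so it may be worth restricting the rule to titles where that is known not to occur.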