Extract functions from class file

Question

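I'm using regular expressions in JavaScript to extract functions from an AS3 class file, but I have trouble with functions that contain inner braces (an inner if, for example).

I saw that I could do this recursively with (?<body>{(?:[^{}]+|(?-1))*+}), but that doesn't work in JavaScript because the recursion (?-1) isn't supported.

I'm wondering if someone could help me come up with another solution. Here is my regex101 test: https://regex101.com/r/eE6mX3/1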

\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)

Thanks to anyone who can help.

Answer 1

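WARNING! Read on at your own risk! The regexes ahead are really ugly!!!

(This is the answer I started out with, and it shouldn't be taken too seriously. There's a lot of work behind it though, so I posted it anyway. My other answer is probably what you want.)

Now... this is possible to some extent: if the number of nesting levels is limited, it can be done.

(I should point out that I have no experience with AS3, and I couldn't fully relate the sample in your regex101 to what I found when googling.)

Assuming it's the body of an ordinary JS function that you want to get, this solution should (could) do it for you.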

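Nesting

An innermost scope is easily matched with {[^{}]*}. This matches the two surrounding braces and anything between them ([^{}]* is a negated character class matching any character except { and }, and * means any number of times). Live example: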

var re = /{[^{}]*}/,
    str = 'one { two { three { four { five } six } seven } eight } nine';
    
    document.write(str.match( re ));

Now, if we need to match a nested level, that's easily done by adding the level to the regex, e.g. {[^{}]*{[^{}]*}[^{}]*}. Live example:

var re = /{[^{}]*{[^{}]*}[^{}]*}/,
    str = 'one { two { three { four { five } six } seven } eight } nine';
    
    document.write(str.match( re ));

Even four levels are no problem: {[^{}]*{[^{}]*{[^{}]*{[^{}]*}[^{}]*}[^{}]*}[^{}]*}. Live example:

var re = /{[^{}]*{[^{}]*{[^{}]*{[^{}]*}[^{}]*}[^{}]*}[^{}]*}/,
    str = 'one { two { three { four { five } six } seven } eight } nine';
    
    document.write(str.match( re ));

Of course you can take this further, but it's starting(?) to look pretty nasty.
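Since each extra level just wraps the previous pattern in another pair of braces, the pattern can also be generated instead of typed by hand. The following is a minimal sketch of that idea; `buildBracePattern` is a hypothetical helper name, and unlike the hand-written patterns above it accepts anything up to the given depth (including sibling blocks) rather than exactly that depth.

```javascript
// Hypothetical helper: builds a regex matching a {...} block that contains
// at most `depth` levels of braces (depth 1 = no nested braces at all).
function buildBracePattern(depth) {
  // Innermost level: an opening brace, any non-brace characters, a closing brace.
  let pattern = '\\{[^{}]*\\}';
  for (let level = 1; level < depth; level++) {
    // Each outer level allows non-brace characters or a complete inner block.
    pattern = '\\{(?:[^{}]|' + pattern + ')*\\}';
  }
  return new RegExp(pattern);
}

const str = 'one { two { three { four { five } six } seven } eight } nine';

// Print the first match for one, two and four supported levels.
for (const depth of [1, 2, 4]) {
  console.log(depth, str.match(buildBracePattern(depth))[0]);
}
```

The limitation is the same as before: the depth has to be chosen up front, and braces inside strings or comments are not yet taken into account.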

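Strings and comments

Now you may ask yourself, "What about strings and comments?" (as Wiktor pointed out). Well, there's a solution.

If we take the string {one /* two } */ three ' { four }' "} five {" } // {six} (representing a line of JS code with comments), we can easily match the separate parts on their own.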

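'(?:\\'|[^'])*' matches a single-quoted string (allowing escaped single quotes).
"(?:\\"|[^"])*" matches a double-quoted string (allowing escaped double quotes).
\/\*[\s\S]*?\*\/ matches /* abc */ style comments (possibly multi-line).
\/\/.* matches // abc style comments (to the end of the line).

Joining these in an alternation matches all of them, as illustrated here:

var re = /'(?:\\'|[^'])*'|"(?:\\"|[^"])*"|\/\*[\s\S]*?\*\/|\/\/.*/g,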
    str = '{one /* two } */ three \'{ four }\' "} five {" } // {six}';
    
    document.write(str.match( re ).join('<br/>'));

First comes the comment /* two } */, then the strings ' { four }' and "} five {", and finally the // {six} comment.
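To see why matching strings and comments first helps, here is a small sketch that blanks them out so that only "real" braces remain. It uses a slightly stricter variant of the alternation (the exact escape handling and the block-comment pattern are my assumptions), and `maskStringsAndComments` is a hypothetical helper name.

```javascript
// Strings (single and double quoted), block comments and line comments.
const stringOrComment = /'(?:\\'|[^'])*'|"(?:\\"|[^"])*"|\/\*[\s\S]*?\*\/|\/\/.*/g;

// Hypothetical helper: replaces every string and comment with spaces of the
// same length (newlines are kept), so all positions in the text stay the same.
function maskStringsAndComments(source) {
  return source.replace(stringOrComment, (match) => match.replace(/[^\n]/g, ' '));
}

const line = '{one /* two } */ three \'{ four }\' "} five {" } // {six}';
const masked = maskStringsAndComments(line);

console.log(line);
console.log(masked);

// With the misleading braces gone, the simple innermost-scope pattern
// now finds the block that actually encloses the line.
const block = masked.match(/{[^{}]*}/);
console.log(line.slice(block.index, block.index + block[0].length));
```

Because the masked text has the same length as the original, match positions found in it can be used directly to slice the original source.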

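Putting it together

By matching the string/comment parts first, we avoid including unwanted braces when searching for matching pairs. We'll also add a character class to the alternation, consisting of all the legal JavaScript characters (white-space, word characters, operators, punctuation, etc.).

Then what we need to get an inner scope is: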

{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*}

Live example:

var re = /{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*}/,
    code = '// Example code\nfunction(a,b,c) {\n\t// Return sum of all arguments\n\treturn a+b+c;\n\t// Ignore this: }\n}\n';
    
document.write('<span style="background-color:lightblue;">ORIGINAL CODE:<br/></span>');
document.write(code.replace(/\n/g, '<br/>'));

document.write('<span style="background-color:lightblue;">INNER SCOPE:<br/></span>');
document.write(code.match(re)[0].replace(/\n/g, '<br/>'));

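Combining this with the nesting logic explained above gives us a (really complex) regex that should do the trick, but with the limitation stated earlier: only a fixed number of nesting levels is supported. (Many levels of nesting may also cause timeouts.)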

(?:{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*(?:{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*(?:{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*(?:{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*(?:{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*([^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*})?(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*([^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*})?(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*([^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*})?(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*})?(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*})?

(Oh my God!!! I've created a monster!)

var re = /{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*(?:{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*(?:{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*(?:{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*(?:{(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*})?(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*})?(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*})?(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*})?(?:(?:'(?:\'|[^'])*?'|"(?:\"|[^"])*?"|\/\*(?:[^*]*)*?\*\/|\/\/.*)|[\s\w-+\/*.=,();|&[\]:])*}/g,
    unnecessary_complex_code = "var str='123';\n\nfunction /* 123 * 4 * / */ test(\n// qwe\nparam // qwe\n, rapam\n) // \n{\n\ndocument /* qwe*/ . // write\nwrite ( /*\ntest\n*/ test(123) // );\n);\n\n if(param===rapam) {\n  if( false ) {\n   // Another nested level\n   var another_level = {\n    test: { result: true  }\n   };\n  }\n  return // 'FALSE';\n   'TRUE;\n } else return /* 'TRUE';\n */ 'FALSE'\n;\n  /*\n}\n*/\n}\n";
    
    console.log('Code used:');
    console.log(unnecessary_complex_code);
    console.log('Result:');
    console.log(unnecessary_complex_code.match(re));
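For comparison, here is a rough sketch of a different technique (not the regex-only approach of this answer): a small regex is used only to find strings, comments and single braces, and the nesting depth is counted in a loop, so no level limit has to be hard-coded. `extractBlock` is a hypothetical helper name, and the sketch assumes the code contains no regex literals or template literals with braces in them.

```javascript
// Hypothetical helper: returns the first balanced {...} block found at or
// after index `from`, or null if the braces never balance.
function extractBlock(source, from = 0) {
  // Tokens of interest: strings, block comments, line comments, or one brace.
  const token = /'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*"|\/\*[\s\S]*?\*\/|\/\/.*|[{}]/g;
  token.lastIndex = from;

  let depth = 0;
  let start = -1;
  let match;
  while ((match = token.exec(source)) !== null) {
    if (match[0] === '{') {
      if (depth === 0) start = match.index; // remember where the block begins
      depth++;
    } else if (match[0] === '}') {
      if (depth === 0) continue;            // stray closing brace, ignore it
      depth--;
      if (depth === 0) return source.slice(start, match.index + 1);
    }
    // Strings and comments are consumed here without changing the depth.
  }
  return null;
}

const sample = 'function f(a) {\n  // ignore }\n  if (a) { return "}"; }\n  return 0;\n}\nvar x = 1;';
console.log(extractBlock(sample));
```

The trade-off is that this is no longer a single regular expression, but it handles any nesting depth in one linear pass over the source.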

Answer 2

This may be simpler than you think, and it doesn't require any regex or parsing ;)

It's easier to show an example before explaining:

// This is an example of how a function may look

function Demo(first, second, third) {
  // Concatenate the first and second if third is undefined
  if(!third) { // This is to illustrate that a { doesn't cause any problem
    return first + ' - ' + second;
  }
  if(third==='first') { /*
    No problem with } this either
    */ return first;
  }
  if(third==='second') {
    return second;
  }
  else {
    if(third==='first and second') {
      return first + ' and ' + second;
    }
  }
}

// Now show what the function returns when invoked
console.log('The function returns:\n' + Demo('one', 'two'));

// And now - the magic: show the function definition
console.log('The function definition is:\n' + Demo);

span.result {
  background-color: lightblue;
}
span.definition {
  background-color: wheat;
}

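Referring to a function without () gives you the function itself rather than its result, and when the function is concatenated with a string it is converted to its definition (its source text, via toString()).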
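As a small illustration of how that string could then be used, the sketch below takes the source text of a function and slices out the part between the first { and the last }. The function `demo` and the slicing approach are my own illustrative assumptions; the slicing only works in simple cases, for example when the parameter list itself contains no braces.

```javascript
// A small function whose definition we want to inspect.
function demo(first, second) {
  return first + ' - ' + second;
}

// Calling the function gives its result.
console.log(demo('one', 'two'));

// Converting the function to a string gives its source text instead.
const source = demo.toString();
console.log(source);

// Everything between the first "{" and the last "}" is the function body.
const body = source.slice(source.indexOf('{') + 1, source.lastIndexOf('}'));
console.log(body.trim());
```

Note that this only applies to functions that are actually loaded in a JavaScript runtime, since the text comes from the engine rather than from parsing a file.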
