Parsing Javadoc with ANTLR in Python

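I am trying to parse Javadoc using the python3 antlr4 runtime. The grammar was obtained from here. The parser and lexer code was generated with the help of this documentation. How do I produce a parse tree from the code using the parser and lexer?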

import antlr4


from grammars.javadoc.JavadocParser import JavadocParser
from grammars.javadoc.JavadocLexer import JavadocLexer
from grammars.javadoc.JavadocParserListener import JavadocParserListener


class MyJavaDocListener(JavadocParserListener):
  def enterDocumentation(self, ctx:JavadocParser.DocumentationContext):
    print(ctx.getRuleContext().start.line)
    print(ctx.getRuleContext().stop.line)
    print(ctx.getText())


if __name__ == '__main__':

  code = "class X{/**Some random comment*/public int testM(){return 42;}}"
  input_stream = antlr4.InputStream(code)
  lexer = JavadocLexer(input_stream)
  stream = antlr4.CommonTokenStream(lexer)
  parser = JavadocParser(stream)

  tree = parser.what?

  doclistener = MyJavaDocListener()
  walker = antlr4.ParseTreeWalker()
  walker.walk(doclistener, tree)




The grammar you are using only handles the Javadoc comment itself, not an entire Java source file that contains Javadoc comments.

You use it like this:

code = "/**Some random comment*/"
input_stream = antlr4.InputStream(code)
lexer = JavadocLexer(input_stream)
stream = antlr4.CommonTokenStream(lexer)
parser = JavadocParser(stream)

# Call the `documentation` function. All parser rules are mapped to functions.
tree = parser.documentation()

doclistener = MyJavaDocListener()
walker = antlr4.ParseTreeWalker()
walker.walk(doclistener, tree)

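If you want to extract Javadoc comments from a Java source file, you first need to parse that file with a Java grammar/parser.

You can use the existing Java9 grammar and change its last lexer rules from this: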

COMMENT
    :   '/*' .*? '*/' -> channel(HIDDEN)
    ;

LINE_COMMENT
    :   '//' ~[\r\n]* -> channel(HIDDEN)
    ;

into this:

JAVADOC_COMMENT
    :   '/**' .*? '*/' -> channel(HIDDEN)
    ;

COMMENT
    :   '/*' .*? '*/' -> skip
    ;

LINE_COMMENT
    :   '//' ~[\r\n]* -> skip
    ;
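The order of these rules matters: when several lexer rules match the same input, ANTLR picks the longest match and breaks ties in favor of the rule listed first, so JAVADOC_COMMENT has to appear before COMMENT. The snippet below is only a rough model of that behaviour using Python's `re` module, not ANTLR itself; the names `RULES` and `classify` are made up for illustration.

```python
import re

# Rough regex equivalents of the three lexer rules, in grammar order.
# This only approximates ANTLR's matching for these simple inputs.
RULES = [
    ("JAVADOC_COMMENT", re.compile(r"/\*\*.*?\*/", re.DOTALL)),
    ("COMMENT", re.compile(r"/\*.*?\*/", re.DOTALL)),
    ("LINE_COMMENT", re.compile(r"//[^\r\n]*")),
]


def classify(text):
    """Return the name of the rule that would win at the start of `text`."""
    best = None  # (rule name, match length)
    for name, pattern in RULES:
        match = pattern.match(text)
        # Strict '>' keeps the earlier rule when two matches are equally long.
        if match and (best is None or match.end() > best[1]):
            best = (name, match.end())
    return best[0] if best else None


print(classify("/** doc */"))   # JAVADOC_COMMENT (tie with COMMENT, listed first)
print(classify("/* plain */"))  # COMMENT
print(classify("/**/"))         # COMMENT (too short to be a Javadoc comment)
print(classify("// line"))      # LINE_COMMENT
```

If COMMENT were listed first, it would win the tie and every Javadoc comment would be tokenized as a plain COMMENT.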

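Then create a custom listener that handles the enterMethodDeclaration event, gets the token immediately preceding the method from the token stream, and checks whether that token is a hidden JAVADOC_COMMENT token.

A quick demo: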

class JavaDocListener(Java9Listener):

    # methodDeclaration
    #   :   methodModifier* methodHeader methodBody
    #   ;
    def enterMethodDeclaration(self, ctx: Java9Parser.MethodDeclarationContext):
        previous_token_index = ctx.getSourceInterval()[0] - 1
        previous_token = ctx.parser.getTokenStream().tokens[previous_token_index]
        method_name = ctx.methodHeader().methodDeclarator().identifier().getText()
        javadoc = previous_token.text if previous_token.type == Java9Lexer.JAVADOC_COMMENT else None
        print('method: {}, javadoc: {}'.format(method_name, javadoc))


if __name__ == '__main__':

    code = """
        public class X {

          public static String mu() { return null; }

          /**
           * Some random comment
           */
          public int testM() {
            return 42;
          }
        }
    """
    input_stream = antlr4.InputStream(code)
    parser = Java9Parser(antlr4.CommonTokenStream(Java9Lexer(input_stream)))

    tree = parser.ordinaryCompilation()

    doc_listener = JavaDocListener()
    walker = antlr4.ParseTreeWalker()
    walker.walk(doc_listener, tree)

This will print:

method: mu, javadoc: None
method: testM, javadoc: /**
           * Some random comment
           */
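The lookup the listener performs can also be sketched without ANTLR, which makes the logic easier to see. This is a simplified model under stated assumptions: `Token`, `HIDDEN_CHANNEL`, and `javadoc_before` are hypothetical stand-ins, not part of the antlr4 runtime. It also guards against a start index of 0, where `index - 1` would otherwise wrap around to the last token in a Python list.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-ins for ANTLR's channels and token types.
DEFAULT_CHANNEL = 0
HIDDEN_CHANNEL = 1
JAVADOC_COMMENT = "JAVADOC_COMMENT"


@dataclass
class Token:
    type: str
    text: str
    channel: int = DEFAULT_CHANNEL


def javadoc_before(tokens: List[Token], start_index: int) -> Optional[str]:
    """Return the Javadoc text directly preceding tokens[start_index], if any."""
    if start_index <= 0:
        return None  # nothing precedes the first token; avoid index -1
    previous = tokens[start_index - 1]
    if previous.type == JAVADOC_COMMENT and previous.channel == HIDDEN_CHANNEL:
        return previous.text
    return None


# Token stream for: class X { /** Some random comment */ public int testM ...
tokens = [
    Token("CLASS", "class"),
    Token("IDENTIFIER", "X"),
    Token("LBRACE", "{"),
    Token(JAVADOC_COMMENT, "/** Some random comment */", HIDDEN_CHANNEL),
    Token("PUBLIC", "public"),
    Token("INT", "int"),
    Token("IDENTIFIER", "testM"),
]

print(javadoc_before(tokens, 4))  # /** Some random comment */
print(javadoc_before(tokens, 0))  # None
```

The approach works because plain comments and whitespace are skipped by the lexer, so the only token that can sit between the previous parser-visible token and the method's first token is a hidden JAVADOC_COMMENT.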