I want to capture comments from a Java file using a Python script

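For documentation purposes, I want to capture the comment that sits above the code of each function.

I can already iterate over the file down to the function-name lines. Once I have a function-name line, I want to capture the comment above it. The comments are in '/** xxx */' blocks: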

/**
* this is the comment
* this is the comment
* this is the comment
*/
@Attribute(type = Attribute.STRING.class)
String RESPONSE_TEXT = "responseText";

/**
* this is the comment
* this is the comment
*/
@Attribute(type = Attribute.LONG.class)
String TIME = "clTimestamp";

This should work:

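# Read the whole Java source file (file_name is assumed to be defined)
with open(file_name) as f:
    data = f.read()
# Split on the opening marker, then split each piece on the closing marker
pieces = []
for chunk in data.split('/**'):
    pieces.extend(chunk.split('*/'))
# The comment bodies end up at the odd indices
comments = pieces[1::2]
for k in comments:
    print(k)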
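
The split above yields the comment bodies but not the declarations they belong to. Since the question says the function-name line is already known, here is a minimal sketch of walking upwards from that line instead. The helper name comment_above and the sample text are hypothetical, and the sketch assumes that only blank lines or annotation lines sit between the comment and the declaration.

```python
def comment_above(lines, decl_index):
    """Return the /** ... */ block that ends just above lines[decl_index], or None."""
    i = decl_index - 1
    # Skip blank lines and annotation lines between the comment and the declaration
    while i >= 0 and (not lines[i].strip() or lines[i].strip().startswith('@')):
        i -= 1
    if i < 0 or not lines[i].strip().endswith('*/'):
        return None
    end = i
    # Walk upwards until the line that opens the block
    while i >= 0 and '/**' not in lines[i]:
        i -= 1
    if i < 0:
        return None
    return '\n'.join(lines[i:end + 1])


java_source = '''/**
* this is the comment
* this is the comment
*/
@Attribute(type = Attribute.LONG.class)
String TIME = "clTimestamp";
'''
lines = java_source.splitlines()
# Index 5 is the 'String TIME' line in this sample
print(comment_above(lines, 5))
```

This only inspects whole lines, so a plain /* ... */ block directly above the declaration would not be told apart from a /** ... */ one without an extra check.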

Now that I know the function-name line starts with @Attribute, this is easy to do with a regular expression (the re module), as follows:

import re
content = '''
/**
* this is the comment
* this is the comment
* this is the comment
*/
@Attribute(type = Attribute.STRING.class)
String RESPONSE_TEXT = "responseText";

/**
* this is the comment
* this is the comment
*/
@Attribute(type = Attribute.LONG.class)
String TIME = "clTimestamp";
'''
comments = re.findall(r'(/\*\*.*?\*/)\n(@Attribute[^\n]*)',content,re.DOTALL)

print('Function comments:')
for i in comments:
    print(i[1])
    print(i[0])
    print('\n')

Output:

Function comments:
@Attribute(type = Attribute.STRING.class)
/**
* this is the comment
* this is the comment
* this is the comment
*/


@Attribute(type = Attribute.LONG.class)
/**
* this is the comment
* this is the comment
*/

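For clarity I hardcoded content. I use re.findall with a pattern that has two groups, one for the comment and one for the name, so it returns a list of 2-tuples, each made up of a comment and a function-name line. Note that re.DOTALL lets .*? match across multiple lines, and that characters with a special meaning in a regex have to be escaped, i.e. * is written as \*.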
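
For documentation output it may help to drop the comment decoration and key each comment by the declared name rather than by the annotation line. The following is only a sketch under assumptions: the helper names clean_comment and document_attributes are hypothetical, the sample comments are made up, and the pattern assumes every declaration looks like the two in the question (one @Attribute line followed by "Type NAME =").

```python
import re

def clean_comment(raw):
    """Strip the /** ... */ delimiters and the leading '*' from each line."""
    body = raw.strip()[3:-2]          # drop '/**' and '*/'
    cleaned = [line.strip().lstrip('*').strip() for line in body.splitlines()]
    return ' '.join(part for part in cleaned if part)

def document_attributes(source):
    """Map each declared name to its cleaned comment text."""
    pattern = re.compile(
        r'(/\*\*.*?\*/)\s*\n'          # group 1: the comment block
        r'@Attribute[^\n]*\n'          # the annotation line
        r'\s*\w+\s+(\w+)\s*=',         # group 2: the declared name
        re.DOTALL,
    )
    return {name: clean_comment(raw) for raw, name in pattern.findall(source)}

sample = '''
/**
* response body
* as text
*/
@Attribute(type = Attribute.STRING.class)
String RESPONSE_TEXT = "responseText";

/**
* timestamp in ms
*/
@Attribute(type = Attribute.LONG.class)
String TIME = "clTimestamp";
'''
for name, text in document_attributes(sample).items():
    print(name, '->', text)
```

One caveat that also applies to the pattern above: with re.DOTALL, a /** block that is not followed by @Attribute can be swallowed into the next match, because .*? is allowed to run past its closing */ until the rest of the pattern fits.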

def find_comment(n_array, start_string, end_string, add_index):
    # Print and strip every start_string..end_string span found in n_array
    comment_index = n_array.find(start_string)
    if comment_index != -1:
        comment_end_index = n_array.find(end_string, comment_index)
        # Stop if the comment is never terminated
        if comment_end_index != -1:
            print(n_array[comment_index:comment_end_index + add_index])
            n_array = n_array[:comment_index] + n_array[comment_end_index + add_index:]
            # Recurse on the remainder and return its result
            return find_comment(n_array, start_string, end_string, add_index)
    return n_array

# x is assumed to hold the Java source text
x = find_comment(x, "/*", "*/", 2)
x = find_comment(x, "//", "\n", 0)
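
The same removal can also be sketched without recursion by using re.sub. This swaps in a regex-based technique rather than reproducing the function above, and the name strip_comments and the sample string are hypothetical. Like the find-based version, it does not understand string literals, so a "//" inside a quoted URL would be treated as a comment.

```python
import re

def strip_comments(source):
    """Remove /* ... */ blocks and // line comments from Java source text."""
    # Block comments first; DOTALL lets '.' cross line breaks
    source = re.sub(r'/\*.*?\*/', '', source, flags=re.DOTALL)
    # Line comments: drop everything from '//' up to, but not including, the newline
    return re.sub(r'//[^\n]*', '', source)

java_source = 'int a; /* block\ncomment */ int b; // trailing\nint c;\n'
print(strip_comments(java_source))
```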