How to ignore the contents of /* comment */ while reading a file

Here is my code:

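// Verbatim strings (@"...") keep the backslashes in the paths from being read as escape sequences
string ckeywords = File.ReadAllText(@"E:\ckeywords.csv");
string[] clines = File.ReadAllLines(@"E:\cprogram\cpro\bubblesort.c");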
string letters="";

foreach(string line in clines)
{
    char[] c = line.ToCharArray();
    foreach(char i in c)
    {
        if (i == '/' || i == '"')
        {
            break;
        }
        else 
        {
            letters = letters + i;
        }
    }
}
letters = Regex.Replace(letters, @"[^a-zA-Z ]+", " ");

List<string> listofc = letters.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
List<string> listofcsv = ckeywords.Split(new char[] { ',', '\t', '\n', ' ' }, StringSplitOptions.RemoveEmptyEntries).Select(p => p.Trim()).ToList();
List<string> Commonlist = listofcsv.Intersect(listofc).ToList();

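With this if condition I can skip the contents of single-line comments and anything between double quotes ("").

I also need to skip the contents of multi-line comments. Which condition should I use? Suppose my .c file contains the comment below: with the code above, I don't know how to start iterating from /* to */ and ignore everything in between.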

/*printf("Sorted list in ascending order:\n");

for ( c = 0 ; c < n ; c++ ) printf("%d\n", array[c]);*/

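Here is code that naively does the following:

  1. It removes any multi-line comment that starts with /* and ends with */, even if there are line breaks between the two.
  2. It removes any single-line comment that starts with // and ends at the end of the line.
  3. It does not remove any comment like the above if it sits inside a string that starts with " and ends with ".

LINQPad code: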

void Main()
{
    var code = File.ReadAllText(@"d:\temp\test.c");
    code.Dump("input");

    bool inString = false;
    bool inSingleLineComment = false;
    bool inMultiLineComment = false;

    var output = new StringBuilder();
    int index = 0;

    while (index < code.Length)
    {
        // First deal with single line comments: // xyz
        if (inSingleLineComment)
        {
            if (code[index] == '\n' || code[index] == '\r')
            {
                inSingleLineComment = false;
                output.Append(code[index]);
                index++;
            }
            else
                index++;

            continue;
        }

        // Then multi-line comments: /* ... */
        if (inMultiLineComment)
        {
            if (code[index] == '*' && index + 1 < code.Length && code[index + 1] == '/')
            {
                inMultiLineComment = false;
                index += 2;
            }
            else
                index++;
            continue;
        }

        // Then deal with strings
        if (inString)
        {
            output.Append(code[index]);
            if (code[index] == '"')
                inString = false;
            index++;
            continue;
        }

        // If we get here we're not in a string or in a comment
        if (code[index] == '"')
        {
            // We found the start of a string
            output.Append(code[index]);
            inString = true;
            index++;
        }
        else if (code[index] == '/' && index + 1 < code.Length && code[index + 1] == '/')
        {
            // We found the start of a single line comment
            inSingleLineComment = true;
            index++;
        }
        else if (code[index] == '/' && index + 1 < code.Length && code[index + 1] == '*')
        {
            // We found the start of a multi line comment
            inMultiLineComment = true;
            index += 2; // skip both '/' and '*' so that "/*/" is not read as an immediate close
        }
        else
        {
            // Just another character
            output.Append(code[index]);
            index++;
        }
    }

    output.ToString().Dump("output");
}

Sample input:

This should be included // This should not
This should also be included /* while this
should not */ but this should again be included.

Any comments in " /* strings */ " should be included as well.
This goes for "// single line comments" as well.

Sample output (note that some of the lines below end with invisible trailing whitespace):

This should be included 
This should also be included  but this should again be included.

Any comments in " /* strings */ " should be included as well.
This goes for "// single line comments" as well.
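
To connect this back to the keyword matching in the question, the same scanning idea can be wrapped in a method. The following is only a minimal sketch under some assumptions: the names CommentStripper, StripComments and FindKeywords are hypothetical, and, unlike the loop above, it also treats backslash-escaped quotes and character literals as part of a literal. String contents are kept in the output, so words inside string literals still count as matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class CommentStripper
{
    // Removes // and /* */ comments; string and character literals are copied verbatim.
    public static string StripComments(string code)
    {
        var output = new StringBuilder();
        int i = 0;
        while (i < code.Length)
        {
            char c = code[i];
            bool hasNext = i + 1 < code.Length;

            if (c == '"' || c == '\'')
            {
                // Copy the literal up to its unescaped closing quote.
                char quote = c;
                output.Append(c);
                i++;
                while (i < code.Length)
                {
                    output.Append(code[i]);
                    if (code[i] == '\\' && i + 1 < code.Length)
                    {
                        // Copy the escaped character too, so \" does not end the literal.
                        output.Append(code[i + 1]);
                        i += 2;
                        continue;
                    }
                    if (code[i++] == quote) break;
                }
            }
            else if (c == '/' && hasNext && code[i + 1] == '/')
            {
                // Skip to the end of the line; the line break itself is kept.
                while (i < code.Length && code[i] != '\n' && code[i] != '\r') i++;
            }
            else if (c == '/' && hasNext && code[i + 1] == '*')
            {
                // Skip past the closing */ (or to the end of input if it is missing).
                int end = code.IndexOf("*/", i + 2, StringComparison.Ordinal);
                i = end < 0 ? code.Length : end + 2;
            }
            else
            {
                output.Append(c);
                i++;
            }
        }
        return output.ToString();
    }

    // Returns the keywords that occur as words in the comment-free code.
    public static List<string> FindKeywords(string code, IEnumerable<string> keywords)
    {
        string letters = Regex.Replace(StripComments(code), @"[^a-zA-Z ]+", " ");
        var words = letters.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return keywords.Intersect(words).ToList();
    }
}
```

With the files from the question, a call could look like CommentStripper.FindKeywords(File.ReadAllText(@"E:\cprogram\cpro\bubblesort.c"), listofcsv). Reading the whole file with ReadAllText rather than ReadAllLines is what lets a comment span several lines without extra bookkeeping.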

I managed to solve my problem: I can now skip the contents of /* */ in a simpler way, without regular expressions. Here is my code:

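string[] clines = File.ReadAllLines(@"E:\cprogram\cpro\bubblesort.c");
List<string> list = new List<string>();
int startIndexofcomm, endIndexofcomm;

for (int i = 0; i < clines.Length; i++)
{
    if (clines[i].Contains("/*"))
    {
        // Keep the text that precedes the comment opener
        startIndexofcomm = clines[i].IndexOf("/*");
        list.Add(clines[i].Substring(0, startIndexofcomm));

        // Look for the closer after the opener on the same line first
        endIndexofcomm = clines[i].IndexOf("*/", startIndexofcomm + 2);

        // Otherwise skip whole lines until one contains the closer
        while (endIndexofcomm < 0 && ++i < clines.Length)
        {
            endIndexofcomm = clines[i].IndexOf("*/");
        }

        // Keep the text that follows the closer (nothing if the comment is never closed)
        if (endIndexofcomm >= 0)
        {
            list.Add(clines[i].Substring(endIndexofcomm + 2));
        }
        continue;
    }
    list.Add(clines[i]);
}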