Count rows and columns for multiple CSV files and make new file

Question

I have multiple large comma-separated CSV files in a directory. As a toy example:

one.csv has 3 rows, 2 columns
two.csv has 4 rows, 5 columns

This is what the files look like -

# one.csv
  a b 
1 1 3
2 2 2
3 3 1

# two.csv
  c d e f g
1 4 1 1 4 1
2 3 2 2 3 2
3 2 3 3 2 3
4 1 4 4 1 4

The goal is to make a new .txt or .csv that gives the rows and columns for each file:

one 3 2  
two 4 5

I can get the rows and columns for a single file (and dump them into a file):

$ awk -F "," '{print NF}' *.csv | sort | uniq -c > dims.txt

But I don't understand the syntax for getting the counts for multiple files.

What I've tried

$ awk '{for (i=1; i<=2; i++) -F "," '{print NF}' *.csv$i | sort | uniq -c}'

Answer 1

You will need to loop over all the CSV files and print the name and dimensions of each one:

for i in *.csv; do awk -F "," 'END{print FILENAME, NR, NF}' "$i"; done > dims.txt

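If you want to avoid awk, you can also use wc -l for the rows and, on a single line of the file, grep -o "CSV-separator" | wc -l for the fields (this counts separators, so the number of fields is that count plus one).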
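A minimal sketch of that awk-free variant, assuming plain comma-delimited files with no header row, no commas inside fields, and a newline at the end of the last line. The function name csv_dims is made up for illustration; only the first line of each file is inspected for the column count.

```shell
# csv_dims (hypothetical helper): print "name rows cols" for each CSV given.
# Assumes plain comma-delimited data: no header row, no commas inside fields.
csv_dims() {
    for f in "$@"; do
        # wc -l counts newline characters, i.e. the number of lines
        rows=$(wc -l < "$f")
        # count the commas on the first line; fields = separators + 1
        seps=$(head -n 1 "$f" | grep -o ',' | wc -l)
        # arithmetic expansion also strips any padding some wc versions emit
        printf '%s %s %s\n' "${f%.csv}" "$((rows))" "$((seps + 1))"
    done
}
```

It could be called as csv_dims *.csv > dims.txt. If the files carry a header row or a leading row-number column, subtract one from the respective count.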

Answer 2

Using GNU awk, you can do this in a single awk call:

awk -F, 'ENDFILE {
   print gensub(/\.[^.]+$/, "", "1", FILENAME), FNR-1, NF-1
}' one.csv two.csv > dims.txt

cat dims.txt

one 3 2
two 4 5

Answer 3

I would harness GNU AWK's ENDFILE for this task as follows. Let the content of one.csv be

1,3
2,2
3,1

and that of two.csv be

4,1,1,4,1
3,2,2,3,2
2,3,3,2,3
1,4,4,1,4

then

awk 'BEGIN{FS=","}ENDFILE{print FILENAME, FNR, NF}' one.csv two.csv

gives the output

one.csv 3 2
two.csv 4 5

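Explanation: ENDFILE is executed after each file has been processed. I set FS to , on the assumption that fields are ,-separated and that no , occurs inside a field. FILENAME, FNR and NF are built-in GNU AWK variables: FNR is the number of the current line within the file, so in ENDFILE it is the number of the last line, and NF is the number of fields (again, of the last line). If your files have headers, use FNR-1; if your rows are prefixed with a row number, use NF-1.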

Edit: changed NR to FNR
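To illustrate why that edit matters: NR keeps counting across all input files, while FNR restarts at 1 for each file. Below is a small sketch that should work in any POSIX awk; the helper name show_counters and the throwaway files a.txt and b.txt are made up for the demonstration.

```shell
# show_counters (hypothetical helper): print FILENAME, NR and FNR for every input line.
show_counters() {
    awk '{ print FILENAME, NR, FNR }' "$@"
}

# Throwaway demo data: a.txt has 2 lines, b.txt has 3 lines.
demo_dir=$(mktemp -d)
printf 'x\ny\n' > "$demo_dir/a.txt"
printf 'p\nq\nr\n' > "$demo_dir/b.txt"
(cd "$demo_dir" && show_counters a.txt b.txt)
```

On the last line of b.txt, NR is 5 (the running total over both files) while FNR is 3 (the line count of b.txt alone), which is why a per-file row count needs FNR.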

Answer 4

Without GNU awk, you can use the shell plus POSIX awk like this:

for fn in *.csv; do
    cols=$(awk '{print NF; exit}' "$fn")
    rows=$(awk 'END{print NR-1}' "$fn")
    printf "%s %s %s\n" "${fn%.csv}" "$rows" "$cols" 
done

Prints:

one 3 2
two 4 5

Answer 5

With any awk, you could try the following awk program.

awk '
FNR==1{
  if(cols && rows){
    print file,rows,cols
  }
  rows=cols=file=""
  file=FILENAME
  sub(/\..*/,"",file)
  cols=NF
  next
}
{
  rows=(FNR-1)
}
END{
  if(cols && rows){
    print file,rows,cols
  }
}
' one.csv two.csv

Explanation: Adding a detailed explanation of the above solution.

awk '                       ##Starting awk program from here.
FNR==1{                     ##Checking condition if this is first line of each file then do following.
  if(cols && rows){         ##Checking if cols AND rows are NOT NULL then do following.
    print file,rows,cols    ##Printing file, rows and cols variables here.
  }
  rows=cols=file=""         ##Nullifying rows, cols and file here.
  file=FILENAME             ##Setting FILENAME value to file here.
  sub(/\..*/,"",file)       ##Removing everything from dot to till end of value in file.
  cols=NF                   ##Setting NF values to cols here.
  next                      ##next will skip all further statements from here.
}
{
  rows=(FNR-1)              ##Setting FNR-1 value to rows here.
}
END{                        ##Starting END block of this program from here.
  if(cols && rows){         ##Checking if cols AND rows are NOT NULL then do following.
    print file,rows,cols    ##Printing file, rows and cols variables here.
  }
}
' one.csv two.csv           ##Mentioning Input_file names here.
