Using multiple arrays in Awk without duplication of code
I have working code:
BEGIN { FS=";"; } # field separator
{
if (match(, /[0-9]+/)) { # matching `ID` value
m=substr(, RSTART, RLENGTH);
a[m]++; # accumulating number of lines for each `ID`
print > m"_count.txt"; # writing lines pertaining to certain `ID` into respective file
}
}
END {
for(i in a) {
print "mv "i"_count.txt "i"_"a[i]".txt" # renaming files with actual counts
}
}
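Now I need to change it to do something like the following.
I have three arrays of IDs, and each array designates a separate folder in which to save the results: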
BEGIN { FS=";"; } # field separator
{
array1=(125 258 698 874)
array2=(956 887 4455 22)
array3=(111 444 558 966 332)
if ( == ) {varR=} else {varR=}
if (match(varR, /[0-9]+/)) { # matching `ID` value
if ( varR in array1 ) {
FolderName = "folder1/"
m1=substr(varR, RSTART, RLENGTH);
a1[m1]++; # accumulating number of lines for each `ID`
print > (FolderName m1)"_count.txt"; # writing lines pertaining to certain `ID` into respective file
}
if ( varR in array2 ) {
FolderName = "folder2/"
m2=substr(varR, RSTART, RLENGTH);
a2[m2]++; # accumulating number of lines for each `ID`
print > (FolderName m2)"_count.txt"; # writing lines pertaining to certain `ID` into respective file
}
if ( varR in array3 ) {
FolderName = "folder3/"
m3=substr(varR, RSTART, RLENGTH);
a3[m3]++; # accumulating number of lines for each `ID`
print > (FolderName m3)"_count.txt"; # writing lines pertaining to certain `ID` into respective file
}
}
}
END {
for(i in a1) {
print "mv "i"_count.txt "i"_"a1[i]".txt" # renaming files with actual counts
}
for(i in a2) {
print "mv "i"_count.txt "i"_"a2[i]".txt" # renaming files with actual counts
}
for(i in a3) {
print "mv "i"_count.txt "i"_"a3[i]".txt" # renaming files with actual counts
}
}
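This is because I need to save the lines for each matching ID to a txt file and place that file in the required folder.
What if I have 100 arrays? Do I need to duplicate the code for each one?
Do you need to use separate arrays, or could you do something like this: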
a[1","1] = "abc";
a[1","2] = "xyz";
a[2","2] = "123";
folders[1] = "folder1";
folders[2] = "folder2";
var = "1";
for (f in folders) {
if (var","f in a) {
print a[var","f] " >> " folders[f] "/file_" var;
}
}
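As a runnable illustration of the composite-key idea above, here is a minimal sketch. The function name route_by_key and the choice to read the ID from the first input field are assumptions made for the example, not part of the snippet above. It also swaps for (f in folders) for a counting loop, because Awk does not specify the traversal order of for (... in ...).

```shell
# Hypothetical helper: route_by_key is an illustrative name, not from the post.
route_by_key() {
  awk '
    BEGIN {
      # Composite keys of the form "<id>,<folder index>" replace separate arrays.
      a["1,1"] = "abc"
      a["1,2"] = "xyz"
      a["2,2"] = "123"
      folders[1] = "folder1"
      folders[2] = "folder2"
      nfolders = 2
    }
    {
      id = $1   # ASSUMPTION: the ID is the first whitespace-separated field
      # Counting loop, because the order of "for (f in folders)" is unspecified.
      for (f = 1; f <= nfolders; f++) {
        key = id "," f
        if (key in a)
          print a[key] " >> " folders[f] "/file_" id
      }
    }
  '
}
```

For example, printf '1\n2\n' | route_by_key reports two targets for ID 1 and one for ID 2. Adding another folder only means adding entries to a and folders, not another block of code.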
Using GNU Awk's support for multidimensional arrays, here is a simplified solution that demonstrates the technique you need:
$ gawk '
BEGIN { FS=";" } # field separator
{
# Initialize the sub-arrays of the multi-dimensional array.
array[1][""]; split("125;258;698;874", aux); for (i in aux) array[1][aux[i]]
array[2][""]; split("956;887;4455;22", aux); for (i in aux) array[2][aux[i]]
array[3][""]; split("111;444;558;966;332", aux); for (i in aux) array[3][aux[i]]
n = length(array) # The count of sub-arrays
if ( == ) {varR=} else {varR=}
if (match(varR, /[0-9]+/)) { # matching `ID` value
for (i=1;i<=n;++i) { # loop over all arrays
if (varR in array[i]) { # look for the ID among the array keys
print "folder" i
break
}
}
}
}
' <<<'1;1;4455'
folder2
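See this answer of mine for an explanation of the array-initialization and multidimensional-array techniques used in this command.
Note that the array initialization stores the numbers in the keys of each sub-array array[<n>], because that is what looking up a value with the in operator requires.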
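The difference between keys and values can be seen in a small sketch that should work in any POSIX Awk. The helper name in_list and its two parameters are hypothetical, chosen only for this illustration. split() stores the IDs as values under the numeric indices 1..n, so the in operator only finds an ID once it has been copied into the keys of a second array.

```shell
# Hypothetical helper; not part of the original answer.
# $1: semicolon-separated list of IDs, $2: ID to look up
in_list() {
  awk -v list="$1" -v id="$2" '
    BEGIN {
      n = split(list, aux, ";")                  # aux[1..n] holds the IDs as values
      for (i = 1; i <= n; i++) set[aux[i]] = 1   # copy each ID into the keys of set
      by_index = (id in aux) ? "yes" : "no"      # tests the indices 1..n, not the IDs
      by_key   = (id in set) ? "yes" : "no"      # tests the IDs themselves
      print by_index, by_key
    }
  '
}
```

Calling in_list '125;258;698;874' 258 prints "no yes": 258 is not one of the indices 1 to 4 of aux, but it is a key of set.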
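What you tried:
- Awk has no array-initialization syntax; array1=(125 258 698 874) in your code creates a single string, "125258698874":
  - The enclosing () have no effect here (they only group for precedence).
  - In Awk, placing tokens next to each other, whether numbers or strings, performs string concatenation.
  - Perhaps you mistakenly assumed that Bash's array-initialization syntax also works in Awk.
- ( varR in array1 ) looks for varR among the indices (keys) of array1, but even if your array initialization worked the way it does in Bash, you would have to check the values instead.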
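For the "100 arrays" case, one possible alternative that does not need GNU Awk is a single lookup table that maps each ID directly to its folder. The following is only a sketch under stated assumptions: the name route_ids is hypothetical, the ID is assumed to be the whole last field (the question's match()/substr() extraction is left out), and the record is printed next to its target path instead of being written to a file, so the example can be run without creating folders.

```shell
# Hypothetical helper; the ID lists are the ones from the question.
route_ids() {
  awk -F';' '
    BEGIN {
      # One line per group: a 100th group means one more line here, not more code.
      groups[1] = "125;258;698;874"
      groups[2] = "956;887;4455;22"
      groups[3] = "111;444;558;966;332"
      ngroups = 3
      for (g = 1; g <= ngroups; g++) {
        n = split(groups[g], aux, ";")
        for (i = 1; i <= n; i++) folder_of[aux[i]] = "folder" g
      }
    }
    {
      id = $NF                                # ASSUMPTION: the ID is the last field
      if (!(id in folder_of)) next            # ID belongs to no group: skip the record
      if (!(id in count)) order[++seen] = id  # remember first-seen order for END
      count[id]++
      # A real script would write the record to a file instead, e.g.:
      #   print > (folder_of[id] "/" id "_count.txt")
      print folder_of[id] "/" id "_count.txt: " $0
    }
    END {
      # Emit the rename commands in first-seen order, with the folder prefix.
      for (k = 1; k <= seen; k++) {
        id = order[k]
        print "mv " folder_of[id] "/" id "_count.txt " folder_of[id] "/" id "_" count[id] ".txt"
      }
    }
  '
}
```

With this layout the per-ID counters live in one count array, so the END block needs a single loop regardless of how many groups exist.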