How to apply sed to a grep result in a bash script?
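I want to modify a csv file with some sed commands, but only on the lines that match a particular regular expression.
I have a grep command that works fine in a script: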
#!/usr/bin/bash
egrep '^[A-Z][a-z]*,2018' happiness.csv
and the sed commands I need, which also run correctly:
#!/usr/bin/bash
sed -re '
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]/\U&/g
s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
' happiness.csv
When I combine them in one script, the grep command is ignored and the script only runs the sed commands:
#!/usr/bin/bash
egrep '^[A-Z][a-z]*,2018' happiness.csv
sed -re '
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]/\U&/g
s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
' happiness.csv
Can anyone help?
Sample data:
Country name,Year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect,Confidence in national government,Democratic Quality,Delivery Quality,Standard deviation of ladder by country-year,Standard deviation/Mean of ladder by country-year,GINI index (World Bank estimate),"GINI index (World Bank estimate), average 2000-16","gini of household income reported in Gallup, by wp5-year","Most people can be trusted, Gallup","Most people can be trusted, WVS round 1981-1984","Most people can be trusted, WVS round 1989-1993","Most people can be trusted, WVS round 1994-1998","Most people can be trusted, WVS round 1999-2004","Most people can be trusted, WVS round 2005-2009","Most people can be trusted, WVS round 2010-2014"
Afghanistan,2008,3.723589897,7.168690205,0.450662315,50.79999924,0.718114316,0.177888572,0.88168633,0.517637193,0.25819549,0.61207211,-1.929689646,-1.655084372,1.774661899,0.476599723,,,,,,,,,,
Afghanistan,2009,4.401778221,7.333789825,0.55230844,51.20000076,0.678896368,0.200178429,0.850035429,0.583925605,0.23709242,0.611545205,-2.044092655,-1.635024786,1.722687602,0.391361743,,,0.441905767,0.286315262,,,,,,
Afghanistan,2018,4.75838089,7.386628628,0.539075196,51.59999847,0.60012722,0.13435255,0.706766069,0.61826545,0.275323808,0.299357414,-1.991810083,-1.617176056,1.878621817,0.394802749,,,0.327318162,0.275832713,,,,,,
Afghanistan,2011,3.83171916,7.415018559,0.521103561,51.91999817,0.495901406,0.172136664,0.731108546,0.611387312,0.267174691,0.307385713,-1.919018269,-1.616221189,1.78535974,0.465942234,,,0.336764246,,,,,,,
Afghanistan,2012,3.782937527,7.517126083,0.520636737,52.24000168,0.530935049,0.244272724,0.775619805,0.710384727,0.267919123,0.435440153,-1.842995763,-1.40407753,1.798283219,0.47536689,,,0.344539613,,,,,,,
Afghanistan,2013,3.572100401,7.522237778,0.48355186,52.56000137,0.577955365,0.070402659,0.8232041,0.620584846,0.273328096,0.482847273,-1.879708767,-1.403035522,1.223689914,0.342568725,,,0.304368466,,,,,,,
Afghanistan,2014,3.130895615,7.516955376,0.525568426,52.88000107,0.508514047,0.113184482,0.871241987,0.531691492,0.374860734,0.409047514,-1.773256779,-1.312502503,1.395396113,0.445685923,,,0.413973927,,,,,,,
Afghanistan,2015,3.982854605,7.500538826,0.528597236,53.20000076,0.388927579,0.089090675,0.880638301,0.553553164,0.339276046,0.260557145,-1.84436357,-1.29159379,2.16061759,0.542479634,,,0.59691757,,,,,,,
Albania,2018,4.220168591,7.497038364,0.559071779,53,0.522566199,0.051364917,0.793245554,0.564952672,0.348332286,0.324989557,-1.855426311,-1.392712831,1.796219468,0.42562741,,,0.418629497,,,,,,,
Desired output:
COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),"GINI INDEX (WORLD BANK ESTIMATE), AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP, BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED, GALLUP","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1981-1984","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1989-1993","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1994-1998","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1999-2004","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2005-2009","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2010-2014"
AFGHANISTAN,2018,2.69,7.49,-0.50,52.59,0.37,-0.08,0.92,0.42,0.40,0.36,NULL,NULL,1.40,0.52,NULL,NULL,0.29,NULL,NULL,NULL,NULL,NULL,NULL,
ALBANIA,2018,4.63,9.07,-0.82,65.80,0.52,-0.01,0.87,0.55,0.24,0.30,-0.04,-0.42,1.76,0.38,NULL,0.30,NULL,NULL,NULL,NULL,0.24,0.23,NULL,
ARGENTINA,2018,5.48,9.16,-0.83,66.19,0.52,-0.16,0.86,0.64,0.27,NULL,0.04,-0.26,1.91,0.34,NULL,0.30,0.61,0.11,NULL,NULL,0.24,0.23,NULL,
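You can use the same regular expression from the egrep command as a sed address, and group all the substitution commands in a block so that they only run on matching lines: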
sed -nE '1p; /^[A-Z][a-z]*,2018/ {
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]+/\U&/g
s/([0-9]+\.[0-9]{2})[0-9]+/\1/gp
}' happiness.csv
AFGHANISTAN,2018,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,NULL
ALBANIA,2018,4.22,7.49,0.55,53,0.52,0.05,0.79,0.56,0.34,0.32,-1.85,-1.39,1.79,0.42,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,NULL
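To make the address-plus-block idea concrete, here is a minimal sketch on a made-up four-column sample (the `sample` variable and its contents are assumptions for illustration, not the real happiness.csv). It also upper-cases the header line, which is one possible way to get the header shown in the desired output. It assumes GNU sed, since `\U` is a GNU extension.

```shell
#!/usr/bin/env bash
# Hypothetical miniature input standing in for happiness.csv.
sample='Country name,Year,Life Ladder,Generosity
Afghanistan,2008,3.723589897,
Afghanistan,2018,4.75838089,
Albania,2018,4.220168591,0.051364917'

result=$(printf '%s\n' "$sample" | sed -nE '
  # Line 1 is the header: upper-case it and print it.
  1 { s/[a-z]+/\U&/g; p; }
  # Only lines matching this address enter the block.
  /^[A-Z][a-z]*,2018/ {
    s/(^|,)(,|$)/\1NULL\2/g
    s/(^|,)(,|$)/\1NULL\2/g
    s/[a-z]+/\U&/g
    s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
    p
  }')
printf '%s\n' "$result"
```

With `-n`, nothing is printed unless a `p` command runs, so the 2008 row is dropped while the header and the two 2018 rows come out transformed.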
I'm no bash expert, but this should work:
#!/usr/bin/bash
grep_res=$(egrep '^[A-Z][a-z]*,2018' happiness.csv)
echo "$grep_res" | sed -re '
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]/\U&/g
s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
'
It saves grep's output in the grep_res variable and then feeds that to the sed command.
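The intermediate variable is not strictly needed. In the combined script from the question, grep and sed each read happiness.csv on their own, so sed never sees grep's result; a pipe connects them directly. The sketch below shows this on a made-up three-line sample (the `sample` variable is an assumption for illustration) and uses `grep -E`, the documented equivalent of `egrep`. GNU sed is assumed because of `\U`.

```shell
#!/usr/bin/env bash
# Hypothetical miniature input standing in for happiness.csv.
sample='Country name,Year,Life Ladder,Generosity
Afghanistan,2008,3.723589897,
Albania,2018,4.220168591,'

# grep selects the rows; sed reads them from the pipe instead of the file.
filtered=$(printf '%s\n' "$sample" |
  grep -E '^[A-Z][a-z]*,2018' |
  sed -E '
    s/(^|,)(,|$)/\1NULL\2/g
    s/(^|,)(,|$)/\1NULL\2/g
    s/[a-z]/\U&/g
    s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
  ')
printf '%s\n' "$filtered"
```

Note that grep filters out the header line here, so this variant only yields the data rows.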
Here is the same solution as a standard Linux awk (gawk) script; gensub() is a gawk extension.
It also handles the first line.
script.awk
{ $0 = toupper($0); } # upper-case each incoming line
/^[A-Z]*,2018/ || NR == 1 { # handle the first line or lines matching /^[A-Z]*,2018/
$0 = gensub(/([,])([,]|$)/, "\\1NULL\\2", "g", $0); # replace ,, with ,NULL,
$0 = gensub(/([,])([,]|$)/, "\\1NULL\\2", "g", $0); # replace remaining ,, with ,NULL,
$0 = gensub(/([0-9]+\.[0-9])([0-9])([0-9])*/, "\\1\\2", "g", $0); # trim decimals to two places
print $0; # print output line
}
Run
awk -f script.awk happiness.csv
Output
$ awk -f script.awk happiness.csv
COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),"GINI INDEX (WORLD BANK ESTIMATE), AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP, BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED, GALLUP","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1981-1984","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1989-1993","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1994-1998","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1999-2004","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2005-2009","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2010-2014"
AFGHANISTAN,2018,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,NULL
ALBANIA,2018,4.22,7.49,0.55,53,0.52,0.05,0.79,0.56,0.34,0.32,-1.85,-1.39,1.79,0.42,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,NULL
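All of the solutions run the empty-field substitution twice. The reason is that each match consumes both surrounding commas, so in a run of consecutive empty fields every second one is skipped on the first pass. A small sketch with a made-up row (the `row` value is an assumption for illustration) shows the difference:

```shell
#!/usr/bin/env bash
# Hypothetical row with three consecutive empty fields between a and b.
row='a,,,,b'

# One pass: the match on the first ",," consumes the comma that the
# second empty field would need as its left delimiter.
once=$(printf '%s\n' "$row" | sed -E 's/(^|,)(,|$)/\1NULL\2/g')

# Two passes: the second pass fills the fields the first one skipped.
twice=$(printf '%s\n' "$row" |
  sed -E 's/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g')

printf 'once:  %s\ntwice: %s\n' "$once" "$twice"
```

After one pass the middle field is still empty (`a,NULL,,NULL,b`); the second pass completes it (`a,NULL,NULL,NULL,b`).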