Add text to start of each line after grep
I'm trying to extract the names of all my repositories on GitHub and build a script file that clones them all, using this bash script:
for i in {1..10}
do
curl -u USERNAME:PASS -s https://api.github.com/user/repos?page=$i | grep -oP '"clone_url": "\K(.*)"' > output$i.txt
done
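This script outputs each repo's clone URL on its own line, but I need to insert git clone at the start of each line, so I wrote this (adding | xargs -L1 git clone), which doesn't work: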
for i in {1..10}
do
curl -u USERNAME:PASS -s https://api.github.com/user/repos?page=$i | grep -oP '"clone_url": "\K(.*)"' | xargs -L1 git clone > output$i.txt
done
You can use echo with xargs to prepend the string:
for i in {1..10}
do
curl -u user_name:pass -s "https://api.github.com/user/repos?page=$i" | grep -oP '"clone_url": "\K(.*)"' | tr -d '"' | xargs -n 1 echo 'git clone' > "output$i.txt"
done
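To see what the xargs pipeline above produces without calling the GitHub API, here is a minimal offline sketch. The function name prepend_git_clone and the repository URLs are hypothetical; the input mimics the grep output, where each URL still ends with the stray double quote that tr removes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads grep-style lines (a URL plus a trailing ")
# and prints one "git clone URL" command per line.
prepend_git_clone() {
  # tr drops every double quote, since xargs would otherwise reject
  # the unmatched quote; -n 1 runs echo once per URL.
  tr -d '"' | xargs -n 1 echo 'git clone'
}

# Sample input standing in for the curl | grep output (made-up repos).
printf '%s\n' \
  'https://github.com/example/repo-one.git"' \
  'https://github.com/example/repo-two.git"' |
  prepend_git_clone
```

Because xargs splits its input on whitespace, this relies on the URLs containing no spaces, which holds for GitHub clone URLs.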
Alternatively, you can do this with Perl:
for i in {1..10}
do
curl -u user_name:pass -s "https://api.github.com/user/repos?page=$i" | grep -oP '"clone_url": "\K(.*)"' | tr -d '"' | perl -ne 'print "git clone $_"' > "output$i.txt"
done
Using jq is always the best option for parsing JSON data:
#!/usr/bin/env bash
for i in {1..10}
do
curl \
--user USERNAME:PASS \
--silent \
"https://api.github.com/user/repos?page=${i}" \
| jq \
--raw-output '.[] | "git clone \(.clone_url)"' \
> "output${i}.txt"
done
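The jq filter iterates over the array of repository objects with .[] and builds each command with jq's \(...) string interpolation; --raw-output prints the strings without JSON quotes. A minimal offline check, assuming jq is installed; the sample JSON is a made-up, trimmed-down stand-in for one /user/repos page, which in reality carries many more fields.

```shell
#!/usr/bin/env bash
# Made-up, trimmed-down stand-in for one page of /user/repos.
sample_page='[
  {"name": "repo-one", "clone_url": "https://github.com/example/repo-one.git"},
  {"name": "repo-two", "clone_url": "https://github.com/example/repo-two.git"}
]'

# .[] emits each object; \(.clone_url) interpolates the field into a string;
# --raw-output prints the strings without surrounding JSON quotes.
jq --raw-output '.[] | "git clone \(.clone_url)"' <<<"$sample_page"
```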
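Alternatively, to handle any number of pages, you can pass the --exit-status option to jq.
Then, when the JSON selector returns no result (which happens when the GitHub API returns an empty page past the last one), jq's return code can be tested to either continue or terminate the while loop: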
#!/usr/bin/env bash
typeset -i page=1 # GitHub API paging starts at page 1
while clone_cmds="$(
curl \
--user USERNAME:PASS \
--silent \
"https://api.github.com/user/repos?page=${page}" \
| jq \
--exit-status \
--raw-output \
'.[] | "git clone \(.clone_url)"'
)"; do
# The queried page result length is > 0
# Output to the paged file
# and increase page number
echo >"output$((page++)).txt" "${clone_cmds}"
done
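The loop relies on jq --exit-status returning a non-zero status when the filter produces no output at all, which is what .[] does on an empty array. The sketch below replays that logic offline; fetch_page is a hypothetical stand-in for the curl call that serves two one-repo pages and then empty arrays, and it assumes jq is installed.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the curl call: pages 1 and 2 hold one repo
# each; every later page is an empty JSON array, as past the last page.
fetch_page() {
  case "$1" in
    1) echo '[{"clone_url": "https://github.com/example/repo-one.git"}]' ;;
    2) echo '[{"clone_url": "https://github.com/example/repo-two.git"}]' ;;
    *) echo '[]' ;;
  esac
}

page=1
# The assignment takes jq's exit status (last command of the pipeline):
# zero while '.[]' yields strings, non-zero once the page is empty.
while clone_cmds="$(
  fetch_page "${page}" |
    jq --exit-status --raw-output '.[] | "git clone \(.clone_url)"'
)"; do
  echo "page ${page}: ${clone_cmds}"
  page=$((page + 1))
done
echo "stopped at empty page ${page}"
```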
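If you want the same as above, but with all repositories in a single file, the following example handles GitHub API paging itself, following the Link response header instead of relying on an extra empty request to mark the last page.
It also requests pages of up to 100 entries, and negotiates a compressed transfer stream when supported.
Here is the full-featured version of your repository clone list: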
#!/usr/bin/env bash
# Set either one to authenticate with the GitHub API.
# GitHub 'OAuth2 token':
OAUTH_TOKEN=''
# GitHub 'username:password':
USER_PASS=''
# The GitHub API Base URL:
typeset -r GITHUB_API='https://api.github.com'
# The array of Curl options to authenticate with GitHub:
typeset -a curl_auth
# Populates the authentication options from what is available.
if [[ -n ${OAUTH_TOKEN} ]]; then
curl_auth=(--header "Authorization: token ${OAUTH_TOKEN}")
elif [[ -n ${USER_PASS} ]]; then
curl_auth=(--user "${USER_PASS}")
else
# These $"string" are bash --dump-po-strings ready.
printf >&2 $"GitHub API needs authentication; set either variable:"$'\n'
printf >&2 "OAUTH_TOKEN='%s'\n" $"GitHub API's OAuth2 token"
printf >&2 $"or"" USER_PASS='%s:%s'.\n" $"username" $"password"
printf >&2 $"See: %s"$'\n' 'https://developer.github.com/v3/#authentication'
exit 1
fi
# Query the GitHub API for user repositories.
# The default results count per page is 30.
# It can be raised up to 100, to limit the number
# of requests needed to retrieve all the results.
# The response headers contain a Link: <url>; rel="next" entry
# as long as there is a next page.
# See: https://developer.github.com/v3/#pagination
# Compose the API URL for the first page.
next_page_url="${GITHUB_API}/user/repos?per_page=100&page=1"
# While there is a next page URL to query...
while [[ -n ${next_page_url} ]]; do
# Send the API request with curl and get back the complete
# http_response, including the response headers (--include),
# handling a --compressed data stream if supported,
# and keeping stderr quiet (--silent).
http_response="$(
curl \
--silent \
--include \
--compressed \
"${curl_auth[@]}" \
"${next_page_url}"
)"
# Get the next page URL from the Link: header, matched
# case-insensitively (GNU sed's I flag), since HTTP/2
# responses use lowercase header names.
# Reaching the last page causes the next_page_url
# variable to be empty.
next_page_url="$(
sed \
--silent \
'/^[[:space:]]*$/,$d;s/Link:.*<\(.*\)>;[[:space:]]*rel="next".*$/\1/Ip' \
<<<"${http_response}"
)"
# Get the http_body part from the http_response.
http_body="$(sed '1,/^[[:space:]]*$/d' <<<"${http_response}")"
# Query the http_body JSON content with jq.
jq --raw-output '.[] | "git clone \(.clone_url)"' <<<"${http_body}"
done >"output.txt" # Redirect the whole while loop output to the file.
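The two sed commands in the loop split the --include output into the Link header and the JSON body. The offline sketch below runs the same kind of sed commands on a fabricated response (the URLs and the one-repo body are made up), so the next-page extraction can be checked without network access. It prints the captured URL with \1, matches the header case-insensitively with the I flag because HTTP/2 header names are lowercase, and therefore assumes GNU sed, as the script's --silent option already does. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the rel="next" URL, or nothing on the last page.
next_link() {
  # Delete everything from the first blank line (the body) onwards, then
  # print the captured URL; I makes "Link:" also match "link:" (GNU sed).
  sed --silent \
    '/^[[:space:]]*$/,$d;s/Link:.*<\(.*\)>;[[:space:]]*rel="next".*$/\1/Ip' \
    <<<"$1"
}
# Hypothetical helper: print everything after the blank line ending the headers.
body_of() {
  sed '1,/^[[:space:]]*$/d' <<<"$1"
}

# Fabricated response with CRLF header lines, as curl --include prints them.
http_response="$(printf '%s\r\n' \
  'HTTP/2 200' \
  'content-type: application/json; charset=utf-8' \
  'link: <https://api.github.com/user/repos?per_page=100&page=2>; rel="next", <https://api.github.com/user/repos?per_page=100&page=3>; rel="last"' \
  '')"
http_response+=$'\n''[{"clone_url": "https://github.com/example/repo-one.git"}]'

next_link "$http_response"
body_of "$http_response"
```

On a last page the Link header carries no rel="next" entry, so next_link prints nothing, which is what ends the while loop in the script.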
grep can't substitute strings, but sed can easily replace grep and perform the substitution as well:
for i in {1..10}
do
curl -u USERNAME:PASS -s "https://api.github.com/user/repos?page=$i" |
sed -n 's/.*"clone_url": "\([^"]*\)".*/git clone "\1"/p' > "output$i.txt"
done
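Because the sed expression matches one whole line at a time, it assumes each "clone_url" sits on its own line, as in the API's pretty-printed JSON. A small offline sketch with a made-up excerpt of that output; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn every "clone_url" line into a git clone command.
clone_cmds_from_json() {
  # -n suppresses default output; p prints only lines where s/// matched.
  # [^"]* stops the capture at the closing double quote.
  sed -n 's/.*"clone_url": "\([^"]*\)".*/git clone "\1"/p'
}

# Made-up excerpt of pretty-printed /user/repos output.
clone_cmds_from_json <<'EOF'
[
  {
    "name": "repo-one",
    "clone_url": "https://github.com/example/repo-one.git",
    "svn_url": "https://github.com/example/repo-one"
  }
]
EOF
```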
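Also see When to wrap quotes around a shell variable?, and note the use of [^"] in the regexp to make it explicit that the extracted text must not contain double quotes.
That said, I agree with, and upvoted, the answer suggesting jq when your input is JSON.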
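Your second script works; you just need to clean up the grep search pattern so that its output no longer ends with an unmatched trailing quote (xargs treats quotes specially and rejects unbalanced ones), and use echo so that xargs prints the commands instead of running them: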
grep -oP '"clone_url": \K(.*)\"' | xargs -L1 echo git clone
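The difference between the two patterns comes down to how xargs reads quotes in its input: a balanced "..." is stripped, while a lone trailing quote is an error. A small offline sketch of both cases; the URL is made up, and the exact error message differs between GNU and BSD xargs.

```shell
#!/usr/bin/env bash
# Balanced quotes (the fixed grep pattern keeps both): xargs removes them
# and prints: git clone https://github.com/example/repo-one.git
printf '%s\n' '"https://github.com/example/repo-one.git"' |
  xargs -L1 echo git clone

# Only a trailing quote (the original \K pattern): xargs reports an
# unmatched quote on stderr and exits with a non-zero status.
if ! printf '%s\n' 'https://github.com/example/repo-one.git"' |
  xargs -L1 echo git clone; then
  echo "xargs rejected the unmatched quote" >&2
fi
```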