Using jq to prefix a string in a JSON object with another string

I want to use a combination of aws s3api list-objects and jq to generate a manifest file for Redshift COPY, as follows:

aws s3api list-objects --bucket annalects3 --prefix "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression" --output json --query '{"entries": Contents[].{"url":"Key"}}' | jq '.entries[].mandatory = true'

This produces the following output:

{
  "entries": [
    {
      "mandatory": true,
      "url": "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092507_20160926_002328_292527438.csv.gz"
    },
    {
      "mandatory": true,
      "url": "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092508_20160926_020131_292592736.csv.gz"
    },
    {
      "mandatory": true,
      "url": "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092509_20160926_030312_292502379.csv.gz"
    },
    {
      "mandatory": true,
      "url": "DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092510_20160926_033656_292590227.csv.gz"
    }
  ]
}

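However, the manifest file needs each url value to be prefixed with the bucket name, and I have not been able to work out how to do that. The output needs to look like this: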

{
  "entries": [
    {
      "mandatory": true,
      "url": "s3://mybucket/DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092507_20160926_002328_292527438.csv.gz"
    },
    {
      "mandatory": true,
      "url": "s3://mybucket/DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092508_20160926_020131_292592736.csv.gz"
    },
    {
      "mandatory": true,
      "url": "s3://mybucket/DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092509_20160926_030312_292502379.csv.gz"
    },
    {
      "mandatory": true,
      "url": "s3://mybucket/DFA/20160926/394007-OMD-Coles/dcm_account394007_impression_2016092510_20160926_033656_292590227.csv.gz"
    }
  ]
}

The following will do what you want (replace <mybucket> and <myprefix> with your own values):

aws s3api list-objects \
    --bucket <mybucket> \
    --prefix "<myprefix>" \
    --output json \
    --query '{"entries": Contents[].{"url":"Key"}}' \
| jq '.entries[] | .url = "s3://<mybucket>/\(.url)" | .mandatory = true'

I am using jq's string interpolation, "\(...)", to update entries[].url.
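To see the interpolation in isolation, here is a minimal sketch that runs jq on a hand-written sample instead of live aws output. The bucket name mybucket and the two short keys are made up for illustration. It uses jq's update operator |= so that the enclosing "entries" object is kept in the result.

```shell
# Sample input standing in for the aws s3api output (hypothetical keys).
sample='{"entries":[{"url":"DFA/a.csv.gz"},{"url":"DFA/b.csv.gz"}]}'

# Inside the parentheses, . is a single entry, so \(.url) reads that entry's key.
result=$(printf '%s' "$sample" \
  | jq -c '.entries[] |= (.url = "s3://mybucket/\(.url)" | .mandatory = true)')

printf '%s\n' "$result"
```

Each entry should come back with its url prefixed by s3://mybucket/ and with mandatory set to true.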

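The following worked very well for me. The suggestion above did not give me comma-separated dictionaries: it emits each entry as a separate object instead of one array under "entries".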

aws s3api list-objects \
    --bucket "xxxxx-xxxxxxx-xxx" \
    --output json \
    --query "{entries: Contents[?LastModified>='YYYY-MM-DD' && contains(Key, 'somestring')].{url: Key}}" \
| jq '[.entries[] | .url = "s3://xxxxx-xxxxxx-xxx/\(.url)" | .mandatory = true] | { entries: .}' > test_jq1.json
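A small variation, sketched on sample input rather than live aws output: the bucket name can be passed to jq with --arg so that it does not have to be typed inside the filter. The shell variable bucket, its value, and the sample keys are assumptions for illustration only.

```shell
bucket="mybucket"   # hypothetical bucket name
sample='{"entries":[{"url":"logs/one.csv.gz"},{"url":"logs/two.csv.gz"}]}'

# --arg makes the shell value available inside the filter as the string $bucket.
manifest=$(printf '%s' "$sample" \
  | jq --arg bucket "$bucket" \
       '{entries: [.entries[] | .url = "s3://\($bucket)/\(.url)" | .mandatory = true]}')

printf '%s\n' "$manifest"
```

In a real pipeline the same "$bucket" variable could also be given to aws s3api list-objects --bucket, which keeps the two names from drifting apart.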

I have not been able to extend the solution above to add content_length inside a "meta" dictionary, as shown below. Any help is greatly appreciated.

{
  "entries": [
    {
      "url":"s3://my-bucket/file1.parquet",
      "mandatory":true,
      "meta":{
        "content_length":2893394
      }
    },
    {
      "url":"s3://my-bucket/file2.parquet",
      "mandatory":true,
      "meta":{
        "content_length":2883626
      }
    }
  ]
}
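
One possible way to get this shape, assuming the Size field that list-objects reports for each object is the value wanted for content_length (an assumption, not something confirmed in this thread): carry Size through the --query projection, then let jq move it under "meta". The sketch below runs jq on a hand-written sample of what such a query might return. The intermediate field name size is made up, and the bucket name and file names are taken from the example above.

```shell
# Stand-in for the output of a query such as:
#   --query '{"entries": Contents[].{"url": Key, "size": Size}}'
sample='{"entries":[{"url":"file1.parquet","size":2893394},{"url":"file2.parquet","size":2883626}]}'

# Rebuild each entry: prefix the key, add mandatory, and nest the size under meta.
manifest=$(printf '%s' "$sample" \
  | jq '{entries: [.entries[]
          | {url: "s3://my-bucket/\(.url)",
             mandatory: true,
             meta: {content_length: .size}}]}')

printf '%s\n' "$manifest"
```

Because each entry is built as a new object, the temporary size field does not appear in the final manifest.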