Ramdajs, group array with arguments

Question

The list to group:

const arr = [
  {
    "Global Id": "1231",
    "TypeID": "FD1",
    "Size": 160,
    "Flöde": 55,
  },
  {
    "Global Id": "5433",
    "TypeID": "FD1",
    "Size": 160,
    "Flöde": 100,
  },
  {
    "Global Id": "50433",
    "TypeID": "FD1",
    "Size": 120,
    "Flöde": 100,
  },
  {
    "Global Id": "452",
    "TypeID": "FD2",
    "Size": 120,
    "Flöde": 100,
  },
]

Input to the function, specifying which keys to group by:

const columns = [
    {
      "dataField": "TypeID",
      "summarize": false,
    },
    {
      "dataField": "Size",
      "summarize": false,
    },
    {
      "dataField": "Flöde",
      "summarize": true,
    },
]

Expected output:

const output = [
    {
      "TypeID": "FD1",
      "Size": 160,
      "Flöde": 155, // 55 + 100
      "nrOfItems": 2
    },
    {
      "TypeID": "FD1",
      "Size": 120,
      "Flöde": 100,
      "nrOfItems": 1
    },
    {
      "TypeID": "FD2",
      "Size": 120,
      "Flöde": 100,
      "nrOfItems": 1
    }
]

// nrOfItems adds up to 4 (2 + 1 + 1), the total number of items.

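The function:

const groupArr = (columns) => R.pipe(...);

The "summarize" property tells whether that property should be summed.

The data set is very large, more than 100k items, so I don't want to iterate more than necessary.

I've looked at R.group, but I'm not sure whether it can be applied here?

Maybe R.reduce? Store the groups in the accumulator, then sum the values and add to the count if the group already exists? The group needs to be found quickly, so maybe store the groups under a key?

Or is it better to use vanilla JavaScript in this case?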

Answer 1

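Here's an answer in vanilla JavaScript first, because I'm not very familiar with the Ramda API. I'm pretty sure the approach would be very similar with Ramda.

The code has comments explaining each step. I'll try to follow up with a rewrite to Ramda.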

const arr=[{"Global Id":"1231",TypeID:"FD1",Size:160,"Flöde":55},{"Global Id":"5433",TypeID:"FD1",Size:160,"Flöde":100},{"Global Id":"50433",TypeID:"FD1",Size:120,"Flöde":100},{"Global Id":"452",TypeID:"FD2",Size:120,"Flöde":100}],columns=[{dataField:"TypeID",summarize:!1},{dataField:"Size",summarize:!1},{dataField:"Flöde",summarize:!0}];

// The columns that don't summarize
// give us the keys we need to group on
const groupKeys = columns
  .filter(c => c.summarize === false)
  .map(g => g.dataField);

// We compose a hash function that creates
// a hash out of all the items' properties
// that are in our groupKeys
const groupHash = groupKeys
  .map(k => x => x[k])
  .reduce(
    (f, g) => x => `${f(x)}___${g(x)}`,
    () => "GROUPKEY"
  );

// The columns that summarize tell us which
// properties to sum for the items within the
// same group
const sumKeys = columns
  .filter(c => c.summarize === true)
  .map(c => c.dataField);
  
// Again, we compose into a single function.
// This function concats two items: properties of
// the "last" item win, and the sum logic is only
// applied for keys in sumKeys
const concats = sumKeys
  .reduce(
    (f, k) => (a, b) => Object.assign(f(a, b), {
      [k]: (a[k] || 0) + b[k]
    }),
    (a, b) => Object.assign({}, a, b)
  )

// Now, we take our data and group by the groupHash
const groups = arr.reduce(
  (groups, x) => {
    const k = groupHash(x);
    if (!groups[k]) groups[k] = [x];
    else groups[k].push(x);
    return groups;
  },
  {}
);

// These are the keys we want our final objects to have
// ("nrOfItems" matches the name in the expected output)...
const allKeys = ["nrOfItems"]
  .concat(groupKeys)
  .concat(sumKeys);
  
// ...baked in to a helper to remove other keys
const cleanKeys = obj => Object.assign(
  ...allKeys.map(k => ({ [k]: obj[k] }))
);

// With the items neatly grouped, we can reduce each
// group using the composed concatenator
const items = Object
  .values(groups)
  .map(
    xs => cleanKeys(
      xs.reduce(concats, { nrOfItems: xs.length })
    ),
  );

console.log(items);
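
To see what the composed concats function does within one group, the fold can be traced by hand. This is only a minimal sketch: it repeats the construction from the code above with a single sum key, and the seed key count is a stand-in chosen for this illustration.

```javascript
const sumKeys = ["Flöde"];

// Same construction as concats in the answer's code
const concats = sumKeys.reduce(
  (f, k) => (a, b) => Object.assign(f(a, b), {
    [k]: (a[k] || 0) + b[k]
  }),
  (a, b) => Object.assign({}, a, b)
);

// The two items that end up in the FD1 / 160 group
const group = [
  { "Global Id": "1231", TypeID: "FD1", Size: 160, "Flöde": 55 },
  { "Global Id": "5433", TypeID: "FD1", Size: 160, "Flöde": 100 },
];

// Seed with the already-known group size, then fold item by item
const step1 = concats({ count: group.length }, group[0]);
const step2 = concats(step1, group[1]);

console.log(step1["Flöde"]);     // 55
console.log(step2["Flöde"]);     // 155
console.log(step2["Global Id"]); // "5433"
```

The last line shows why a clean-up step is needed afterwards: properties that are neither grouped on nor summed simply carry the value of the last item in the group.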

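Here's an attempt at porting it to Ramda, but I didn't get much further than replacing the vanilla JS methods with their Ramda equivalents. I'm curious to learn which cool utilities and functional concepts I missed! I'm sure someone more familiar with the details of Ramda will chime in!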

const arr=[{"Global Id":"1231",TypeID:"FD1",Size:160,"Flöde":55},{"Global Id":"5433",TypeID:"FD1",Size:160,"Flöde":100},{"Global Id":"50433",TypeID:"FD1",Size:120,"Flöde":100},{"Global Id":"452",TypeID:"FD2",Size:120,"Flöde":100}],columns=[{dataField:"TypeID",summarize:!1},{dataField:"Size",summarize:!1},{dataField:"Flöde",summarize:!0}];


const [ sumCols, groupCols ] = R.partition(
  R.prop("summarize"), 
  columns
);

const groupKeys = R.pluck("dataField", groupCols);
const sumKeys = R.pluck("dataField", sumCols);

const grouper = R.reduce(
  (f, g) => x => `${f(x)}___${g(x)}`,
  R.always("GROUPKEY"),
  R.map(R.prop, groupKeys)
);

const reducer = R.reduce(
  (f, k) => (a, b) => R.mergeRight(
    f(a, b),
    { [k]: (a[k] || 0) + b[k] }
  ),
  R.mergeRight,
  sumKeys
);

// "nrOfItems" matches the name in the expected output
const allowedKeys = new Set(
  [ "nrOfItems" ].concat(sumKeys).concat(groupKeys)
);

const cleanKeys = R.pipe(
  R.toPairs,
  R.filter(([k, v]) => allowedKeys.has(k)),
  R.fromPairs
);

const items = R.flatten(
  R.values(
    R.map(
      xs => cleanKeys(
        R.reduce(
          reducer,
          { nrOfItems: xs.length },
          xs
        )
      ),
      R.groupBy(grouper, arr)
    )
  )
);

console.log(items);

<script src="https://cdnjs.cloudflare.com/ajax/libs/ramda/0.26.1/ramda.min.js"></script>

Answer 2

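Here's my initial approach. Everything except summarize is a helper function that I suppose could be inlined if you really wanted. I find it cleaner with this separation.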

const getKeys = (val) => pipe (
  filter (propEq ('summarize', val) ),
  pluck ('dataField')
) 

const keyMaker = (columns, keys = getKeys (false) (columns)) => pipe (
  pick (keys),
  JSON .stringify
)

const makeReducer = (
  columns,
  toSum = getKeys (true) (columns),
  toInclude = getKeys (false) (columns),
) => (a, b) => ({
  ...mergeAll (map (k => ({ [k]: b[k] }), toInclude ) ),
  ...mergeAll (map (k => ({ [k]: (a[k] || 0) + b[k] }), toSum ) ),
  nrOfItems: (a .nrOfItems || 0) + 1
})

const summarize = (columns) => pipe (
  groupBy (keyMaker (columns) ),
  values,
  map (reduce (makeReducer (columns), {} ))
)

const arr = [{"Flöde": 55, "Global Id": "1231", "Size": 160, "TypeID": "FD1"}, {"Flöde": 100, "Global Id": "5433", "Size": 160, "TypeID": "FD1"}, {"Flöde": 100, "Global Id": "50433", "Size": 120, "TypeID": "FD1"}, {"Flöde": 100, "Global Id": "452", "Size": 120, "TypeID": "FD2"}]
const columns = [{"dataField": "TypeID", "summarize": false}, {"dataField": "Size", "summarize": false}, {"dataField": "Flöde", "summarize": true}]

console .log (
  summarize (columns) (arr)
)

<script src="https://bundle.run/ramda@0.26.1"></script><script>
const {pipe, filter, propEq, pluck, pick, mergeAll, map, groupBy, values, reduce} = ramda</script>

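There's a lot of overlap with Joe's solution, but also some real differences. He had already posted his when I saw the question, but I didn't want it to influence my own approach, so I didn't look until I had written the above. Note the difference in our hash functions. Mine calls JSON.stringify on values like {TypeID: "FD1", Size: 160}, while Joe's creates "GROUPKEY___FD1___160". I think I prefer mine for its simplicity. On the other hand, Joe's solution is definitely better than mine at handling nrOfItems. I update it on every reduce iteration and have to use || 0 to handle the initial case. Joe simply starts the fold with the already-known value. But overall, the solutions are very similar.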
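
To make the difference between the two keys concrete, here is a small plain-JavaScript sketch (no Ramda). The names joinedKey and jsonKey are made up for this illustration and only mimic the two approaches; the second item, with Size as a string, is a hypothetical edge case that does not occur in the sample data.

```javascript
const groupKeys = ["TypeID", "Size"];

// Joined-string key, in the style of the first answer
const joinedKey = (item) =>
  groupKeys.reduce((acc, k) => `${acc}___${item[k]}`, "GROUPKEY");

// JSON key: pick the grouping fields, then serialize them
const jsonKey = (item) =>
  JSON.stringify(Object.fromEntries(groupKeys.map((k) => [k, item[k]])));

const a = { TypeID: "FD1", Size: 160 };
const b = { TypeID: "FD1", Size: "160" }; // hypothetical: Size as a string

console.log(joinedKey(a)); // GROUPKEY___FD1___160
console.log(jsonKey(a));   // {"TypeID":"FD1","Size":160}

// Template strings coerce values, so 160 and "160" share a joined key,
// while the JSON keys keep the number and the string apart.
console.log(joinedKey(a) === joinedKey(b)); // true
console.log(jsonKey(a) === jsonKey(b));     // false
```

Whether that distinction matters depends on the data; for the list in the question both keys produce the same three groups.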

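You mention wanting to reduce the number of passes through the data. The way I write Ramda code tends not to help with this. This code iterates the whole list to group it into like items, then iterates through each of those groups to fold them down to individual values. (There is possibly also a minor iteration in values.) These could certainly be changed to combine the two iterations. It might even make for shorter code. But to my mind, it would become harder to understand.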

Update

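I was curious about the single-pass approach, and found that I could use all the infrastructure I built for the multi-pass one, rewriting only the main function: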

const summarize2 = (columns) => (
  arr,
  makeKey = keyMaker (columns),
  reducer = makeReducer (columns)
) => values (reduce (
  (a, item, key = makeKey (item) ) => assoc (key, reducer (key in a ? a[key]: {}, item), a),
  {},
  arr
))

console .log (
  summarize2 (columns) (arr)
)

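Unless testing showed that this part of the code was a bottleneck in my application, I wouldn't choose this version over the original. But it's not as complex as I expected, and it does everything in one iteration (well, except for whatever values does). Interestingly, it makes me change my mind about the handling of nrOfItems. My helper code just worked in this version, and I never had to know the total size of the group. That wouldn't have happened if I had used Joe's approach.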
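
One caveat if measurements ever do point at this code: as far as I know, assoc returns a shallow copy of the accumulator on every step, so the single-pass version still does extra copying when there are many groups. A rough sketch of a mutating alternative in plain JavaScript follows. It is not taken from either answer, summarizeWithMap is a hypothetical name, and it assumes every summed field holds a number.

```javascript
// Hypothetical helper: one pass over the data, mutating a Map of groups.
const summarizeWithMap = (columns) => {
  const groupKeys = columns.filter((c) => !c.summarize).map((c) => c.dataField);
  const sumKeys = columns.filter((c) => c.summarize).map((c) => c.dataField);

  return (items) => {
    const groups = new Map();
    for (const item of items) {
      // Serialize only the grouping values to get a lookup key
      const key = JSON.stringify(groupKeys.map((k) => item[k]));
      let group = groups.get(key);
      if (!group) {
        // First item of a new group: copy group fields, zero the sums
        group = {};
        for (const k of groupKeys) group[k] = item[k];
        for (const k of sumKeys) group[k] = 0;
        group.nrOfItems = 0;
        groups.set(key, group);
      }
      for (const k of sumKeys) group[k] += item[k];
      group.nrOfItems += 1;
    }
    // Map preserves insertion order, so groups come out in first-seen order
    return [...groups.values()];
  };
};

const arr = [
  { "Global Id": "1231", TypeID: "FD1", Size: 160, "Flöde": 55 },
  { "Global Id": "5433", TypeID: "FD1", Size: 160, "Flöde": 100 },
  { "Global Id": "50433", TypeID: "FD1", Size: 120, "Flöde": 100 },
  { "Global Id": "452", TypeID: "FD2", Size: 120, "Flöde": 100 },
];
const columns = [
  { dataField: "TypeID", summarize: false },
  { dataField: "Size", summarize: false },
  { dataField: "Flöde", summarize: true },
];

const result = summarizeWithMap(columns)(arr);
console.log(result);
```

This trades the immutable style for fewer allocations, so it only seems worth it when profiling on the real 100k-item data says so.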
