Extract Text Sections from a Cell

Given a cell whose text comes from HTML, formatted as follows:

OA-1

Interpret products of whole numbers, e.g., interpret 5 × 7 as the total number of objects in 5 groups of 7 objects each. For example, describe a context in which a total number...

  • More

OA-2

Interpret whole-number quotients of whole numbers, e.g., interpret 56 ÷ 8 as the number of objects in each share when 56 objects are partitioned equally into 8 shares, or as a number ...

Goal: extract a list of the header identifiers, so that the output looks like this: OA-1,OA-2...

I have pulled the data with the =importhtml function, as shown in the two examples in this MWE sheet.

Noting that CHAR(10) is a line feed (newline) character, I was considering something along the lines of this pseudocode:

LEFT(cell_with_text, FIND(CHAR(10), cell_with_text) - 1) & "," & find_next_header

Another approach might be to build a library of all the headers (e.g., "OA-1,OA-2...") and somehow find each instance in the cell, perhaps with a lookup function over an array?

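Assuming each imported list item starts with its header identifier on its own line, this formula splits all of the items on line breaks in one pass and keeps only the first column (which is the output you want). It then runs JOIN() on the result.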

=JOIN(", ",INDEX(SPLIT(importhtml("https://contentexplorer.smarterbalanced.org/target/m-g3-c1-ta","list",3),CHAR(10)),,1))

Here is a sample sheet, viewable to all in perpetuity.
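
For anyone who wants to check the logic outside Sheets, here is a minimal Python sketch of the same split, keep-first-column, join idea. The function name `header_ids` and the shortened sample strings are hypothetical and used only for illustration. It assumes each list item arrives as one string with the identifier on its first non-empty line.

```python
def header_ids(items, sep=", "):
    """Join the first non-empty line of each item.

    Mirrors the formula's steps: SPLIT on CHAR(10), INDEX(..., , 1), JOIN.
    """
    firsts = []
    for item in items:
        # SPLIT drops empty pieces by default, so blank lines are skipped here too.
        parts = [p for p in item.split("\n") if p]
        if parts:
            firsts.append(parts[0])
    return sep.join(firsts)


# Hypothetical, shortened stand-ins for the imported list items.
sample = [
    "OA-1\nInterpret products of whole numbers ...",
    "OA-2\nInterpret whole-number quotients of whole numbers ...",
]
print(header_ids(sample))  # OA-1, OA-2
```

Passing `sep=","` gives the comma-only form shown in the question.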