I have a dataframe with more than 2000 rows. From the df, I need to create a schema in R

Question

The format is as follows: [image]

I need to create the following schema for the df. I may have 2 or more instructions for a single item.

{
  Section: Section 1,
  Title: Power Cables
  List_of_Instruction: [ { Instruction: Information Provided }, ]
  List_of_items: [
    { Product: Item 1, Unit: 10, Rate: 100 },
    { Product: Item 2, Unit: 2, Rate: 2000 }
  ]
  List_of_instruction: [ { Instruction: information 1.0 }, ]
  List_of_items: [ { Product: item.1.0, Unit: 5, Rate: 10 } ],
  Title: Cable containment,
  List_of_instruction: [ { Instruction: Information } ],
  List_of_items: [ { Product: item, Unit: 100, Rate: 4 } ]
},
{ Section: …. Title: …. }

Answer 1

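Here is an example. I assume you want to group by section, title, and instruction, with those values at the top level of each segment, since that is what your data suggests.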



d <- read.csv(text=
"Description,Unit,Rate,isItem,IsSection,IsInstruction,IsTitle,sheetname
Section1,NA,NA,FALSE,TRUE,FALSE,FALSE,Sheet1
Power Cables,NA,NA,FALSE,FALSE,FALSE,TRUE,Sheet1
Information Provided,NA,NA,FALSE,FALSE,TRUE,FALSE,Sheet1
Item1,10,100,TRUE,FALSE,FALSE,FALSE,Sheet1
Item2,2,2000,TRUE,FALSE,FALSE,FALSE,Sheet1
Information 1.0,NA,NA,FALSE,FALSE,TRUE,FALSE,Sheet1
Item1.0,5,10,TRUE,FALSE,FALSE,FALSE,Sheet1
Cable containment,NA,NA,FALSE,FALSE,FALSE,TRUE,Sheet1
Information,NA,NA,FALSE,FALSE,TRUE,FALSE,Sheet1
item,100,4,TRUE,FALSE,FALSE,FALSE,Sheet1
Section2,NA,NA,FALSE,TRUE,FALSE,FALSE,Sheet1
Pipe,NA,NA,FALSE,FALSE,FALSE,TRUE,Sheet1
details,NA,NA,FALSE,FALSE,TRUE,FALSE,Sheet1
Items4,7,500,TRUE,FALSE,FALSE,FALSE,Sheet1
",header=TRUE)

library(dplyr)    # %>%, mutate, group_by, filter, rename, select, do, pull
library(jsonlite) # toJSON
library(zoo)      # na.locf

## Carry each section/title/instruction label forward onto the rows below it
d2 <- d %>% mutate( Section = na.locf( ifelse( IsSection, Description, NA ), na.rm=FALSE ) ) %>%
    group_by( Section ) %>%
    mutate( Title = na.locf( ifelse( IsTitle, Description, NA ), na.rm=FALSE ) ) %>%
    group_by( Section, Title ) %>%
    mutate( Instruction = na.locf( ifelse( IsInstruction, Description, NA ), na.rm=FALSE ) )

## A helper function to do the heavy finesse work:
make.segment <- function(d) {
    with(
        d,
        list(
            Section = Section[1],
            Title = Title[1],
            Instruction = Instruction[1],
            List_of_items =
                d %>% filter( isItem ) %>%
                rename( Product=Description ) %>%
                select( Product, Unit, Rate ) %>%
                apply( 1, as.list )
        )
    )
}

## Standard dplyr work to do the heavy lifting:
l <- d2 %>% filter( !is.na(Title) & !is.na(Instruction) ) %>%
    group_by( Section, Title, Instruction ) %>%
    do( segment = make.segment(.) ) %>%
    pull( segment ) %>%
    as.list

toJSON( l, pretty=TRUE, auto_unbox=TRUE )
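
The three `mutate` steps above all use the same fill-forward idea: mark the header rows, blank out everything else with `NA`, and let `na.locf` from `zoo` carry the last seen label down onto the rows beneath it. A minimal sketch on a toy vector (the `desc` and `is_section` vectors are made-up illustration data, not taken from the question):

```r
library(zoo)

# Made-up illustration data: a description column and a flag marking section rows
desc       <- c("Section1", "Power Cables", "Item1", "Item2", "Section2", "Pipe")
is_section <- c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)

# Keep the label only on section rows, NA elsewhere ...
labels_only <- ifelse(is_section, desc, NA)

# ... then carry the last non-NA value forward.
# na.rm = FALSE keeps leading NAs so the length never changes.
section <- na.locf(labels_only, na.rm = FALSE)

print(data.frame(desc, section))
```

`na.rm = FALSE` matters inside `mutate`, because the new column has to be exactly as long as the group it is computed on; with the default, leading `NA`s would be dropped.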

Your example does look like JSON, so I assume that is what you want. The result of the `toJSON` call is:


[
  {
    "Section": "Section1",
    "Title": "Cable containment",
    "Instruction": "Information",
    "List_of_items": [
      {
        "Product": "item",
        "Unit": "100",
        "Rate": "4"
      }
    ]
  },
  {
    "Section": "Section1",
    "Title": "Power Cables",
    "Instruction": "Information 1.0",
    "List_of_items": [
      {
        "Product": "Item1.0",
        "Unit": "5",
        "Rate": "10"
      }
    ]
  },
  {
    "Section": "Section1",
    "Title": "Power Cables",
    "Instruction": "Information Provided",
    "List_of_items": [
      {
        "Product": "Item1",
        "Unit": "10",
        "Rate": " 100"
      },
      {
        "Product": "Item2",
        "Unit": " 2",
        "Rate": "2000"
      }
    ]
  },
  {
    "Section": "Section2",
    "Title": "Pipe",
    "Instruction": "details",
    "List_of_items": [
      {
        "Product": "Items4",
        "Unit": "7",
        "Rate": "500"
      }
    ]
  }
]
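
Note that every `Unit` and `Rate` in this output is a quoted string, some with leading spaces (for example `" 100"`). That happens because `apply( 1, as.list )` first converts the data frame to a character matrix. If numeric values are wanted, one option is to build each row's list column by column instead. The sketch below is only an illustration under that assumption: `rows_to_lists` is a hypothetical helper, not part of the original answer, and the small `items` data frame is made up to mirror the first segment.

```r
library(jsonlite)

# Hypothetical helper (not in the original answer): turn each row of a data
# frame into a named list, taking values column by column so that numeric
# columns stay numeric instead of being coerced to character.
rows_to_lists <- function(df) {
  lapply(seq_len(nrow(df)), function(i) as.list(df[i, , drop = FALSE]))
}

# Made-up data mirroring the items of one segment
items <- data.frame(
  Product = c("Item1", "Item2"),
  Unit    = c(10, 2),
  Rate    = c(100, 2000),
  stringsAsFactors = FALSE
)

item_list <- rows_to_lists(items)

# Unit and Rate are now emitted as JSON numbers rather than strings
json <- toJSON(item_list, pretty = TRUE, auto_unbox = TRUE)
print(json)
```

In `make.segment`, the same helper could stand in for the `apply( 1, as.list )` step, applied to the result of `select( Product, Unit, Rate )`.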
