sparql 如何正确分组这些数据

Question

首先，我没有举一个最小的例子，因为我认为没有它我的问题也能理解。

其次，我没有给你数据，因为我认为没有它我的问题就可以解决。不过，如果你问的话，我愿意给你。

这是我的查询：

select distinct (?x as ?likedItem) (?item as ?suggestedItem) ?similarity ?becauseOf  ((?similarity * ?importance * ?levelImportance) as ?finalSimilarity)

{
  values ?user {bo:ania}
  #the variable ?x is bound to the items the user :ania has liked.
  ?user rs:hasRated ?ratings.
  ?ratings a rs:Likes.
  ?ratings rs:aboutItem ?x.
  ?ratings rs:ratesBy   ?ratingValue.
  #level 0 class similarities
  {
    #extract all the items that are from the same class (type) as the liked items.
    #I assumed the being from the same class accounts for 50% of the similarities.
    #This value can be changed according to the test or the application domain.
    values ?classImportance {0.5} #class level
    bind (?classImportance as ?importance)
    bind( 4/7 as ?levelImportance)
  ?x  a ?class.
  ?class rdfs:subClassOf ?mainClass .
  ?mainClass rdfs:subClassOf rs:RecommendableClass .
  ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
  ?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
  ?classSimilarity rs:appliedOnClass ?class .
  ?classSimilarity rs:hasClassSimilarityValue ?similarity .
  ?item a ?class.
  bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
  }
  union
   #level 0 instance similarities
  {
  #extract the items that share the same value for important predicates with the already liked items..
  #I assumed that having the same instance for important predicates account for 100% of the similarities.
  #This value can be changed according to the test or the application domain.
   values ?instanceImportance {1} #instance level
   bind (?instanceImportance as ?importance)
   bind( 4/7 as ?levelImportance)
   ?x  a ?class.
  ?class rdfs:subClassOf ?mainClass .
  ?mainClass rdfs:subClassOf rs:RecommendableClass .
  ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
  ?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
  ?propertySimilarity rs:appliedOnProperty ?property .
  ?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
  ?x ?property ?value .
  ?item ?property ?value .
    bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
  }
  filter (?x != ?item)
}

这是结果：

如您所见，结果包含同一个suggestedItem的多个值，我想根据suggestedItem进行分组并对finalSimilarity[=的值求和18=]

我试过这个：

select   ?item (SUM(?similarity * ?importance * ?levelImportance ) as ?finalSimilarity)  (group_concat(distinct ?x) as ?likedItem) (group_concat(?becauseOf ; separator = " ,and ") as ?reason) where
{
  values ?user {bo:ania}
  #the variable ?x is bound to the items the user :ania has liked.
  ?user rs:hasRated ?ratings.
  ?ratings a rs:Likes.
  ?ratings rs:aboutItem ?x.
  ?ratings rs:ratesBy   ?ratingValue.
  #level 0 class similarities
  {
    #extract all the items that are from the same class (type) as the liked items.
    #I assumed the being from the same class accounts for 50% of the similarities.
    #This value can be changed according to the test or the application domain.
    values ?classImportance {0.5} #class level
    bind (?classImportance as ?importance)
    bind( 4/7 as ?levelImportance)
  ?x  a ?class.
  ?class rdfs:subClassOf ?mainClass .
  ?mainClass rdfs:subClassOf rs:RecommendableClass .
  ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
  ?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
  ?classSimilarity rs:appliedOnClass ?class .
  ?classSimilarity rs:hasClassSimilarityValue ?similarity .
  ?item a ?class.
  bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
  }
  union
   #level 0 instance similarities
  {
  #extract the items that share the same value for important predicates with the already liked items..
  #I assumed that having the same instance for important predicates account for 100% of the similarities.
  #This value can be changed according to the test or the application domain.
   values ?instanceImportance {1} #instance level
   bind (?instanceImportance as ?importance)
   bind( 4/7 as ?levelImportance)
   ?x  a ?class.
  ?class rdfs:subClassOf ?mainClass .
  ?mainClass rdfs:subClassOf rs:RecommendableClass .
  ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
  ?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
  ?propertySimilarity rs:appliedOnProperty ?property .
  ?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
  ?x ?property ?value .
  ?item ?property ?value .
    bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
  }
  filter (?x != ?item)
}
group by ?item
order by desc(?finalSimilarity)

但结果是：

这在我看来是错误的，因为 如果你查看中的 finalSimilarity，该值是 1.7。但是，如果您从第一个查询中手动求和，您会得到 0.62 所以我做错了，

你能帮我找到它吗？

请注意，这两个查询是相同的，只是 select 语句不同

提示

我已经可以用两个 select 解决这个问题了：

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rs: <http://www.SemanticRecommender.com/rs#>
PREFIX bo: <http://www.BookOntology.com/bo#>
PREFIX :<http://www.SemanticBookOntology.com/sbo#>

select ?suggestedItem ( SUM (?finalSimilarity) as ?summedFinalSimilarity)  (group_concat(distinct strafter(str(?likedItem), "#")) as ?becauseYouHaveLikedThisItem) (group_concat(?becauseOf ; separator = " ,and ") as ?reason)
where {
select distinct (?x as ?likedItem) (?item as ?suggestedItem) ?similarity ?becauseOf  ((?similarity * ?importance * ?levelImportance) as ?finalSimilarity)
where
{
  values ?user {bo:ania}
  #the variable ?x is bound to the items the user :ania has liked.
  ?user rs:hasRated ?ratings.
  ?ratings a rs:Likes.
  ?ratings rs:aboutItem ?x.
  ?ratings rs:ratesBy   ?ratingValue.
  #level 0 class similarities
  {
    #extract all the items that are from the same class (type) as the liked items.
    #I assumed the being from the same class accounts for 50% of the similarities.
    #This value can be changed according to the test or the application domain.
    values ?classImportance {0.5} #class level
    bind (?classImportance as ?importance)
    bind( 4/7 as ?levelImportance)
  ?x  a ?class.
  ?class rdfs:subClassOf ?mainClass .
  ?mainClass rdfs:subClassOf rs:RecommendableClass .
  ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
  ?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
  ?classSimilarity rs:appliedOnClass ?class .
  ?classSimilarity rs:hasClassSimilarityValue ?similarity .
  ?item a ?class.
  bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
  }
  union
   #level 0 instance similarities
  {
  #extract the items that share the same value for important predicates with the already liked items..
  #I assumed that having the same instance for important predicates account for 100% of the similarities.
  #This value can be changed according to the test or the application domain.
   values ?instanceImportance {1} #instance level
   bind (?instanceImportance as ?importance)
   bind( 4/7 as ?levelImportance)
   ?x  a ?class.
  ?class rdfs:subClassOf ?mainClass .
  ?mainClass rdfs:subClassOf rs:RecommendableClass .
  ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
  ?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
  ?propertySimilarity rs:appliedOnProperty ?property .
  ?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
  ?x ?property ?value .
  ?item ?property ?value .
    bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
  }
  filter (?x != ?item)
}
}
group by ?suggestedItem
order by desc(?summedFinalSimilarity)

但对我来说这是一个愚蠢的解决方案，必须有一个更聪明的解决方案，我可以使用一个 select

来获取聚合数据

Answer 1

没有看到你的数据，这是不可能的，而且对于这么大的查询，可能不值得尝试调试确切的问题，但如果你有重复项（这很容易得到，特别是如果你使用某些条件可以匹配两个部分的联合）。例如，假设您有这样的数据：

@prefix : <urn:ex:>

:x :similar [ :sim 0.10 ; :mult 2 ] ,
            [ :sim 0.12 ; :mult 1 ] ,
            [ :sim 0.12 ; :mult 1 ] ,  # yup, a duplicate
            [ :sim 0.15 ; :mult 4 ] .

然后如果你运行这个查询，你会得到四个结果行：

prefix : <urn:ex:>

select ?sim ((?sim * ?mult) as ?final) {
  :x :similar [ :sim ?sim ; :mult ?mult ] .
}

----------------
| sim  | final |
================
| 0.15 | 0.60  |
| 0.12 | 0.12  |
| 0.12 | 0.12  |
| 0.10 | 0.20  |
----------------

但是，如果您 select 不同，您只会看到三个：

select distinct ?sim ((?sim * ?mult) as ?final) {
  :x :similar [ :sim ?sim ; :mult ?mult ] .
}

----------------
| sim  | final |
================
| 0.15 | 0.60  |
| 0.12 | 0.12  |
| 0.10 | 0.20  |
----------------

一旦您开始 group by 和 sum，这些非不同的值都将包括在内：

select (sum(?sim * ?mult) as ?final) {
  :x :similar [ :sim ?sim ; :mult ?mult ] .
}

---------
| final |
=========
| 1.04  |
---------

该总和是所有四项项的总和，而不是三项项的总和。即使数据没有重复值，union也可以引入重复结果：

@prefix : <urn:ex:>

:x :similar [ :sim 0.10 ; :mult 2 ] ,
            [ :sim 0.12 ; :mult 1 ] ,
            [ :sim 0.15 ; :mult 4 ] .

prefix : <urn:ex:>

select (sum(?sim * ?mult) as ?final) {
  { :x :similar [ :sim ?sim ; :mult ?mult ] }
  union
  { :x :similar [ :sim ?sim ; :mult ?mult ] }
}

---------
| final |
=========
| 1.84  |
---------

既然你发现需要使用group_concat(distinct …)，如果有这种性质的重复，我不会感到惊讶。

sparql 如何正确分组这些数据

sparql how to group correctly this data

sparql

请注意，这两个查询是相同的，只是 select 语句不同

提示