Neo4j: Java heap space error after query execution with CREATE UNIQUE statement
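I'm trying to test some queries on several Neo4j databases that hold different amounts of data. When I test the queries on a small amount of data, everything works and the execution time is short, but when I run them against a database with 2794 nodes and 94863 relationships, they take a long time and then fail in Neo4j with the following error:
Java heap space Neo.DatabaseError.General.UnknownFailure
First query: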
MATCH (u1:User)-[r1:Rated]->(m:Movie)<-[r2:Rated]-(u2:User)
WITH 1.0*SUM(r1.Rate)/count(r1) as pX,
1.0*SUM(r2.Rate)/count(r2) as pY, u1, u2
MATCH (u1:User)-[r1:Rated]->(m:Movie)<-[r2:Rated]-(u2:User)
WITH SUM((r1.Rate-pX)*(r2.Rate-pY)) as pomProm,
SQRT(SUM((r1.Rate-pX)^2)) as sumX,
SQRT(SUM((r2.Rate-pY)^2)) as sumY, pX,pY,u1,u2
CREATE UNIQUE (u1)-[s:SIMILARITY1]-(u2)
SET s.value = pomProm / (sumX * sumY)
And the second query:
MATCH (u1:User)-[r1:Rated]->(m:Movie)<-[r2:Rated]-(u2:User)
WITH SUM(r1.Rate * r2.Rate) AS pomProm,
SQRT(REDUCE(r1Pom = 0, i IN COLLECT(r1.Rate) | r1Pom + toInt(i^2))) AS r1V,
SQRT(REDUCE(r2Pom = 0, j IN COLLECT(r2.Rate) | r2Pom + toInt(j^2))) AS r2V,
u1, u2
CREATE UNIQUE (u1)-[s:SIMILARITY2]-(u2)
SET s.value = pomProm / (r1V * r2V)
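For readers comparing the two queries, here is a plain-Java sketch of what they appear to compute for a single pair of users: the first looks like a Pearson correlation over the movies both users rated, the second like a cosine similarity over the same ratings. The class name `SimilaritySketch`, the method names, and the sample ratings are made up for illustration and are not part of the original code.

```java
public class SimilaritySketch {

    // Pearson correlation of two users' ratings.
    // x[i] and y[i] are the ratings the two users gave to the i-th co-rated movie.
    // Returns NaN if either user gave the same rating to every co-rated movie.
    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0, sqX = 0.0, sqY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            sqX += dx * dx;
            sqY += dy * dy;
        }
        return cov / (Math.sqrt(sqX) * Math.sqrt(sqY));
    }

    // Cosine similarity of the same two rating vectors.
    public static double cosine(double[] x, double[] y) {
        double dot = 0.0, sqX = 0.0, sqY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            sqX += x[i] * x[i];
            sqY += y[i] * y[i];
        }
        return dot / (Math.sqrt(sqX) * Math.sqrt(sqY));
    }

    public static void main(String[] args) {
        // Hypothetical ratings: user A always rates twice as high as user B.
        double[] a = {8, 6, 10};
        double[] b = {4, 3, 5};
        System.out.println("pearson = " + pearson(a, b));
        System.out.println("cosine  = " + cosine(a, b));
    }
}
```

With these sample ratings both values come out at (approximately) 1, since the two rating vectors are proportional.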
The data in the database is generated by the following Java code:
public enum Labels implements Label {
    Movie, User
}

public enum RelationshipLabels implements RelationshipType {
    Rated
}

public static void main(String[] args) throws IOException, BiffException {
    Workbook workbook = Workbook.getWorkbook(new File("C:/Users/User/Desktop/DP/dvdlist.xls"));
    Workbook names = Workbook.getWorkbook(new File("C:/Users/User/Desktop/DP/names.xls"));
    String path = new String("C:/Users/User/Documents/Neo4j/test7.graphDatabase");
    GraphDatabaseFactory dbFactory = new GraphDatabaseFactory();
    GraphDatabaseService db = dbFactory.newEmbeddedDatabase(path);
    int countMovies = 0;
    int numberOfSheets = workbook.getNumberOfSheets();
    IndexDefinition indexDefinition;

    try (Transaction tx = db.beginTx()) {
        Schema schema = db.schema();
        indexDefinition = schema.indexFor(DynamicLabel.label(Labels.Movie.toString()))
                .on("Name")
                .create();
        tx.success();
    }
    try (Transaction tx = db.beginTx()) {
        Schema schema = db.schema();
        indexDefinition = schema.indexFor(DynamicLabel.label(Labels.Movie.toString()))
                .on("Genre")
                .create();
        tx.success();
    }
    try (Transaction tx = db.beginTx()) {
        Schema schema = db.schema();
        indexDefinition = schema.indexFor(DynamicLabel.label(Labels.User.toString()))
                .on("Name")
                .create();
        tx.success();
    }

    try (Transaction tx = db.beginTx()) {
        for (int i = 0; i < numberOfSheets; i++) {
            Sheet sheet = workbook.getSheet(i);
            int numberOfRows = 6000;//sheet.getRows();
            for (int j = 1; j < numberOfRows; j++) {
                Cell cell1 = sheet.getCell(0, j);
                Cell cell2 = sheet.getCell(9, j);
                Node movie = db.createNode(Labels.Movie);
                movie.setProperty("Name", cell1.getContents());
                movie.setProperty("Genre", cell2.getContents());
                countMovies++;
            }
        }
        tx.success();
    } catch (Exception e) {
        System.out.println("Something goes wrong!");
    }

    Random random = new Random();
    int countUsers = 0;
    Sheet sheetNames = names.getSheet(0);
    Cell cell;
    Node user;
    int numberOfUsers = 1500;//sheetNames.getRows();
    for (int i = 0; i < numberOfUsers; i++) {
        cell = sheetNames.getCell(0, i);
        try (Transaction tx = db.beginTx()) {
            user = db.createNode(Labels.User);
            user.setProperty("Name", cell.getContents());
            List<Integer> listForUser = new ArrayList<>();
            for (int x = 0; x < 1000; x++) {
                int j = random.nextInt(countMovies);
                if (!listForUser.isEmpty()) {
                    if (!listForUser.contains(j)) {
                        listForUser.add(j);
                    }
                } else {
                    listForUser.add(j);
                }
            }
            for (int j = 0; j < listForUser.size(); j++) {
                Node movies = db.getNodeById(listForUser.get(j));
                int rate = 0;
                rate = random.nextInt(10) + 1;
                Relationship relationship = user.createRelationshipTo(movies, RelationshipLabels.Rated);
                relationship.setProperty("Rate", rate);
            }
            System.out.println("Number of user: " + countUsers);
            tx.success();
        } catch (Exception e) {
            System.out.println("Something goes wrong!");
        }
        countUsers++;
    }
    workbook.close();
}
}
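Does anyone know how to solve this problem? Or is there a workaround for getting results from queries against a database with a large amount of data? Or some improvement to the queries or settings? I would really appreciate it.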
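You probably need to configure the amount of memory available to Neo4j. You can configure the Neo4j server heap size by editing conf/neo4j-wrapper.conf:
wrapper.java.maxmemory=NUMBER_OF_MB_HERE
See this page for more information.
However, looking at your queries (which perform a graph-global, all-pairs operation), you may want to consider executing them in batches. For example: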
// Find users with overlapping movie ratings
MATCH (u1:User)-[:Rated]->(:Movie)<-[:Rated]-(u2:User)
// only for users whose similarity has not yet been calculated
WHERE NOT exists((u1)-[:SIMILARITY]-(u2))
// consider only up to 50 pairs of users
WITH u1, u2 LIMIT 50
// compute similarity metric and set SIMILARITY relationship with coef
...
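Then execute this query repeatedly until the similarity metric has been computed for every pair of users with overlapping movie ratings.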
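The "repeat until done" part can be driven from client code. Below is a minimal Java sketch of such a loop. It assumes a callback that runs one batch and reports how many user pairs it handled; the class `BatchLoopSketch`, the method `runUntilDone`, and the `maxBatches` safety limit are hypothetical names introduced here. The wiring to Neo4j is left out on purpose because it depends on the driver and version in use, so `main` uses an in-memory counter as a stand-in for the database.

```java
import java.util.function.IntSupplier;

public class BatchLoopSketch {

    // Calls runBatch repeatedly until it reports that nothing was left to do.
    // runBatch is a stand-in for "execute the batched query in its own
    // transaction and return the number of pairs it processed".
    // maxBatches guards against looping forever if a batch never reaches zero.
    public static int runUntilDone(IntSupplier runBatch, int maxBatches) {
        int total = 0;
        for (int i = 0; i < maxBatches; i++) {
            int processed = runBatch.getAsInt();
            if (processed == 0) {
                return total; // no unprocessed pairs remain
            }
            total += processed;
        }
        throw new IllegalStateException("work remaining after " + maxBatches + " batches");
    }

    public static void main(String[] args) {
        // Stand-in for the database: 120 pending pairs, at most 50 per batch.
        int[] pending = {120};
        int total = runUntilDone(() -> {
            int n = Math.min(50, pending[0]);
            pending[0] -= n;
            return n;
        }, 1000);
        System.out.println("pairs processed: " + total); // prints "pairs processed: 120"
    }
}
```

Because each batch runs as its own transaction, only one batch's worth of intermediate state has to fit in the heap at a time, which is the point of batching here.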
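I had a similar problem (on version 4.1). The relevant properties can be found in conf/neo4j.conf, or by selecting the active database -> Manage -> Settings, and increased:
dbms.memory.heap.initial_size
dbms.memory.heap.max_size
See the documentation for more details on performance.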
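One caveat worth checking: the settings above configure the Neo4j server process. If the queries are instead run through an embedded database inside your own Java program (as the loader code in the question does with `newEmbeddedDatabase`), the heap is that of your application's JVM and is normally set with the standard `-Xmx` option when launching it. A small sketch for verifying what the JVM actually got; the class name `HeapCheck` is just an illustration:

```java
public class HeapCheck {

    // Maximum amount of heap the current JVM will attempt to use, in MiB.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as, for example, `java -Xmx4g HeapCheck` should report a value close to 4096 MiB; the exact number can differ slightly between JVMs.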