Can't open GeoJSON with Python Sedona GeoJsonReader

I'm using Apache Sedona for Python to open a GeoJSON file. I followed this guide and went through every step for opening GeoJSON, but for clarity, here is what I did:

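from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator
from sedona.utils import KryoSerializer, SedonaKryoRegistrator
from sedona.utils.adapter import Adapter
from sedona.core.formatMapper import GeoJsonReader

# Local Spark session with the Sedona serializers and jars configured
spark = SparkSession.\
    builder.\
    master("local[*]").\
    appName("Sedona App").\
    config("spark.serializer", KryoSerializer.getName).\
    config("spark.kryo.registrator", SedonaKryoRegistrator.getName).\
    config("spark.jars.packages", "org.apache.sedona:sedona-python-adapter-3.0_2.12:1.1.0-incubating,org.datasyslab:geotools-wrapper:1.1.0-25.2").\
    getOrCreate()
SedonaRegistrator.registerAll(spark)
sc = spark.sparkContext
amenity_file = 'example/2/amenity.geojson'
geojson_file = GeoJsonReader.readToGeometryRDD(sc, amenity_file)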

The last line prints this warning:

22/03/25 16:52:17 WARN FormatMapper: [Sedona] The GeoJSON file doesn't have feature properties

However, I continued with the following line (as in the example):

Adapter.toDf(geojson_file, spark).show()

But then I get an error:

22/03/25 16:52:26 ERROR FormatMapper: [Sedona] com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"{"; line: 1, column: 1])
 at [Source: (String)"{"; line: 1, column: 3]
22/03/25 16:52:26 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 8)
java.lang.RuntimeException: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"{"; line: 1, column: 1])
 at [Source: (String)"{"; line: 1, column: 3]
    at org.wololo.geojson.GeoJSONFactory.create(GeoJSONFactory.java:31)
    at org.wololo.jts2geojson.GeoJSONReader.read(GeoJSONReader.java:20)
    at org.wololo.jts2geojson.GeoJSONReader.read(GeoJSONReader.java:16)
    at org.apache.sedona.core.formatMapper.FormatMapper.readGeoJSON(FormatMapper.java:206)
    at org.apache.sedona.core.formatMapper.FormatMapper.readGeometry(FormatMapper.java:304)
    at org.apache.sedona.core.formatMapper.FormatMapper.call(FormatMapper.java:377)
    at org.apache.sedona.core.formatMapper.FormatMapper.call(FormatMapper.java:52)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitions(JavaRDDLike.scala:153)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions(RDD.scala:837)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$adapted(RDD.scala:837)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run(Executor.scala:462)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"{"; line: 1, column: 1])
 at [Source: (String)"{"; line: 1, column: 3]
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:664)
    at com.fasterxml.jackson.core.base.ParserBase._handleEOF(ParserBase.java:486)
    at com.fasterxml.jackson.core.base.ParserBase._eofAsNextChar(ParserBase.java:498)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:2354)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextFieldName(ReaderBasedJsonParser.java:905)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:249)
    at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:68)
    at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
    at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4254)
    at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2711)
    at org.wololo.geojson.GeoJSONFactory.create(GeoJSONFactory.java:21)
    ... 32 more
22/03/25 16:52:26 WARN TaskSetManager: Lost task 0.0 in stage 8.0 (TID 8, works-mbp.lan, executor driver): java.lang.RuntimeException: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"{"; line: 1, column: 1])
 at [Source: (String)"{"; line: 1, column: 3]
    at org.wololo.geojson.GeoJSONFactory.create(GeoJSONFactory.java:31)
    at org.wololo.jts2geojson.GeoJSONReader.read(GeoJSONReader.java:20)
    at org.wololo.jts2geojson.GeoJSONReader.read(GeoJSONReader.java:16)
    at org.apache.sedona.core.formatMapper.FormatMapper.readGeoJSON(FormatMapper.java:206)
    at org.apache.sedona.core.formatMapper.FormatMapper.readGeometry(FormatMapper.java:304)
    at org.apache.sedona.core.formatMapper.FormatMapper.call(FormatMapper.java:377)
    at org.apache.sedona.core.formatMapper.FormatMapper.call(FormatMapper.java:52)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitions(JavaRDDLike.scala:153)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions(RDD.scala:837)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$adapted(RDD.scala:837)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run(Executor.scala:462)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"{"; line: 1, column: 1])
 at [Source: (String)"{"; line: 1, column: 3]
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:664)
    at com.fasterxml.jackson.core.base.ParserBase._handleEOF(ParserBase.java:486)
    at com.fasterxml.jackson.core.base.ParserBase._eofAsNextChar(ParserBase.java:498)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:2354)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextFieldName(ReaderBasedJsonParser.java:905)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:249)
    at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:68)
    at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
    at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4254)
    at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2711)
    at org.wololo.geojson.GeoJSONFactory.create(GeoJSONFactory.java:21)
    ... 32 more

22/03/25 16:52:26 ERROR TaskSetManager: Task 0 in stage 8.0 failed 1 times; aborting job

Edit: The file does contain geometries, but they sit inside an array of structs. This is what it looks like:

{
    "type": "FeatureCollection",
    "name": "amenity",
    "features": [
      {
        "type": "Feature",
        "feature_type": "amenity",
        "id": "1231312323f",
        "properties": {
          "accessibility": null,
          "address_id": "1231232312",
          "alt_name": null,
          "category": "elevator",
          "correlation_id": null,
          "hours": null,
          "name": null,
          "phone": null,
          "unit_ids": [
            "1232312",
            "123212"
          ],
          "website": null
        },
        "geometry": {
          "type": "Point",
          "coordinates": [
            -121.8888997,
            37.3285715
          ]
        }
      }]}

Update:

I also tried writing a very simple GeoJSON file, but it still throws the same error. I believe this has nothing to do with GeoJsonReader:

{
  "type": "Point",
  "coordinates": [
      -105.01621,
      39.57422
  ]
}

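As @Paul H pointed out, the problem is with the formatting. This is surprising, because the file is an IMDF file validated by Apple. Nevertheless, GeoJsonReader treats it as corrupt. To fix this, extract the contents of the 'features' key from the GeoJSON and pass only those features to the reader.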
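
For anyone hitting the same error, here is a minimal sketch of that fix using only Python's standard json module. The error message reports the parser input as the single string "{", which is the first line of the pretty-printed file, so the sketch assumes the reader parses each line of the file as a separate GeoJSON object. The function name features_to_lines and the output path are hypothetical, not part of Sedona:

```python
import json


def features_to_lines(src_path, dst_path):
    """Write each feature of a GeoJSON FeatureCollection on its own line."""
    with open(src_path, encoding="utf-8") as src:
        collection = json.load(src)

    # Use the "features" array of a FeatureCollection; if the key is absent,
    # treat the whole object as a single Feature or geometry.
    features = collection.get("features", [collection])

    with open(dst_path, "w", encoding="utf-8") as dst:
        for feature in features:
            # json.dumps without indent emits no raw newlines,
            # so each object ends up on exactly one line.
            dst.write(json.dumps(feature) + "\n")
    return len(features)


# Hypothetical usage with the session from the question (not run here):
# features_to_lines("example/2/amenity.geojson", "example/2/amenity_lines.geojson")
# geojson_rdd = GeoJsonReader.readToGeometryRDD(sc, "example/2/amenity_lines.geojson")
# Adapter.toDf(geojson_rdd, spark).show()
```

The same function also covers the simple Point example above: since that file has no 'features' key, the whole object is written out on a single line. Whether the resulting file loads cleanly still depends on the reader accepting every property in the features, so treat this as a starting point rather than a guaranteed fix.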