Why do these two APIs (Twitter geo/search APIs) return different result sets?
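I am fetching tweets from a specific area, but the two approaches below return very different result sets. The first method passes a latitude, a longitude and a radius. The coordinates lie inside the city (Lahore, PK) and the radius is 5 km, which covers only a small part of the city. With this method I fetched about 60,000 tweets in one day.
Method 1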
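import tweepy
consumer_key = 'xxxxxxxxxxxxxx'
consumer_secret = 'xxxxxxxxxxxxx'
access_token = 'xxxxxxxxxxxxxxx'
access_token_secret = 'xxxxxxxxxxxxxxxxxxxx'
# Authenticate with the OAuth 1a user credentials above.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
# Search for tweets within 5 km of a point inside Lahore (tweepy 3.x method names).
public_tweets = tweepy.Cursor(api.search, count=100, geocode="31.578871,74.305184,5km", since="2018-06-09", show_user=True, tweet_mode="extended").items()
for tweet in public_tweets:
    print(tweet.full_text)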
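In the second method I use the Twitter geo search API with the query "Lahore" and granularity="city", so I should now be getting tweets from the whole city. But this way I receive only 1,200 tweets in a day. I also pulled the data for the last 7 days and got only 15,000 tweets. That is a very large difference: the whole city gives me only 1,200 tweets, while a small part of the same city gives me more than 60,000. I also printed the place ID to verify that the polygon I get is accurate. These are the polygon coordinates (longitude, latitude):
74.4493870, 31.4512220
74.4493870, 31.6124170
74.2675860, 31.6124170
74.2675860, 31.4512220
I plotted them at https://www.keene.edu/ to verify. Yes, this is the exact polygon for the city of Lahore.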
Method 2
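import tweepy
consumer_key = 'xxxxxxxxxxxxxx'
consumer_secret = 'xxxxxxxxxxxxx'
access_token = 'xxxxxxxxxxxxxxx'
access_token_secret = 'xxxxxxxxxxxxxxxxxxxx'
# Authenticate with the OAuth 1a user credentials above.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
# Look up the Twitter Places that match "Lahore" at city granularity.
places = api.geo_search(query="Lahore", granularity="city")
for place in places:
    print("placeid:%s" % place)
    # Search for tweets tagged with this place via the place: operator.
    public_tweets = tweepy.Cursor(api.search, count=100, q="place:%s" % place.id, since="2018-06-09", show_user=True, tweet_mode="extended").items()
    for tweet in public_tweets:
        print(tweet.full_text)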
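First, please explain why there is such a large difference. I am using the standard API tier.
Second, tell me how these APIs retrieve tweets. Fewer than 1% of tweets are geotagged, and not every user gives an exact city and country in their profile; some users put Mars, Earth and so on. So how do these APIs find tweets from a specific area, whether by searching within a radius or by querying a city/country? I went through the Twitter API documentation and the tweepy documentation to learn how these APIs collect tweets for a specific area behind the scenes, but I did not find any useful material.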
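The first method returns more results because, when a tweet carries no geo information, a geocode search falls back to the user's profile location (as you already guessed) and tries to resolve it to a lat/long.
See the documentation here: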
https://developer.twitter.com/en/docs/tweets/search/guides/standard-operators.html
Geolocalization: the search operator “near” isn’t available in the
API, but there is a more precise way to restrict your query by a given
location using the geocode parameter specified with the template
“latitude,longitude,radius”, for example, “37.781157,-122.398720,1mi”.
When conducting geo searches, the search API will first attempt to
find Tweets which have lat/long within the queried geocode, and in
case of not having success, it will attempt to find Tweets created by
users whose profile location can be reverse geocoded into a lat/long
within the queried geocode, meaning that is possible to receive Tweets
which do not include lat/long information.
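To see how much of a geocode result set comes from each kind of location data, the returned tweets can be tallied by which field is populated. The sketch below is only an illustration: it assumes the standard v1.1 tweet JSON fields `coordinates`, `place` and `user.location`, the helper name `location_source` and the sample payloads are made up, and it cannot tell you which field the search actually matched on.

```python
from collections import Counter


def location_source(tweet):
    """Guess which location field a tweet could have been matched on.

    `tweet` is a dict shaped like the v1.1 tweet JSON (an assumption).
    """
    if tweet.get("coordinates"):
        return "point"      # exact lat/long attached to the tweet
    if tweet.get("place"):
        return "place"      # tagged with a Twitter Place polygon
    user = tweet.get("user") or {}
    if (user.get("location") or "").strip():
        return "profile"    # only the free-text profile location is set
    return "none"


# Hypothetical payloads, trimmed to the fields that matter here.
samples = [
    {"coordinates": {"type": "Point", "coordinates": [74.305184, 31.578871]},
     "place": None, "user": {"location": ""}},
    {"coordinates": None, "place": {"id": "hypothetical-id", "place_type": "city"},
     "user": {"location": "Lahore"}},
    {"coordinates": None, "place": None, "user": {"location": "Lahore, Pakistan"}},
    {"coordinates": None, "place": None, "user": {"location": "Mars"}},
]

counts = Counter(location_source(t) for t in samples)
print(counts)
```

With tweepy, the raw dictionary for each result should be available as `tweet._json`, so the same tally can be run over the tweets returned by Method 1. If most of them land in the "profile" bucket, that would be consistent with the fallback described in the quoted passage.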
A search by place_id, on the other hand, appears to look for that exact place. The basic API call syntax is described here:
https://developer.twitter.com/en/docs/tweets/search/guides/tweets-by-place
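The Places API works very differently from a lat/long geocode search. The following page clarifies the difference between the two types of location data that can be attached to a tweet: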
https://developer.twitter.com/en/docs/tutorials/filtering-tweets-by-location
Tweet-specific location information falls into two general categories:
Tweets with a specific latitude/longitude “Point” coordinate
Tweets with a Twitter “Place” (see our blog post on Twitter Places: More Context For Your Tweets and our documentation on Twitter
geo objects for more information).
...
Tweets with a Twitter “Place” contain a polygon, consisting of 4
lon-lat coordinates that define the general area (the “Place”) from
which the user is posting the Tweet. Additionally, the Place will have
a display name, type (e.g. city, neighborhood), and country code
corresponding to the country where the Place is located, among other
fields.
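As a rough sanity check, the Place polygon from the question can be compared with the 5 km geocode circle. This is a minimal sketch, assuming the four corners form an axis-aligned bounding box and using a flat-earth (equirectangular) approximation, which should be adequate at city scale; the function names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used for the approximation

# Polygon corners from the question, as (longitude, latitude) pairs.
POLYGON = [
    (74.4493870, 31.4512220),
    (74.4493870, 31.6124170),
    (74.2675860, 31.6124170),
    (74.2675860, 31.4512220),
]
# Centre and radius of the geocode search used in Method 1.
CENTER_LAT, CENTER_LON, RADIUS_KM = 31.578871, 74.305184, 5.0


def bounding_box(polygon):
    """Return (west, south, east, north) for a list of (lon, lat) pairs."""
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def box_area_km2(west, south, east, north):
    """Approximate the box area, shrinking east-west distance by cos(latitude)."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    mid_lat = math.radians((south + north) / 2.0)
    height = (north - south) * km_per_degree
    width = (east - west) * km_per_degree * math.cos(mid_lat)
    return width * height


def contains(box, lon, lat):
    """True if the point lies inside the bounding box."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


box = bounding_box(POLYGON)
box_area = box_area_km2(*box)
circle_area = math.pi * RADIUS_KM ** 2

print("circle centre inside place box:", contains(box, CENTER_LON, CENTER_LAT))
print("place box area (km^2): %.1f" % box_area)
print("geocode circle area (km^2): %.1f" % circle_area)
```

Comparing the two areas gives a feel for how much of the Place the circle covers. Whatever the ratio, it does not by itself explain the gap in tweet counts, since the two searches match on different kinds of location data.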
Also, in the following passage, note the plural in "Place IDs":
place:
Filter for specific Places by their name or ID. To discover “Places”
associated with a specific area, use Twitter’s reverse_geocode
endpoint in the REST API. Then use the Place IDs you find with the
place: operator to track Tweets that include the specific Place being
referenced. If you use the Place name rather than the numeric ID,
ensure that you quote any names that include spaces or punctuation.
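Since a city can be covered by several Places, one option is to combine several place: terms in a single query string. The helper below is a hypothetical sketch: `place_query` is not part of tweepy, the IDs are placeholders, and whether the standard search endpoint accepts place: terms joined with OR (or Place names at all) is an assumption that should be checked against the operator documentation.

```python
import re


def place_term(value):
    """Build one place: term, quoting names that contain spaces or punctuation."""
    if re.fullmatch(r"[0-9A-Za-z]+", value):
        return "place:" + value          # bare ID or single-word name
    return 'place:"' + value + '"'       # quoted, as the quoted passage advises


def place_query(values):
    """Join several place: terms with OR (assumed to be supported)."""
    return " OR ".join(place_term(v) for v in values)


# Placeholder IDs and a made-up name, for illustration only.
query = place_query(["0123456789abcdef", "fedcba9876543210", "Some Place, PK"])
print(query)
```

The resulting string could then be passed as the `q` argument of the search call in Method 2, in place of the single `"place:%s" % place.id` term.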