{"id":928,"date":"2008-08-14T00:16:12","date_gmt":"2008-08-14T07:16:12","guid":{"rendered":"http:\/\/islemaster.wordpress.com\/?p=64"},"modified":"2014-03-17T00:40:15","modified_gmt":"2014-03-17T07:40:15","slug":"search-limits","status":"publish","type":"post","link":"https:\/\/www.bradleycbuchanan.com\/b\/search-limits\/","title":{"rendered":"Search Limits"},"content":{"rendered":"<p>Today I learned about some under-documented limits on Google&#8217;s AJAX search API.\u00a0 While working on my Searcher class (that will eventually generate training sets for the SVM) I asked Java to print the first 50 page titles that Google returned.\u00a0 Every time I ran the program I would get a JSONException after 28 results.\u00a0 Upon further examination, I found that Google returned the following <a href=\"http:\/\/en.wikipedia.org\/wiki\/HTTP_400#4xx_Client_Error\">400 Bad Request<\/a> JSON whenever I sent a request with the parameter <code>&amp;start<\/code> greater than 28:<br \/>\n<code><br \/>\n{<br \/>\n  \"responseData\": null,<br \/>\n  \"responseDetails\": \"out of range start\",<br \/>\n  \"responseStatus\": 400<br \/>\n}<br \/>\n<\/code><\/p>\n<p>This seemed a little absurd, considering that in previous queries Google claimed to have found over <a href=\"http:\/\/www.google.com\/search?q=Google%20limited%20to%201000%20results&amp;start=1000\">14 million results<\/a> for the same search terms.  Naturally, I started digging online to see if anyone else had encountered this magic 28 barrier.  I soon learned that the AJAX search API is <a href=\"http:\/\/groups.google.gm\/group\/Google-AJAX-Search-API\/browse_thread\/thread\/c12bd78bafa93150\">limited to 32 results<\/a>, and that in order to get all 32 you must include the <code>&amp;rsz=large<\/code> directive in your request, dictating 8 results per request instead of 4.<\/p>\n<p>This could really hinder the quality of my training sets.  
I suppose I can just keep adding new results to a set of the 100 most recent for each category (I wrote a nice little class to do just that), but then it could take a while to build a diverse training set: several days, even if the results changed every day.  On the other hand, I read that <a href=\"http:\/\/developer.yahoo.com\/search\/\">Yahoo&#8217;s web search API<\/a> offers up to 1000 results per query, with a cap of 5000 queries in 24 hours.  Switching to Yahoo might be a good option, <i>if<\/i> their results are kept as up-to-date as Google&#8217;s.  I&#8217;ll have to do some research, or maybe make the search interface modular so I can try both.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today I learned about some under-documented limits on Google&#8217;s AJAX search API.\u00a0 While working on my Searcher class (that will eventually generate training sets for the SVM) I asked Java to print the first 50 page titles that Google returned.\u00a0 Every time I ran the program I would get a JSONException after 28 results.\u00a0 
Upon&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[25,36,41,380],"class_list":["post-928","post","type-post","status-publish","format-standard","hentry","category-programmer","tag-google","tag-search-api","tag-svm-trainer","tag-yahoo"],"_links":{"self":[{"href":"https:\/\/www.bradleycbuchanan.com\/b\/wp-json\/wp\/v2\/posts\/928","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.bradleycbuchanan.com\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bradleycbuchanan.com\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bradleycbuchanan.com\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bradleycbuchanan.com\/b\/wp-json\/wp\/v2\/comments?post=928"}],"version-history":[{"count":1,"href":"https:\/\/www.bradleycbuchanan.com\/b\/wp-json\/wp\/v2\/posts\/928\/revisions"}],"predecessor-version":[{"id":1220,"href":"https:\/\/www.bradleycbuchanan.com\/b\/wp-json\/wp\/v2\/posts\/928\/revisions\/1220"}],"wp:attachment":[{"href":"https:\/\/www.bradleycbuchanan.com\/b\/wp-json\/wp\/v2\/media?parent=928"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bradleycbuchanan.com\/b\/wp-json\/wp\/v2\/categories?post=928"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bradleycbuchanan.com\/b\/wp-json\/wp\/v2\/tags?post=928"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}