I ran my first couple of training sets today. I must confess, the results are not pretty. Let's start with the summary. The training set for the text categorization example given by Joachims contains 2000 weighted example vectors. The precision of the resulting model, as estimated by svm_learn, is 93.07%. My first training set…
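The post above involves feeding training sets to svm_learn. SVMlight's svm_learn reads plain-text training files in which each example is one line of the form `<target> <feature>:<value> ...`, with integer feature indices starting at 1 and listed in increasing order. Below is a minimal Java sketch of writing one such line. It assumes binary +1/-1 labels and a sparse map of feature weights. The class and method names are hypothetical and do not come from the original post, and this is not necessarily how my own training sets were generated.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: formats one labeled example as an SVMlight training line.
public class SvmLightLine {

    // Builds "<target> <feature>:<value> ..." with feature indices in ascending order.
    // A SortedMap guarantees the ordering that the file format requires.
    static String format(int target, SortedMap<Integer, Double> features) {
        if (target != 1 && target != -1) {
            throw new IllegalArgumentException("target must be +1 or -1");
        }
        StringBuilder sb = new StringBuilder(target > 0 ? "+1" : "-1");
        for (Map.Entry<Integer, Double> e : features.entrySet()) {
            if (e.getKey() < 1) {
                throw new IllegalArgumentException("feature indices start at 1");
            }
            if (e.getValue() == 0.0) {
                continue; // sparse format: zero-valued features are simply omitted
            }
            sb.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SortedMap<Integer, Double> features = new TreeMap<>();
        features.put(42, 0.25);
        features.put(7, 0.5);
        features.put(1000, 0.0); // dropped from the output
        System.out.println(format(1, features));
    }
}
```

Running `main` prints `+1 7:0.5 42:0.25`. The zero-valued feature 1000 is left out because the format is sparse, which keeps files small when most words in the vocabulary never occur in a given document.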
Search Limits
Today I learned about some under-documented limits on Google's AJAX search API. While working on my Searcher class (which will eventually generate training sets for the SVM), I asked Java to print the first 50 page titles that Google returned. Every time I ran the program, I got a JSONException after 28 results. Upon…
Why JSON?
As I researched last night, I wondered why the Google AJAX Search API uses JSON. I had never even heard of JSON (JavaScript Object Notation). I fully expected the AJAX API to use XML… it is Asynchronous JavaScript And XML, after all. But, at least for the RESTful interface (another term I'd never heard)…
Google's AJAX search from Java
My efforts today revolved around Google’s search API. It took a while to find out how to search Google from Java. The first lead I found was an old page from Pace University’s CS department, which mentioned a “Google API” and the need for a developer key. It didn’t take me long to find out…
Self-Training Categorizer
I'm beginning a new project this month, to run through December. I'm going to learn how to train a Support Vector Machine (SVM) to categorize text, and then write a program that will automatically train the SVM, using web searches to generate training material. Once I've got a semblance of a working system, I'll be building…