To-do from CCSC-NW 08

Here is a list of the potential changes to SVMTrainer that were suggested to me during this weekend’s conference.

Searcher

  • Implement conditions on acceptable web document sizes to optimize document retrieval time
  • Try using a small initial search as a seed to get other search terms and expand the diversity of my training set – Yahoo! Term Extraction might be good for this, too.

WordFilter

  • Try implementing WordNet in the WordFilter class
  • Find a use for Yahoo! Term Extraction

WebDocument

  • Implement parallelism in the retrieval of search results and the retrieval of web documents
  • Implement a document retrieval timeout and a URL blacklist to prevent hanging on bad downloads

Other

  • Investigate the use of SVMstruct for categorization/ranking problem in multiple dimensions
  • Start doing an independent check on the accuracy of trained sets by keeping 10% of results for categorization rather than training
  • Learn about Xi Alpha estimates and what exactly they mean

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.