Configuration to exclude common filetypes (pdf, xls, doc, xml, xsd, csv, ics,...)
It should be possible to define one global list of all filetypes to exclude, instead of having to define them every time with the exclude filter.
This is done in 2.0, which is in beta right now :-)
The following file extensions are already handled when they occur in regular A HREF links: (gif|jpg|png|mpg|mp3|wmv|ram|pdf|zip|xpi|jar|exe|docx*|xlsx*|pptx*|wml)
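To illustrate what the pattern above covers, here is a rough Python sketch. The alternation is copied from this post. The leading `\.`, the trailing `$`, the case-insensitive flag and the helper name `is_handled` are assumptions made for the example, not the crawler's actual code.

```python
import re
from urllib.parse import urlparse

# Alternation copied from the post. Note that `docx*` matches both "doc" and "docx".
# Anchoring on the dot and the end of the path is an assumption for this sketch.
HANDLED = re.compile(
    r"\.(gif|jpg|png|mpg|mp3|wmv|ram|pdf|zip|xpi|jar|exe|docx*|xlsx*|pptx*|wml)$",
    re.IGNORECASE,
)


def is_handled(url: str) -> bool:
    """Return True if the URL path ends in one of the listed extensions."""
    # Only the path is checked, so query strings do not influence the result.
    return HANDLED.search(urlparse(url).path) is not None
```

With this sketch, `report.pdf`, `letter.doc` and `letter.docx` match, while `data.csv` and `feed.xml` do not, which is why the types from the title still need the exclude filter.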
1) File extensions to exclude will become a crawler option
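How such an option might behave can be sketched in Python as follows. This is only an illustration of the idea: the names `EXCLUDE_EXTENSIONS`, `build_exclude_pattern` and `should_skip` are hypothetical and are not the real option or API of the 2.0 release.

```python
import re
from urllib.parse import urlparse


def build_exclude_pattern(extensions):
    """Compile a case-insensitive regex matching paths that end in any given extension.

    Returns None when the list contains no usable extension.
    """
    cleaned = []
    for ext in extensions:
        ext = ext.strip().lstrip(".")  # accept both "pdf" and ".pdf"
        if ext:
            cleaned.append(re.escape(ext))
    if not cleaned:
        return None
    return re.compile(r"\.(?:" + "|".join(cleaned) + r")$", re.IGNORECASE)


def should_skip(url, pattern):
    """Return True if the URL path ends in one of the excluded extensions."""
    return pattern is not None and pattern.search(urlparse(url).path) is not None


# Hypothetical global option value, using the extensions named in the request title.
EXCLUDE_EXTENSIONS = ["pdf", "xls", "doc", "xml", "xsd", "csv", "ics"]
pattern = build_exclude_pattern(EXCLUDE_EXTENSIONS)
```

A crawler loop would then call `should_skip(url, pattern)` once per discovered link, so the list is defined in a single place rather than repeated in each exclude filter.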