Wednesday, April 13, 2011

FS4SP and User_Converter_Rules.xml

Topic: FAST Search for SharePoint 2010 (FS4SP) and user_converter_rules
Subject:  Extending the FAST Search (FS4SP) pipeline, user_converter_rules.xml
If you read my post “Implementing Windows 2008 TIFF IFilter and FAST Search for SharePoint (FS4SP)” (http://fs4sp.blogspot.com/2011/04/implementing-windows-2008-tiff-ifilter.html), or if you have already customized the pipeline to use an IFilter other than the built-in converters, always remember that the devil is in the details. If you are using the SharePoint crawler, you have probably been successful. If you have written your own protocol handler or crawler, you may need to understand how the pipeline interacts with user_converter_rules.xml.
Problem: I have written my own protocol handler and/or crawler, and my user_converter_rules.xml is not working. Why doesn’t it pick up my files? How does user_converter_rules.xml work?
Answer: You might be surprised to find out that non-out-of-the-box (non-OOB) IFilters do not follow the same path through the pipeline as the built-in converters when it comes to format detection. The built-in FormatDetector sets two crawled properties, “format” and “mime” (see: File Format and FAST Search for SharePoint 2010, http://fs4sp.blogspot.com/2011/04/file-format-and-fast-search-for.html), so one would think that one of these two properties decides which rule in user_converter_rules.xml is executed. Neither is used. Instead, the “FileExtension” crawled property is what gets matched against user_converter_rules.xml.
If you test the example I blogged about in “File Format and FAST Search for SharePoint 2010”, you will quickly see that the “FileExtension” crawled property is unreliable. If the property is missing or incorrectly populated, the rule in user_converter_rules.xml will not match and the IFilter will not execute.

Conclusion: If you write your own protocol handler to interact with the SharePoint crawler, or write your own connectors, and you intend to use IFilters with user_converter_rules.xml, make sure the “FileExtension” property is correctly populated before submitting the data to the FAST Content Distributor.
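
As a rough illustration of that last point, here is a minimal sketch (in Python, purely for illustration) of a check a custom connector could run on each document before submission. The function names and the dictionary of properties are hypothetical and are not part of any FS4SP API. The sketch also assumes the extension is stored lowercase and without a leading dot, so verify the exact form your rules expect.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def derive_file_extension(url: str) -> str:
    """Return the extension taken from the URL path, lowercase and without the dot.

    Assumption: the rules expect a lowercase value with no leading dot.
    Returns an empty string when the path has no extension.
    """
    # Only the path is inspected, so query strings and fragments are ignored.
    path = unquote(urlsplit(url).path)
    _, ext = posixpath.splitext(path)
    return ext.lstrip(".").lower()


def ensure_file_extension(properties: dict, url: str) -> dict:
    """Return a copy of the properties with FileExtension filled in when blank.

    An existing non-blank value is left untouched.
    """
    fixed = dict(properties)
    current = (fixed.get("FileExtension") or "").strip()
    if not current:
        fixed["FileExtension"] = derive_file_extension(url)
    return fixed


if __name__ == "__main__":
    # Hypothetical document whose connector left FileExtension empty.
    doc = {"Title": "Scanned invoice", "FileExtension": ""}
    print(ensure_file_extension(doc, "http://server/docs/Scan%20001.TIF?version=2"))
```

A document with no extension in its URL still ends up with an empty value, so a real connector would need its own fallback for that case, for example a lookup based on the content type reported by the source system.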
  
KORITFW
