Tuesday, April 5, 2011

Spy Stage in the FAST Search (FS4SP) pipeline.

Topic: FAST Search for SharePoint 2010 (FS4SP) pipeline.

Subject: Tracing and debugging crawled properties in the FAST Search (FS4SP) pipeline using the Spy stage.
Problem: What happens to my crawled properties as a document moves through the FAST pipeline?
Solution: We can use the Spy stage of the FAST pipeline to see the values of crawled properties at any point while an item is being processed. I often find it useful to trace specific stages in the FAST pipeline. In this example we will trace around the file format detection stage of the pipeline.
1.      On each FAST server that has document processors enabled:
a.      Edit the File “<Install Drive>:\FASTSearch\etc\pipelineconfig.xml”
b.      Under the “processors” node find the processor stage named “Spy”

   <!-- Debugging Stages : Use carefully, since they cause considerable I/O load! -->
    <processor name="Spy" type="general" hidden="0">
      <load module="processors.Spy" class="Spy"/>
      <config>
        <param name="SpyDumpFile" value="var/log/spy.txt" type="str"/>
        <param name="FileStringCutOffLen" value="32768" type="int"/>
      </config>
      <description><![CDATA[Debugging stage....]]></description>
      <inputs>
      </inputs>
    </processor>

c.      Copy the XML node to make a new processor stage.
        i.     Change the name to “Spy1”
        ii.    Change the value of SpyDumpFile to “var/log/spy1.txt”

    <processor name="Spy1" type="general" hidden="0">
      <load module="processors.Spy" class="Spy"/>
      <config>
        <param name="SpyDumpFile" value="var/log/spy1.txt" type="str"/>
        <param name="FileStringCutOffLen" value="32768" type="int"/>
      </config>
      <description><![CDATA[Debugging stage ...]]></description>
      <inputs>
      </inputs>
    </processor>

d.      Locate the <pipelines> node.
e.      Locate the <pipeline name="Office14 (webcluster)" default="1"> node.
f.       Locate the portion of the pipeline which performs “Document Conversion”
      <!-- Document Conversion -->
      <processor name="AttachmentsHandler"/>
      <processor name="UTFDetectorConverter"/>
      <processor name="FastFormatDetector"/>
      <processor name="FormatDetector"/>
      <processor name="XMLMapper"/>
      <processor name="SimpleConverter"/>
      <processor name="PDFConverter"/>
      <processor name="IFilterConverter"/>
      <processor name="SearchExportConverter"/>

g.      Add the two Spy stages: “Spy” immediately before “FastFormatDetector” and “Spy1” immediately after it.
      <!-- Document Conversion -->
      <processor name="AttachmentsHandler"/>
      <processor name="UTFDetectorConverter"/>
      <processor name="Spy" />
      <processor name="FastFormatDetector"/>
      <processor name="Spy1" />
      <processor name="FormatDetector"/>
      <processor name="XMLMapper"/>
      <processor name="SimpleConverter"/>
      <processor name="PDFConverter"/>
      <processor name="IFilterConverter"/>
      <processor name="SearchExportConverter"/>
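
The edits in step 1 could also be scripted when several servers need the same change. The following is a minimal sketch using Python's standard xml.etree.ElementTree, run against a cut-down in-memory sample rather than the real pipelineconfig.xml. The <root> element name and the helper names clone_spy and wrap_stage are assumptions made for illustration, not part of FAST. Note that ElementTree discards comments and CDATA wrappers when it parses and rewrites a file, so back up the original before trying anything like this on a live configuration.

```python
import copy
import xml.etree.ElementTree as ET

# Cut-down stand-in for pipelineconfig.xml; the <root> element name is an assumption.
SAMPLE = """<root>
  <processors>
    <processor name="Spy" type="general" hidden="0">
      <load module="processors.Spy" class="Spy"/>
      <config>
        <param name="SpyDumpFile" value="var/log/spy.txt" type="str"/>
        <param name="FileStringCutOffLen" value="32768" type="int"/>
      </config>
    </processor>
  </processors>
  <pipelines>
    <pipeline name="Office14 (webcluster)" default="1">
      <processor name="UTFDetectorConverter"/>
      <processor name="FastFormatDetector"/>
      <processor name="FormatDetector"/>
    </pipeline>
  </pipelines>
</root>"""


def clone_spy(root, new_name, dump_file):
    """Copy the 'Spy' processor definition under a new name and dump file."""
    processors = root.find("processors")
    spy = processors.find("processor[@name='Spy']")
    clone = copy.deepcopy(spy)
    clone.set("name", new_name)
    clone.find("config/param[@name='SpyDumpFile']").set("value", dump_file)
    processors.append(clone)


def wrap_stage(root, pipeline_name, target, before, after):
    """Insert stage references immediately before and after `target`."""
    for pipeline in root.find("pipelines").findall("pipeline"):
        if pipeline.get("name") != pipeline_name:
            continue
        stages = list(pipeline)
        index = next(i for i, s in enumerate(stages) if s.get("name") == target)
        # Insert the trailing stage first so `index` still points at the target.
        pipeline.insert(index + 1, ET.Element("processor", {"name": after}))
        pipeline.insert(index, ET.Element("processor", {"name": before}))


root = ET.fromstring(SAMPLE)
clone_spy(root, "Spy1", "var/log/spy1.txt")
wrap_stage(root, "Office14 (webcluster)", "FastFormatDetector", "Spy", "Spy1")

stage_names = [p.get("name") for p in root.find("pipelines/pipeline")]
print(stage_names)
```

The same two helpers can place a Spy pair around any other stage by changing the target argument.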

2.      Reset the FAST Processor Server (pipeline)
a.      Open FAST Command Shell as Administrator
b.      Issue the command: “psctrl reset”.
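
Before resetting, it can be worth confirming that the edited file is still well-formed and that every stage referenced in a pipeline has a matching definition under the “processors” node. The sketch below is one way to check this with Python's standard library. It runs on a simplified in-memory sample; the <root> element name and the function name undefined_stages are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for pipelineconfig.xml; <root> is an assumed element name.
# "Spy1" is referenced in the pipeline but deliberately left undefined.
SAMPLE = """<root>
  <processors>
    <processor name="Spy"/>
    <processor name="FastFormatDetector"/>
  </processors>
  <pipelines>
    <pipeline name="Office14 (webcluster)" default="1">
      <processor name="Spy"/>
      <processor name="FastFormatDetector"/>
      <processor name="Spy1"/>
    </pipeline>
  </pipelines>
</root>"""


def undefined_stages(xml_text):
    """Return (pipeline, stage) pairs whose stage has no processor definition."""
    # ET.fromstring raises ParseError if the XML is not well-formed.
    root = ET.fromstring(xml_text)
    defined = {p.get("name") for p in root.findall("processors/processor")}
    missing = []
    for pipeline in root.findall("pipelines/pipeline"):
        for stage in pipeline.findall("processor"):
            if stage.get("name") not in defined:
                missing.append((pipeline.get("name"), stage.get("name")))
    return missing


missing = undefined_stages(SAMPLE)
print(missing)
```

An empty list means every pipeline reference resolves to a defined processor; here the sample reports the missing “Spy1” definition.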

3.      Test the changes
a.      Place a single document in a Folder accessible to the SharePoint crawler.
b.      Create a new FileShare Content Source in the FAST Content SSA “FAST Search Connector” to crawl the folder.
c.      Run a Full Crawl

4.      Check Results
a.      Two files should have been created:
        i.     “<Install Drive>:\FASTSearch\var\log\spy.txt”
        ii.    “<Install Drive>:\FASTSearch\var\log\spy1.txt”
b.      Each file lists the crawled properties available at the point where its Spy stage runs.

Spy.txt sample:
               #### ATTRIBUTE 0B53E343-9CCC911DO-DBDB-00805FCCCE04:FileExtension:31 <type 'str'>: pdf

Spy1.txt sample:
               #### ATTRIBUTE 0B53E343-9CCC911DO-DBDB-00805FCCCE04:FileExtension:31 <type 'str'>: pdf
               #### ATTRIBUTE format <type 'str'>: Adobe PDF
               #### ATTRIBUTE mime <type 'str'>: application/pdf
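
Comparing the two dumps by eye gets tedious once a document carries many properties. The sketch below shows one way to list the attributes that appear only in the second dump. The line format it parses is inferred from the samples above and may not match every real Spy dump, the function names are made up for illustration, and it assumes each file holds a single document, as in the test crawl in step 3.

```python
import re

# Line shape inferred from the samples; the real dump format may differ.
ATTRIBUTE_LINE = re.compile(
    r"^\s*#### ATTRIBUTE (?P<name>\S+) <type [^>]*>: (?P<value>.*)$"
)


def parse_spy_dump(text):
    """Return {attribute name: value} for every attribute line in a dump."""
    attributes = {}
    for line in text.splitlines():
        match = ATTRIBUTE_LINE.match(line)
        if match:
            attributes[match.group("name")] = match.group("value")
    return attributes


def added_attributes(before_text, after_text):
    """Attributes present in the later dump but not in the earlier one."""
    before = parse_spy_dump(before_text)
    after = parse_spy_dump(after_text)
    return {name: value for name, value in after.items() if name not in before}


# Inline copies of the sample lines; in practice, read spy.txt and spy1.txt.
spy = (
    "#### ATTRIBUTE 0B53E343-9CCC911DO-DBDB-00805FCCCE04:FileExtension:31 "
    "<type 'str'>: pdf\n"
)
spy1 = spy + (
    "#### ATTRIBUTE format <type 'str'>: Adobe PDF\n"
    "#### ATTRIBUTE mime <type 'str'>: application/pdf\n"
)

new_attributes = added_attributes(spy, spy1)
print(new_attributes)
```

For the sample lines this reports format and mime, the two attributes added between the Spy stages.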

Conclusions: I picked the “FastFormatDetector” stage to trace around in this example because it is a powerful step in the FAST pipeline. A single Spy stage is enough to inspect crawled properties in the pipeline, but I wanted to give a more advanced example. If you look closely at spy1.txt, you will see new attributes (format and mime) that are set by the “FastFormatDetector” stage and that do not appear in spy.txt.
1)      A single Spy stage at the beginning of the pipeline is great for seeing which crawled properties are being submitted to the FAST pipeline.
2)      A single Spy stage at the end of the pipeline is great for seeing which crawled properties will be submitted to the index.
3)      Multiple Spy stages let you see exactly what changes at any stage of the pipeline.
