Thursday, January 12, 2012

FAST Search for SharePoint Relevancy - Part 1

Topic: FAST Search for SharePoint 2010 Relevancy.

Subject: Understanding Relevancy and FS4SP.
Problem: When I perform a search, the items are not returned in the order I expect.

Response: Relevancy is fairly complex, but FS4SP has some very powerful capabilities once a user understands how relevancy can be tailored to meet a specific organization's requirements.  The rank of any item under index is made up of 2 distinct parts:

1.      Static Rank

2.      Dynamic Rank

Total Rank = Static Rank + Dynamic Rank

Relevancy is the order in which items will be displayed, based on their individual rank, when queried from the index.

Static Rank:

The static rank is determined at CRAWL time and will not change unless an item is re-crawled and environmental factors have changed since the last crawl.  The static rank is calculated from 4 components.

1.     Urldepthrank: Rank points given to boost shorter URLs.

2.      Docrank: Rank points given based on the number of and relative importance of links pointing to an item.

3.     Siterank: Rank points given based on the number of and relative importance of links pointing to the items on a site.

4.     Hwboost: FAST Search Server 2010 for SharePoint placeholder for generic usage of quality rank points.


Dynamic Rank:

The dynamic rank is determined at QUERY time and can change for any item retrieved from the index depending on how the item is retrieved. The majority of dynamic rank is calculated from several components.

1.      Managed Property Context: Associated with Full Text index map.

2.      Freshness: Age of a document

3.      Managed Property Field Boosts: Additional rank points given to specific managed property values.

4.      Authority Weight (Anchor Text): Weight associated with anchor text associated with hyperlinks

5.      Query Authority Weight: Click through relevancy weight

6.      Stop Word Threshold: Associated with how “Managed Property Context” is calculated.

*** Side Notes:

1.      Rank is not Unique
Many items within an index will have the same rank.  Items that match the same criteria produce the same rank number when the rank is calculated.

2.      Rank is Dynamic
A single item will not have the same rank for every search performed.  Rank is dynamic, meaning an item’s rank will change depending on how the item is retrieved from the index.  Example: With OOB settings, if an item is retrieved from the index based on a hit on the title, it will have a higher rank than if it is retrieved by a hit on the body or other managed properties.

 The good news is these are all adjustable to meet the needs of different organizations. 

In the solution\example section I will focus on #1, Managed Property Context, for a couple of reasons:  1) Brevity. A blog post showing a complete hands-on example of how relevancy works would be extremely long (I will try to follow up on other relevancy topics).  2) There happens to be an OOB issue with one of the Managed Property Context settings.

 Solution\Example:

1.      Let's take a look at the Managed Property Context and the relationship to rank and relevancy.

2.      Open SharePoint Central Administration

a.      General Application Settings

b.      Search -> Farm Search Administration

c.      Select the FAST Query SSA

d.      Left hand Navigation Select FAST Search Administration

e.      Select Managed properties

f.       Select the first managed property.

                                                    i.     In my case I select “Account”

g.      Scroll to the bottom and select “View Mappings”





3.      This is the OOB Full-text Index Map used by the Managed Property Context. 

a.      As can be seen the map is broken into 7 levels.

b.      Any managed property can be added to this map.

c.      Dynamic rank points are added based on the search performed.

Example:  A search is performed and 2 items are returned from the index.  Item A was returned because the search term was found in the title managed property and Item B was returned because the search term was found in the body managed property.  When calculating the total rank Item A would receive more rank point than Item B based on the Full-text Index Map.  Remember there are several components that comprise the total rank but assuming the two items had identical scores from all other rank components Item A would outrank Item B in terms of relevancy and therefore be presented first in the search center.

4.      Let’s look at how the Managed Property Context calculates Rank and contributes to the total rank.

5.      Setup Content Source

a.      I have setup a content source which contains 3 documents:

RELEVANCE SAMPLE – ONE.pdf

RELEVANCE SAMPLE – TWO.doc

RELEVANCE SAMPLE – THREE.docx

*** Side Note: For the purpose of this example, make sure the content of each file will not interfere with the search criteria. The terms “LEVELTEST” and “SAMPLE” should not appear anywhere except in the title and metadata.

b.      I have made sure that everything about the 3 files (Modified Dates, etc.) and how I crawl them (by putting them in the same content source) is identical, to ensure that all relevancy factors will be the same except for the Full-Text Index Mapping.

c.      I created 3 crawled properties to associate with the Documents when crawling.

                                                    i.     LEVEL1

                                                   ii.     LEVEL2

                                                  iii.     LEVEL3


***Side Note: I am using a custom crawler to crawl my content source but I could have just as easily created a document library with 3 custom fields. If you use a document library you will end up with more data in the index but you should still be able to achieve the same end results.

d.      I created 3 Managed properties using the same name as my crawled properties. I have mapped them to the crawled properties and added each of them to the Full-Text Index Map.


***Side Note: If you have not created the crawled properties via PowerShell or another tool, you may need to populate all the custom fields and crawl first before being able to set up the managed property.

New Properties:



New Full-Text index Map:


e.      For this pass I have only populated 1 crawled property for each item in my content source.

                                                    i.     RELEVANCE SAMPLE – ONE.pdf

1.      Crawled Property: LEVEL1 Value: LEVELTEST

                                                   ii.     RELEVANCE SAMPLE – TWO.doc

1.      Crawled Property: LEVEL2 Value: LEVELTEST

                                                  iii.     RELEVANCE SAMPLE – THREE.docx

1.      Crawled Property: LEVEL3 Value: LEVELTEST

6.      Crawl the Content Source

7.      Open a FAST Search

a.      Perform a Search for the term “SAMPLE”


b.      I have modified my search center to display 4 managed properties: RANK, LEVEL1, LEVEL2, and LEVEL3.

c.      With this search all 3 of my items were returned based on the “Title” property hit.  With the freshness date and static rank properties being equal, all 3 items have the same Rank of 1991 and therefore the same relevancy.  Do not be surprised if your results display in a different order.  This is the order in which my items made it into the index. When items have the same rank, they are ordered by the position in which they went into the index: FIFO.
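The FIFO tie-breaking described above behaves like a stable sort. The following Python sketch is only an illustration, not FS4SP code. The insertion order of the three items is an assumption made for the example; the rank of 1991 comes from the search result above.

```python
# Hypothetical records: (title, rank), listed in the order the items
# are assumed to have entered the index.
indexed_items = [
    ("RELEVANCE SAMPLE - ONE", 1991),
    ("RELEVANCE SAMPLE - TWO", 1991),
    ("RELEVANCE SAMPLE - THREE", 1991),
]

# sorted() is stable, so items with an equal rank keep their
# original (index insertion) order even with reverse=True.
results = sorted(indexed_items, key=lambda item: item[1], reverse=True)

for title, rank in results:
    print(rank, title)
```

Because all three ranks are equal, the output order is simply the insertion order, which is why your own results may appear in a different sequence.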

8.      Change the Search Term to “LEVELTEST”

a.      From what we have talked about in the Full-Text Index Map the order shouldn’t change.  Each will be retrieved but from their associated Managed Property: LEVEL1, LEVEL2, and LEVEL3.  We expect LEVEL3 to have a higher priority than LEVEL2 and LEVEL2 higher than LEVEL1.

       

b.      The search results reveal a different story.  We did not see the order we expected.  We can see that the Rank for each individual item has changed based on how it matched the Full-Text Index Map.  Seeing the different ranks between the 2 search terms on the same items shows how rank and relevancy are truly dynamic.


9.      Next we will take a look at how Rank is calculated and why we didn’t get our expected results.

10.   I am going to use a fantastic tool on CodePlex, the FAST Search for SharePoint 2010 Query Logger, developed by MIKAEL SVENSON, whose blog at http://techmikael.blogspot.com/ is well worth reading.

a.      I have modified the source a little for the purpose of this blog and will point out where. 

b.      There is no need to make the modification if you are following the steps.  I made the change for clarity.

c.      Static Score = urldepthrank + docrank + siterank + hwboost.

d.      The original output has Static Score = urldepthrank + docrank + siterank + hwboost + context score.

***Side Note: We will see later that Managed Property Context isn’t completely dynamic but has a static portion to it.

e.      I have also broken out the individual components of the static score.


11.   Download and Start the Query Logger tool by double clicking on the FS4SPQueryLogger.exe.

12.   Click Start Logging

13.   Resubmit Search Terms: “LEVELTEST”

14.   We should have captured the query in the Query Logger.

a.      Select the Term and Click on the Rank Log Tab

b.      Note the highlighted notes showing how total Rank is calculated in this example.

Hit: 1

Title: RELEVANCE SAMPLE - ONE

Query term: 'leveltest'

Context score.................: 236

    Number of hits/score................: 1/20

    Importance level/score.................: 1/216

XRANK score...................: 0      Term: fileextension:csv

XRANK score...................: 0      Term: fileextension:zip

XRANK score...................: 0      Term: fileextension:rtf

XRANK score...................: 0      Term: fileextension:vsd

XRANK score...................: 0      Term: fileextension:oft

XRANK score...................: 0      Term: fileextension:msg

XRANK score...................: 0      Term: fileextension:txt

XRANK score...................: 0      Term: isemptylist:true

XRANK score...................: 0      Term: islistitem:true

Static rank score...............: 270

  urldeptrank.......: 270

  docrank...........: 0

  siterank..........: 0

  hwboost...........: 0

Freshness score...............: 1045

Total Rank score............: 1551

**** TOTAL Rank = Context Score + XRANK Score + Static Score + Freshness Score ****

1551 = 236 + 0 + 270 + 1045

############################

Hit: 2

Title: RELEVANCE SAMPLE - THREE

Query term: 'leveltest'

Context score.................: 164

    Number of hits/score................: 1/20

    Importance level/score.................: 3/144

XRANK score...................: 0      Term: fileextension:csv

XRANK score...................: 0      Term: fileextension:zip

XRANK score...................: 0      Term: fileextension:rtf

XRANK score...................: 0      Term: fileextension:vsd

XRANK score...................: 0      Term: fileextension:oft

XRANK score...................: 0      Term: fileextension:msg

XRANK score...................: 0      Term: fileextension:txt

XRANK score...................: 0      Term: isemptylist:true

XRANK score...................: 0      Term: islistitem:true

Static rank score...............: 270

  urldeptrank.......: 270

  docrank...........: 0

  siterank..........: 0

  hwboost...........: 0

Freshness score...............: 1045

Total Rank score............: 1479

**** TOTAL Rank = Context Score + XRANK Score + Static Score + Freshness Score ****

1479 = 164 + 0 + 270 + 1045

############################

Hit: 3

Title: RELEVANCE SAMPLE - TWO

Url:

Query term: 'leveltest'

Context score.................: 92

    Number of hits/score................: 1/20

    Importance level/score.................: 2/72

XRANK score...................: 0      Term: fileextension:csv

XRANK score...................: 0      Term: fileextension:zip

XRANK score...................: 0      Term: fileextension:rtf

XRANK score...................: 0      Term: fileextension:vsd

XRANK score...................: 0      Term: fileextension:oft

XRANK score...................: 0      Term: fileextension:msg

XRANK score...................: 0      Term: fileextension:txt

XRANK score...................: 0      Term: isemptylist:true

XRANK score...................: 0      Term: islistitem:true

Static rank score...............: 270

  urldeptrank.......: 270

  docrank...........: 0

  siterank..........: 0

  hwboost...........: 0

Freshness score...............: 1045

Total Rank score............: 1407

**** TOTAL Rank = Context Score + XRANK Score + Static Score + Freshness Score ****

1407 = 92 + 0 + 270 + 1045



############################
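As a quick check of the formula above, the component scores from the three rank log entries can be summed in a few lines of Python. This is a minimal sketch: the `RankLogEntry` class is a hypothetical helper, and the numbers are copied from the log output.

```python
from dataclasses import dataclass


@dataclass
class RankLogEntry:
    """Hypothetical container for one hit in the rank log."""
    title: str
    context: int
    xrank: int
    static: int
    freshness: int

    def total(self) -> int:
        # TOTAL Rank = Context Score + XRANK Score + Static Score + Freshness Score
        return self.context + self.xrank + self.static + self.freshness


entries = [
    RankLogEntry("RELEVANCE SAMPLE - ONE", 236, 0, 270, 1045),
    RankLogEntry("RELEVANCE SAMPLE - THREE", 164, 0, 270, 1045),
    RankLogEntry("RELEVANCE SAMPLE - TWO", 92, 0, 270, 1045),
]

totals = {entry.title: entry.total() for entry in entries}
for title, total in totals.items():
    print(total, title)
```

The static and freshness scores are identical for all three hits, so the totals differ only by the context score.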

15.   The only difference between the 3 items is the Context Score “Managed Property Context”

16.   Looking closer at the Context Score

Hit: 1

Title: RELEVANCE SAMPLE - ONE

Context score.................: 236

    Number of hits/score................: 1/20

    Importance level/score..............: 1/216

############################

Hit: 2

Title: RELEVANCE SAMPLE - THREE

Context score.................: 164

    Number of hits/score................: 1/20

    Importance level/score..............: 3/144

############################

Hit: 3

Title: RELEVANCE SAMPLE - TWO

Context score.................: 92

    Number of hits/score................: 1/20

    Importance level/score..............: 2/72

############################

a.      Item 1

                                                    i.     Earned 20 points for a single hit in the Full-Text index Map.

Number of hits/score................: 1/20

                                                   ii.     Earned 216 Points for a hit in Level 1

Importance level/score..............: 1/216

                                                  iii.     Total 236 = 20 + 216

b.      Item 2

                                                    i.     Earned 20 points for a single hit in the Full-Text index Map.

Number of hits/score................: 1/20

                                                   ii.     Earned 144 Points for a hit in Level 3

Importance level/score..............: 3/144

                                                  iii.     Total 164 = 20 + 144

c.      Item 3

                                                    i.     Earned 20 points for a single hit in the Full-Text index Map.

Number of hits/score................: 1/20

                                                  ii.     Earned 72 Points for a hit in Level 2

Importance level/score..............: 2/72

                                                  iii.     Total 92 = 20 + 72

17.   Now that we know how it is calculated the “BIG” question becomes “How did a ‘Lowest Priority’ hit jump the ‘Very Low priority’ AND ‘Low priority’ hits in the Context Map?”



18.   Let’s look deeper into how the points are granted and not just how the points are calculated and applied to the total rank (or relevance)

19.   On a FAST Server Open a FAST Command Shell as Administrator

20.   Execute:

$RankProfile = Get-FASTSearchMetadataRankProfile -Name default

$content = $RankProfile.GetFullTextIndexRanks()|where-Object -filterscript {$_.FullTextIndexReference.Name -eq "content"}

$content

FullTextIndexReference : content

ProximityWeight        : 140

ContextWeight          : 50

21.   Execute:

$content.GetImportanceLevelWeight(1)

$content.GetImportanceLevelWeight(2)

$content.GetImportanceLevelWeight(3)

$content.GetImportanceLevelWeight(4)

$content.GetImportanceLevelWeight(5)

$content.GetImportanceLevelWeight(6)

$content.GetImportanceLevelWeight(7)

               OOB you should get the values

               Level1 - 30

               Level2 - 10

               Level3 - 20

               Level4 - 30

               Level5 - 40

               Level6 - 50

               Level7 - 60

22.   It is apparent where the problem is. The Weight on level 1 is set equal to the weight on level 4.
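To see the problem in numbers, the Python sketch below compares the OOB weights with the importance level scores from the rank log. The constant ratio between score and weight is only what the logs in this example show; it is not a documented FS4SP formula, and the variable names are my own.

```python
from fractions import Fraction

# OOB importance level weights returned by GetImportanceLevelWeight(1..7).
oob_weights = {1: 30, 2: 10, 3: 20, 4: 30, 5: 40, 6: 50, 7: 60}

# Importance level scores observed in the rank log for "leveltest".
observed_scores = {1: 216, 2: 72, 3: 144}

# Score divided by weight, kept as exact fractions to avoid rounding noise.
ratios = {
    level: Fraction(score, oob_weights[level])
    for level, score in observed_scores.items()
}
print(ratios)

# Levels whose weight is not lower than the weight of the next level up.
misordered = [
    level for level in range(1, 7)
    if oob_weights[level] >= oob_weights[level + 1]
]
print(misordered)

# Ranking the three test levels by weight reproduces the order of the hits.
ranking = sorted(observed_scores, key=lambda level: oob_weights[level], reverse=True)
print(ranking)
```

In this data every level earns the same number of points per unit of weight, so the unexpected order comes entirely from the Level 1 weight being higher than the Level 2 and Level 3 weights.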

23.   Create a new Rank Profile

a.      It is always recommended you create and use a new RankProfile when making changes.  This gives you the ability to compare differences between Profiles when making changes and before implementing any changes in production.

b.      From the FAST Command Shell Execute:

New-FASTSearchMetadataRankProfile -name default1

c.      Any new Rank Profile will inherit from the default Profile unless otherwise specified.

                                                    i.     You can execute the commands from #20 and #21 replacing “default” with “default1”.

                                                   ii.     The two rank profiles should be identical

24.   Update the “default1” profile to set Importance Level1 to an appropriate value

a.      I chose to set the Level 1 weight to 5. I could have just as easily changed the weights to 10 through 70, but for this example I want to change as little as possible.

b.      From the FAST Command Shell Execute:

$RankProfile = Get-FASTSearchMetadataRankProfile -Name default1

$content = $RankProfile.GetFullTextIndexRanks()|where-Object -filterscript {$_.FullTextIndexReference.Name -eq "content"}

$content.SetImportanceLevelWeight(1, 5)

$content.Update()

$content.GetImportanceLevelWeight(1)

25.   Expose new Profile in the Search Center

a.      On the Search Center edit the “Search Action Links” web part and enable the new default rank profile.

26.   Change the Sort by to the new “default1” profile and execute the Search “LEVELTEST”

a.      Notice quickly that the results are now as expected.


27.   Rank Log Results

Hit: 1

Title: RELEVANCE SAMPLE - THREE

Query term: 'leveltest'

Context score.................: 164

    Number of hits/score................: 1/20

    Importance level/score.............: 3/144

Total Rank score............: 1479

############################

Hit: 2

Title: RELEVANCE SAMPLE - TWO

Query term: 'leveltest'

Context score.................: 92

    Number of hits/score................: 1/20

    Importance level/score.............: 2/72

Total Rank score............: 1407

############################

Hit: 3

Title: RELEVANCE SAMPLE - ONE

Query term: 'leveltest'

Context score.................: 56

    Number of hits/score................: 1/20

    Importance level/score.............: 1/36

Total Rank score............: 1327

############################
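The new context scores can be reproduced from the changed weight. This Python sketch assumes the points-per-weight ratio inferred from the earlier rank log (for example, 144 points for a weight of 20); that ratio is an observation from this example, not a documented constant.

```python
from fractions import Fraction

# Assumption: points per unit of weight, inferred from the earlier rank log.
POINTS_PER_WEIGHT = Fraction(36, 5)
HIT_SCORE = 20  # "Number of hits/score: 1/20" in the rank log

# Importance level weights in the "default1" profile after the change.
default1_weights = {1: 5, 2: 10, 3: 20}

# Each sample document matched exactly one level in this pass.
doc_levels = {
    "RELEVANCE SAMPLE - ONE": 1,
    "RELEVANCE SAMPLE - TWO": 2,
    "RELEVANCE SAMPLE - THREE": 3,
}

# Context score = hit score + importance level score.
context_scores = {
    title: HIT_SCORE + int(default1_weights[level] * POINTS_PER_WEIGHT)
    for title, level in doc_levels.items()
}

# Highest context score first.
ordered = sorted(context_scores, key=context_scores.get, reverse=True)
for title in ordered:
    print(context_scores[title], title)
```

With Level 1 weighted below Level 2, the LEVEL1 hit drops to the bottom, matching the context scores in the rank log.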

28.   I used a pretty simple example to show how the Managed Property Context map works so I will use another quick example.

29.   I populated all three crawled properties with the value of “LEVELTEST” for all three documents and re-crawled.

30.   The following are the search results and rank.
   

31.   Note the difference in the Rank Calculation.

a.      All Levels where hits matched the Full-Text Index map contribute to the Rank.

b.      This will not always be the case.  There are some advanced situations where not all levels will be applied, depending on the search term and how many items are under index.  I will leave the advanced settings and the how, when, and why for a follow-up post.

Hit: 1

Title: RELEVANCE SAMPLE - THREE

Query term: 'leveltest'

Context score.................: 278

    Number of hits/score................: 3/26

    Importance level/score.................: 3, 2, 1/252

Total Rank score............: 1593

############################

Hit: 2

Title: RELEVANCE SAMPLE - TWO

Query term: 'leveltest'

Context score.................: 278

    Number of hits/score................: 3/26

    Importance level/score.................: 3, 2, 1/252

Total Rank score............: 1593

############################

Hit: 3

Title: RELEVANCE SAMPLE - ONE

Query term: 'leveltest'

Context score.................: 278

    Number of hits/score................: 3/26

    Importance level/score.................: 3, 2, 1/252

Total Rank score............: 1593

############################
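The arithmetic behind these identical scores can be checked with a short Python sketch. The per-level scores come from the single-hit rank logs under the "default1" profile. The static and freshness scores are not shown in this log, so I assume they are unchanged from the earlier full rank log.

```python
# Importance level scores seen in the single-hit rank logs (default1 profile).
level_scores = {3: 144, 2: 72, 1: 36}

hits_score = 26  # "Number of hits/score: 3/26"

# All matched levels contribute to the importance score.
importance_score = sum(level_scores.values())
context_score = hits_score + importance_score

# Assumption: same static and freshness scores as in the earlier full log.
static_score = 270
freshness_score = 1045
xrank_score = 0

total_rank = context_score + xrank_score + static_score + freshness_score
print(importance_score, context_score, total_rank)
```

The sum matches the logged importance score of 252, the context score of 278, and the total rank of 1593.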

32.   Let’s revisit some of the Weight Properties, whether from the content level or the individual importance levels.

FullTextIndexReference : content

ProximityWeight        : 140

ContextWeight          : 50

a.      As you probably noticed, weights do not relate directly to points.  The weights within individual Dynamic Rank calculations express how important they are relative to the other calculations in the overall Rank calculation.   If we changed the “ContextWeight” from 50 to 100, we would see the Rank Points produced from the Managed Property Context double, meaning it would become more important in the overall rank calculation.

33.   Final Important Note:  The Managed Property Context is considered part of the dynamic portion of relevancy, but it does have a static portion to it.  The Full-Text Index Mappings are static.  If you want to add to or re-arrange the Map, you must re-crawl the content for the change to take effect.

Conclusion: The Managed Property Context is one portion of how Rank is calculated and relevancy is determined.  I never tell people what they should do, but it is pretty obvious that the Importance Level 1 is not set correctly OOB.   The Managed Property Context can be tailored to help an organization improve relevancy from the standpoint of what managed properties are added to the Full-Text Index Map and how much the Managed Property Context itself and individual levels should weigh against other relevancy factors.   It is extremely difficult to adjust relevancy with 20 million items in the index, but it is possible. If I had tried to look at the examples I provided with 20 million items, I probably would not have noticed the erroneous setting.  Fortunately the Managed Property Context is a part of the dynamic ranking calculation and multiple Rank Profiles are available, so trying adjustments and comparing results most times does not require re-crawling content.

KORITFW