Subject: Understanding Relevancy and FS4SP.
Problem: When I perform a search, the items are not returned in the order I expect.
Response: Relevancy is fairly complex, but FS4SP has some very powerful capabilities once a user understands how relevancy can be tailored to meet a specific organization's requirements. The rank of any item in the index is made up of 2 distinct parts:
1. Static Rank
2. Dynamic Rank
Total Rank = Static Rank + Dynamic Rank
Relevancy is the order in which items are displayed, based on their individual rank, when queried from the index.
Static Rank:
The static rank is determined at CRAWL time and will not change unless an item is re-crawled and environmental factors have changed since the last crawl. The static rank is calculated from 4 components.
1. Urldepthrank: Rank points given to boost shorter URLs.
2. Docrank: Rank points given based on the number of and relative importance of links pointing to an item.
3. Siterank: Rank points given based on the number of and relative importance of links pointing to the items on a site.
4. Hwboost: FAST Search Server 2010 for SharePoint placeholder for generic usage of quality rank points.
Dynamic Rank:
The dynamic rank is determined at QUERY time and can change for any item retrieved from the index depending on how the item is retrieved. The majority of dynamic rank is calculated from several components.
1. Managed Property Context: Associated with Full Text index map.
2. Freshness: Age of a document
3. Managed Property Field Boosts: Additional rank points given to specific managed property values.
4. Authority Weight (Anchor Text): Weight given to the anchor text of hyperlinks.
5. Query Authority Weight: Click through relevancy weight
6. Stop Word Threshold: Associated with how “Managed Property Context” is calculated.
*** Side Notes:
1. Rank is not Unique
Many items within an index will have the same rank. When items match on the same criteria, the calculation produces the same rank number for each of them.
2. Rank is Dynamic
A single item will not have the same rank for every search performed. Rank is dynamic, meaning an item’s rank will change depending on how the item is retrieved from the index. Example: With OOB settings, if an item is retrieved from the index based on a hit on the title, it will have a higher rank than if it is retrieved by a hit on the body or other managed properties.
In the solution\example section I will focus on #1, Managed Property Context, for a couple of reasons: 1) Brevity. A blog post showing a complete hands-on example of how relevancy works would be extremely long (I will try to follow up on other relevancy topics). 2) There happens to be an OOB issue with one of the Managed Property Context settings.
1. Let's take a look at the Managed Property Context and the relationship to rank and relevancy.
2. Open SharePoint Central Administration
a. General Application Settings
b. Search -> Farm Search Administration
c. Select the FAST Query SSA
d. Left hand Navigation Select FAST Search Administration
e. Select Managed properties
f. Select the first managed property.
i. In my case I select “Account”
g. Scroll to the bottom and select “View Mappings”
3. This is the OOB Full-text Index Map used by the Managed Property Context.
a. As can be seen the map is broken into 7 levels.
b. Any managed property can be added to this map.
c. Dynamic rank points are added based on the search performed.
Example: A search is performed and 2 items are returned from the index. Item A was returned because the search term was found in the title managed property and Item B was returned because the search term was found in the body managed property. When calculating the total rank, Item A would receive more rank points than Item B based on the Full-text Index Map. Remember there are several components that comprise the total rank, but assuming the two items had identical scores from all other rank components, Item A would outrank Item B in terms of relevancy and therefore be presented first in the search center.
4. Let’s look at how the Managed Property Context calculates Rank and contributes to the total rank.
5. Setup Content Source
a. I have setup a content source which contains 3 documents:
RELEVANCE SAMPLE – ONE.pdf
RELEVANCE SAMPLE – TWO.doc
RELEVANCE SAMPLE – THREE.docx
*** Side Note: For the purpose of this example, make sure the content of each file will not interfere with the search criteria: the terms “LEVELTEST” and “SAMPLE” should not appear anywhere except in the title and metadata.
b. I have made sure that everything about the 3 files is the same (modified dates, etc.) and that they are crawled identically (by putting them in the same content source), so that all relevancy factors are the same except for the Full-Text Index Mapping.
c. I created 3 crawled properties to associate with the Documents when crawling.
i. LEVEL1
ii. LEVEL2
iii. LEVEL3
***Side Note: I am using a custom crawler to crawl my content source but I could have just as easily created a document library with 3 custom fields. If you use a document library you will end up with more data in the index but you should still be able to achieve the same end results.
d. I created 3 Managed properties using the same name as my crawled properties. I have mapped them to the crawled properties and added each of them to the Full-Text Index Map.
***Side Note: If you did not create the crawled properties via PowerShell or another tool, you may need to populate all the custom fields and crawl first before being able to set up the managed properties.
New Properties:
New Full-Text index Map:
e. For this pass I have only populated 1 crawled property for each item in my content source.
i. RELEVANCE SAMPLE – ONE.pdf
1. Crawled Property: LEVEL1 Value: LEVELTEST
ii. RELEVANCE SAMPLE – TWO.doc
1. Crawled Property: LEVEL2 Value: LEVELTEST
iii. RELEVANCE SAMPLE – THREE.docx
1. Crawled Property: LEVEL3 Value: LEVELTEST
6. Crawl the Content Source
7. Open a FAST Search
a. Perform a Search for the term “SAMPLE”
b. I have modified my search center to display 4 managed properties: RANK, LEVEL1, LEVEL2, and LEVEL3.
c. With this search all 3 of my items were returned based on the “Title” property hit. With the freshness date and static rank properties being equal, all 3 items have the same Rank of 1991 and therefore the same relevancy. Do not be surprised if your results display in a different order. This is the order in which my items made it into the index. When items have the same rank, they are ordered by when they went into the index (FIFO).
8. Change the Search Term to “LEVELTEST”
a. Based on what we have discussed about the Full-Text Index Map, the order shouldn’t change. Each item will be retrieved, but from its associated Managed Property: LEVEL1, LEVEL2, or LEVEL3. We expect LEVEL3 to have a higher priority than LEVEL2, and LEVEL2 higher than LEVEL1.
b. The search results tell a different story: we should have seen exactly the order we expected, but we did not. We can see that the Rank for each individual item has changed based on how it matched the Full-Text Index Map. Seeing different ranks for the same items across the 2 search terms shows how rank and relevancy are truly dynamic.
9. Next we will take a look at how Rank is calculated and why we didn’t get our expected results.
10. I am going to use a fantastic tool on CodePlex, the FAST Search for SharePoint 2010 Query Logger, developed by Mikael Svenson, whose blog at http://techmikael.blogspot.com/ is well worth reading.
a. I have modified the source a little for the purpose of this blog and will point out where.
b. There is no need to make the modification if you are following the steps. I made the change for clarity.
c. In my modified output, Static Score = urldepthrank + docrank + siterank + hwboost.
d. The original output has Static Score = urldepthrank + docrank + siterank + hwboost + context score.
***Side Note: We will see later that Managed Property Context isn’t completely dynamic but has a static portion to it.
e. I have also broken out the individual components of the static score.
11. Download and Start the Query Logger tool by double clicking on the FS4SPQueryLogger.exe.
12. Click Start Logging
13. Resubmit Search Terms: “LEVELTEST”
14. We should have captured the query in the Query Logger.
a. Select the Term and Click on the Rank Log Tab
b. Note the highlighted notes showing how total Rank is calculated in this example.
Hit: 1
Title: RELEVANCE SAMPLE - ONE
Query term: 'leveltest'
Context score.................: 236
Number of hits/score................: 1/20
Importance level/score.................: 1/216
XRANK score...................: 0 Term: fileextension:csv
XRANK score...................: 0 Term: fileextension:zip
XRANK score...................: 0 Term: fileextension:rtf
XRANK score...................: 0 Term: fileextension:vsd
XRANK score...................: 0 Term: fileextension:oft
XRANK score...................: 0 Term: fileextension:msg
XRANK score...................: 0 Term: fileextension:txt
XRANK score...................: 0 Term: isemptylist:true
XRANK score...................: 0 Term: islistitem:true
Static rank score...............: 270
urldeptrank.......: 270
docrank...........: 0
siterank..........: 0
hwboost...........: 0
Freshness score...............: 1045
Total Rank score............: 1551
**** TOTAL Rank = Context Score + XRANK Score + Static Score + Freshness Score ****
1551 = 236 + 0 + 270 + 1045
############################
Hit: 2
Title: RELEVANCE SAMPLE - THREE
Query term: 'leveltest'
Context score.................: 164
Number of hits/score................: 1/20
Importance level/score.................: 3/144
XRANK score...................: 0 Term: fileextension:csv
XRANK score...................: 0 Term: fileextension:zip
XRANK score...................: 0 Term: fileextension:rtf
XRANK score...................: 0 Term: fileextension:vsd
XRANK score...................: 0 Term: fileextension:oft
XRANK score...................: 0 Term: fileextension:msg
XRANK score...................: 0 Term: fileextension:txt
XRANK score...................: 0 Term: isemptylist:true
XRANK score...................: 0 Term: islistitem:true
Static rank score...............: 270
urldeptrank.......: 270
docrank...........: 0
siterank..........: 0
hwboost...........: 0
Freshness score...............: 1045
Total Rank score............: 1479
**** TOTAL Rank = Context Score + XRANK Score + Static Score + Freshness Score ****
1479 = 164 + 0 + 270 + 1045
############################
Hit: 3
Title: RELEVANCE SAMPLE - TWO
Url:
Query term: 'leveltest'
Context score.................: 92
Number of hits/score................: 1/20
Importance level/score.................: 2/72
XRANK score...................: 0 Term: fileextension:csv
XRANK score...................: 0 Term: fileextension:zip
XRANK score...................: 0 Term: fileextension:rtf
XRANK score...................: 0 Term: fileextension:vsd
XRANK score...................: 0 Term: fileextension:oft
XRANK score...................: 0 Term: fileextension:msg
XRANK score...................: 0 Term: fileextension:txt
XRANK score...................: 0 Term: isemptylist:true
XRANK score...................: 0 Term: islistitem:true
Static rank score...............: 270
urldeptrank.......: 270
docrank...........: 0
siterank..........: 0
hwboost...........: 0
Freshness score...............: 1045
Total Rank score............: 1407
**** TOTAL Rank = Context Score + XRANK Score + Static Score + Freshness Score ****
1407 = 92 + 0 + 270 + 1045
############################
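The formula noted in the log can be checked with a few lines of PowerShell. This is only a minimal sketch that runs in any PowerShell prompt and does not touch the index; the numbers are copied from the Rank Log output above, and the variable names are purely illustrative.

```powershell
# Component scores and logged totals, copied from the Rank Log output above.
$hits = @(
    @{ Title = 'RELEVANCE SAMPLE - ONE';   Context = 236; XRank = 0; Static = 270; Freshness = 1045; Total = 1551 },
    @{ Title = 'RELEVANCE SAMPLE - THREE'; Context = 164; XRank = 0; Static = 270; Freshness = 1045; Total = 1479 },
    @{ Title = 'RELEVANCE SAMPLE - TWO';   Context = 92;  XRank = 0; Static = 270; Freshness = 1045; Total = 1407 }
)

$results = @(foreach ($hit in $hits) {
    # TOTAL Rank = Context Score + XRANK Score + Static Score + Freshness Score
    $sum = $hit.Context + $hit.XRank + $hit.Static + $hit.Freshness
    '{0}: {1} (logged {2}, match: {3})' -f $hit.Title, $sum, $hit.Total, ($sum -eq $hit.Total)
})
$results
```

Each line should report a match, which confirms that the Context Score is the only component that differs between the three hits.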
15. The only difference between the 3 items is the Context Score “Managed Property Context”
16. Looking closer at the Context Score
Hit: 1
Title: RELEVANCE SAMPLE - ONE
Context score.................: 236
Number of hits/score................: 1/20
Importance level/score..............: 1/216
############################
Hit: 2
Title: RELEVANCE SAMPLE - THREE
Context score.................: 164
Number of hits/score................: 1/20
Importance level/score..............: 3/144
############################
Hit: 3
Title: RELEVANCE SAMPLE - TWO
Context score.................: 92
Number of hits/score................: 1/20
Importance level/score..............: 2/72
############################
a. Item 1
i. Earned 20 points for a single hit in the Full-Text index Map.
Number of hits/score................: 1/20
ii. Earned 216 Points for a hit in Level 1
Importance level/score..............: 1/216
iii. Total 236 = 20 + 216
b. Item 2
i. Earned 20 points for a single hit in the Full-Text index Map.
Number of hits/score................: 1/20
ii. Earned 144 Points for a hit in Level 3
Importance level/score..............: 3/144
iii. Total 164 = 20 + 144
c. Item 3
i. Earned 20 points for a single hit in the Full-Text index Map.
Number of hits/score................: 1/20
ii. Earned 72 Points for a hit in Level 2
Importance level/score..............: 2/72
iii. Total 92 = 20 + 72
17. Now that we know how it is calculated the “BIG” question becomes “How did a ‘Lowest Priority’ hit jump the ‘Very Low priority’ AND ‘Low priority’ hits in the Context Map?”
18. Let’s look deeper into how the points are granted and not just how the points are calculated and applied to the total rank (or relevance)
19. On a FAST Server Open a FAST Command Shell as Administrator
20. Execute:
$RankProfile = Get-FASTSearchMetadataRankProfile -Name default
$content = $RankProfile.GetFullTextIndexRanks() | Where-Object -FilterScript { $_.FullTextIndexReference.Name -eq "content" }
$content
FullTextIndexReference : content
ProximityWeight : 140
ContextWeight : 50
21. Execute:
$content.GetImportanceLevelWeight(1)
$content.GetImportanceLevelWeight(2)
$content.GetImportanceLevelWeight(3)
$content.GetImportanceLevelWeight(4)
$content.GetImportanceLevelWeight(5)
$content.GetImportanceLevelWeight(6)
$content.GetImportanceLevelWeight(7)
OOB you should get the following values:
Level1 - 30
Level2 - 10
Level3 - 20
Level4 - 30
Level5 - 40
Level6 - 50
Level7 - 60
22. It is apparent where the problem is. The weight on Level 1 is set equal to the weight on Level 4, so a Level 1 hit earns more points than a hit in Level 2 or Level 3.
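One way to spot this kind of problem without reading the list by eye is to check that the weights rise with the level. The sketch below is plain PowerShell and assumes the OOB values listed above. The points-per-weight ratio (216 points for a weight of 30) is inferred from the Rank Log in step 14, not taken from FAST documentation, so treat the projected points for Levels 4 through 7 as an assumption.

```powershell
# OOB importance level weights for Levels 1..7, as returned in step 21.
$weights = 30, 10, 20, 30, 40, 50, 60

# Report every level whose weight is not higher than the level directly below it.
$problems = @(foreach ($level in 2..7) {
    $lower   = $weights[$level - 2]
    $current = $weights[$level - 1]
    if ($current -le $lower) {
        'Level {0} weight ({1}) is not higher than Level {2} weight ({3})' -f $level, $current, ($level - 1), $lower
    }
})
$problems

# Assumption: every level earns points at the ratio seen in the Rank Log
# (216 points for weight 30, 144 for 20, 72 for 10) while ContextWeight is 50.
$points = @(foreach ($weight in $weights) { $weight * 216 / 30 })
$points -join ', '
```

With the OOB values the only pair reported is Level 2 against Level 1, which is the jump seen in the search results.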
23. Create a new Rank Profile
a. It is always recommended you create and use a new RankProfile when making changes. This gives you the ability to compare differences between Profiles when making changes and before implementing any changes in production.
b. From the FAST Command Shell Execute:
New-FASTSearchMetadataRankProfile -name default1
c. Any new Rank Profile will inherit from the default Profile unless specified
i. You can execute the commands from #20 and #21 replacing “default” with “default1”.
ii. The two rank profiles should be identical
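To confirm that the two profiles really are identical (and later, to see exactly what was changed), the level-by-level comparison can be scripted. This is a hedged sketch: Compare-ImportanceLevelWeight is a hypothetical helper name, not an FS4SP cmdlet, and it relies only on the GetImportanceLevelWeight method already used in #21.

```powershell
# Hypothetical helper (not part of FS4SP): lists the importance levels whose
# weights differ between two full-text index rank objects.
function Compare-ImportanceLevelWeight {
    param($Reference, $Difference)

    foreach ($level in 1..7) {
        $old = $Reference.GetImportanceLevelWeight($level)
        $new = $Difference.GetImportanceLevelWeight($level)
        if ($old -ne $new) {
            'Level {0}: {1} -> {2}' -f $level, $old, $new
        }
    }
}

# Usage in the FAST Command Shell, reusing the commands from #20:
# $a = (Get-FASTSearchMetadataRankProfile -Name default).GetFullTextIndexRanks() |
#     Where-Object { $_.FullTextIndexReference.Name -eq "content" }
# $b = (Get-FASTSearchMetadataRankProfile -Name default1).GetFullTextIndexRanks() |
#     Where-Object { $_.FullTextIndexReference.Name -eq "content" }
# Compare-ImportanceLevelWeight -Reference $a -Difference $b
```

No output means the seven weights match. After a weight is changed on one profile, the helper prints one line per changed level.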
24. Update the “default1” profile to set Importance Level1 to an appropriate value
a. I chose to set the Level 1 weight to 5. I could just as easily have changed all seven weights to run from 10 through 70, but for this example I want to change as little as possible.
b. From the FAST Command Shell Execute:
$RankProfile = Get-FASTSearchMetadataRankProfile -Name default1
$content = $RankProfile.GetFullTextIndexRanks() | Where-Object -FilterScript { $_.FullTextIndexReference.Name -eq "content" }
$content.SetImportanceLevelWeight(1, 5)
$content.Update()
$content.GetImportanceLevelWeight(1)
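Before running the search again, the effect of this change can be estimated by hand. The sketch below is plain PowerShell and rests on two assumptions taken from the Rank Log in step 14: a single hit earns 20 points, and importance points scale at 216 points per 30 units of weight. It is an estimate, not the engine's actual calculation.

```powershell
# Importance level weights under default1: only Level 1 changes (30 -> 5).
$weights = @{ 1 = 5; 2 = 10; 3 = 20 }

# Each test document hits exactly one level (see step 5e).
$documents = @(
    @{ Title = 'RELEVANCE SAMPLE - ONE';   Level = 1 },
    @{ Title = 'RELEVANCE SAMPLE - TWO';   Level = 2 },
    @{ Title = 'RELEVANCE SAMPLE - THREE'; Level = 3 }
)

$predicted = foreach ($doc in $documents) {
    # Assumption: 216 points per 30 units of weight, plus 20 points for the single hit.
    $levelScore = $weights[$doc.Level] * 216 / 30
    New-Object PSObject -Property @{ Title = $doc.Title; ContextScore = 20 + $levelScore }
}

# Highest context score first, which is the order the search center should show.
$ordered = @($predicted | Sort-Object ContextScore -Descending |
    ForEach-Object { '{0}: {1}' -f $_.Title, $_.ContextScore })
$ordered
```

Under these assumptions the predicted order is THREE, TWO, ONE.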
25. Expose new Profile in the Search Center
a. On the Search Center edit the “Search Action Links” web part and enable the new default rank profile.
26. Change the Sort by to the new “default1” profile and execute the Search “LEVELTEST”
a. Notice that the results are now in the expected order.
27. Rank Log Results
Hit: 1
Title: RELEVANCE SAMPLE - THREE
Query term: 'leveltest'
Context score.................: 164
Number of hits/score................: 1/20
Importance level/score.............: 3/144
Total Rank score............: 1479
############################
Hit: 2
Title: RELEVANCE SAMPLE - TWO
Query term: 'leveltest'
Context score.................: 92
Number of hits/score................: 1/20
Importance level/score.............: 2/72
Total Rank score............: 1407
############################
Hit: 3
Title: RELEVANCE SAMPLE - ONE
Query term: 'leveltest'
Context score.................: 56
Number of hits/score................: 1/20
Importance level/score.............: 1/36
Total Rank score............: 1371
############################
28. I used a pretty simple example to show how the Managed Property Context map works so I will use another quick example.
29. I populated all three Crawled properties with the value of “LEVELTEST” for all three documents and re-crawled.
30. The following are the search results and rank.
31. Note the difference in the Rank Calculation.
a. All Levels where hits matched the Full-Text Index map contribute to the Rank.
b. This will not always be the case. There are some advanced situations where not all levels will be applied, depending on the search term and how many items are in the index. I will leave the advanced settings and how, when, and why for a follow-up post.
Hit: 1
Title: RELEVANCE SAMPLE - THREE
Query term: 'leveltest'
Context score.................: 278
Number of hits/score................: 3/26
Importance level/score.................: 3, 2, 1/252
Total Rank score............: 1593
############################
Hit: 2
Title: RELEVANCE SAMPLE - TWO
Query term: 'leveltest'
Context score.................: 278
Number of hits/score................: 3/26
Importance level/score.................: 3, 2, 1/252
Total Rank score............: 1593
############################
Hit: 3
Title: RELEVANCE SAMPLE - ONE
Query term: 'leveltest'
Context score.................: 278
Number of hits/score................: 3/26
Importance level/score.................: 3, 2, 1/252
Total Rank score............: 1593
############################
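The numbers in this log add up the same way as before, with every matched level contributing. A small sketch in plain PowerShell, assuming the static rank (270) and freshness (1045) scores are unchanged from step 14, since they are not shown in this trimmed log:

```powershell
# Importance level scores under default1, taken from the Rank Log in step 27.
$levelScores = @{ 1 = 36; 2 = 72; 3 = 144 }

# Every document now has a hit in Levels 3, 2 and 1, so all three scores are added.
$importance = 0
foreach ($level in 3, 2, 1) { $importance += $levelScores[$level] }

$hitScore = 26                       # "Number of hits/score: 3/26" from the log
$context  = $hitScore + $importance  # Context score

# Assumption: XRANK = 0, static rank = 270 and freshness = 1045, as in step 14.
$total = $context + 0 + 270 + 1045

$summary = 'importance={0} context={1} total={2}' -f $importance, $context, $total
$summary
```

The result matches the logged values of 252, 278 and 1593.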
32. Let’s re-visit some of the Weight Properties, whether from the content level or the individual importance levels.
FullTextIndexReference : content
ProximityWeight : 140
ContextWeight : 50
a. As you probably noticed, weights do not relate directly to points. The weights within individual Dynamic Rank calculations express how important each one is relative to the other rank calculations in the overall Rank calculation. If we changed the “ContextWeight” from 50 to 100, we would see the Rank Points produced from the Managed Property Context double, meaning it would become more important in the overall rank calculation.
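As a rough illustration of that scaling, the sketch below rescales a context score for a different ContextWeight. Get-ScaledContextScore is a hypothetical helper, and the linear relationship is the assumption described above rather than something measured in this post, so verify any real change with the Query Logger.

```powershell
# Hypothetical helper: rescales a context score for a different ContextWeight,
# assuming the points grow linearly with the weight.
function Get-ScaledContextScore {
    param([double]$Score, [double]$OldWeight, [double]$NewWeight)
    $Score * $NewWeight / $OldWeight
}

# Context score 278 from step 31, measured with the OOB ContextWeight of 50.
$doubled = Get-ScaledContextScore -Score 278 -OldWeight 50 -NewWeight 100
$halved  = Get-ScaledContextScore -Score 278 -OldWeight 50 -NewWeight 25
"ContextWeight 100: $doubled"
"ContextWeight 25: $halved"
```

Because the other components (static rank, freshness) stay the same, raising the ContextWeight increases the share of the total rank that comes from the Managed Property Context.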
33. Final Important Note: The Managed Property Context is considered part of the dynamic portion of relevancy, but it does have a static portion to it. The Full-Text Index Mappings are static. If you want to add to or re-arrange the Map, you must re-crawl the content for it to take effect.
Conclusion: The Managed Property Context is one portion of how Rank is calculated and relevancy is determined. I never tell people what they should do, but it is pretty obvious that Importance Level 1 is not set correctly OOB. The Managed Property Context can be tailored to help an organization improve relevancy, both in terms of which managed properties are added to the Full-Text Index Map and in terms of how much the Managed Property Context itself and its individual levels should weigh against other relevancy factors. It is extremely difficult to adjust relevancy with 20 million items in the index, but it is possible. If I had tried to look at the examples I provided with 20 million items, I probably would not have noticed the erroneous setting. Fortunately the Managed Property Context is part of the dynamic ranking calculation and multiple Rank Profiles are available, so trying adjustments and comparing results usually does not require re-crawling content.