Thursday, April 21, 2011

Windows 2008 TIFF IFilter and optical character recognition languages

Topic: SharePoint 2010, FAST Search Server (FS4SP), Windows 2008 TIFF IFilter, OCR languages
Subject: Implementing different languages with the Windows 2008 TIFF IFilter
Big Props: Sanjaya Paudel, a sharp colleague of mine, was reading my blog post “Implementing the Windows 2008 TIFF IFilter and FAST Search for SharePoint 2010 (FS4SP)” (http://fs4sp.blogspot.com/2011/04/implementing-windows-2008-tiff-ifilter.html) and noted that I never filled in the blanks regarding the advanced settings, especially for different languages.  He probably noticed because he speaks a few languages himself.  So he dug it up and forwarded me the following from MSDN.
It definitely adds value.

Windows TIFF IFilter Settings (from MSDN)
Setting preferred optical character recognition languages
By default, the Windows TIFF IFilter uses the default system language to determine which language dictionary to use during the optical character recognition (OCR) process. This Group Policy setting allows you to select one or more preferred OCR languages (they must be from the same code page). This can considerably improve OCR accuracy for documents with multiple-language content.
To set preferred OCR languages
1.    Open the Local Group Policy Editor as follows: Click Start, type gpedit.msc in the Start Search text box, and then press ENTER.
2.    Under Computer Configuration, expand Administrative Templates.
3.    Expand Windows Components, expand Search, and then click OCR.
4.    Double-click Select OCR languages from a code page.
5.    Click Enable, and then select one or more languages.
6.    Click OK.
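
The "same code page" restriction is the part most likely to trip people up, so here is a minimal sketch of how you might sanity-check a language selection before touching Group Policy. The mapping below is my own illustrative subset of Windows ANSI code pages, and the function name shared_code_page is hypothetical; neither comes from MSDN, and the list of languages the policy actually offers may differ.

```python
# Illustrative subset of Windows ANSI code pages (an assumption for this sketch,
# not the TIFF IFilter's own language list).
CODE_PAGE_BY_LANGUAGE = {
    "English": 1252,   # Western European
    "French": 1252,
    "German": 1252,
    "Spanish": 1252,
    "Polish": 1250,    # Central European
    "Czech": 1250,
    "Russian": 1251,   # Cyrillic
    "Japanese": 932,   # Shift-JIS
}


def shared_code_page(languages):
    """Return the code page shared by all languages, or None if they differ.

    Raises KeyError for a language that is not in the illustrative table.
    """
    if not languages:
        return None
    pages = {CODE_PAGE_BY_LANGUAGE[lang] for lang in languages}
    # Exactly one distinct code page means the selection is consistent.
    return pages.pop() if len(pages) == 1 else None


if __name__ == "__main__":
    print(shared_code_page(["English", "French"]))   # same code page
    print(shared_code_page(["English", "Russian"]))  # mixed code pages
```

If the function returns None, the selection mixes code pages, which the policy setting described above does not allow.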
Forcing optical character recognition of every page of a TIFF image document
This setting bypasses Windows TIFF IFilter performance optimization mechanisms that are designed to skip the OCR processing for images that do not contain text.
To force OCR of every page of a TIFF image document
1.    Open the Local Group Policy Editor as follows: Click Start, type gpedit.msc in the Start Search text box, and then press ENTER.
2.    Under Computer Configuration, expand Administrative Templates.
3.    Expand Windows Components, expand Search, and then click OCR.
4.    Double-click Force TIFF IFilter to OCR every page in a TIFF document.
5.    Click Enable.
6.    Click OK.

 KORITFW

1 comment:

  1. You can try this free online ocr to extract text from tiff image.
