Tuesday, October 27, 2009

PDF File Search in MOSS

The hidden truth is that SharePoint doesn't have a search mechanism of its own. It is just a postman between the client and the database server, where all the SharePoint content is stored. The SharePoint server takes the request string and builds a query to be passed to SQL Server, and SQL Server then passes the string to its own engine.

Out of the box, SharePoint only searches .txt, .htm, .doc, .xls, and .ppt files, because SQL Server can crawl only these base file extensions. PDF is a binary file type that can't be searched with the SQL full-text search engine, as the engine can't understand the PDF format. So Adobe came out with its own free filters.

Adobe 5.0 iFilter Download

Adobe 6.0 iFilter Download

You need to follow the installation procedure in the documentation.

Additional procedure for the Adobe v8 iFilter:

Adobe Reader v8 ships with the iFilter, so there is no need to install a separate iFilter for Adobe v8 on 32-bit Windows.

1. Add the filter extension to the file types crawled: Start -> Programs -> Microsoft Office Server -> SharePoint 3.0 Central Administration -> Search Settings -> File Types -> New File Type (add the extension pdf here).

2. Modify the following registry keys by changing their "Default" value to the new CLSID of the Adobe iFilter, {E8978DA6-047F-4E3D-9C78-CDBE46041603}:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
Default -> {E8978DA6-047F-4E3D-9C78-CDBE46041603}

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
Default -> {E8978DA6-047F-4E3D-9C78-CDBE46041603}

3. Add the installation directory of Adobe Reader v8 to the system path. For example, if the Reader is installed under "C:\Program Files\Adobe", then add "C:\Program Files\Adobe\Reader 8.0\Reader" or "C:\Program Files\Adobe\Reader 9.0\Reader" to the system path: right-click My Computer -> Properties -> Advanced -> Environment Variables -> Path (under System Variables) -> Edit, and add "C:\Program Files\Adobe\Reader 8.0\Reader". This tells the Adobe iFilter where to pick up its dependent DLLs.

4. Copy the .gif file that you want to use for the icon to the following folder on the server (SharePoint Server 2007):

Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images

5. Edit the DOCICON.xml file to include the .pdf extension:

Navigate to Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Xml (SharePoint Server 2007).
Open the DOCICON.xml file.
Add an entry for the .pdf extension.
Save the DOCICON.xml file.

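6. Recycle the search service: open a command prompt (Run -> cmd), run iisreset, and then stop and start the osearch service (net stop osearch, followed by net start osearch).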

7. Now we can crawl and search PDF documents with the v8 Reader.

For 64-bit Windows, you need to download the iFilter separately. The iFilter that comes with the Adobe v8 installation supports only 32-bit operating systems.

SQL Server's search engine deals with the base types, and with the addition of new iFilters it can handle the associated formats as well. Microsoft offers a free iFilter for RTF. Other iFilters for PDF, RTF, MSG, and ZIP can be found at IFilterShop.
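If you would rather not edit the registry by hand in step 2, the two changes can be written out as a .reg file and imported after review. The following is only a rough sketch in Python: the key paths and the CLSID are copied from step 2, while the function name build_reg_file is made up for this example.

```python
# Sketch only: builds the text of a .reg file for the two keys in step 2.
# Key paths and CLSID come from the post; the names here are hypothetical.

ADOBE_IFILTER_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"

PDF_FILTER_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search"
    r"\Setup\ContentIndexCommon\Filters\Extension\.pdf",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions"
    r"\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
]


def build_reg_file(keys, clsid):
    """Return .reg text that sets the (Default) value of each key to clsid."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in keys:
        lines.append("[" + key + "]")
        lines.append('@="' + clsid + '"')  # "@" stands for the (Default) value
        lines.append("")
    return "\r\n".join(lines)
```

Calling build_reg_file(PDF_FILTER_KEYS, ADOBE_IFILTER_CLSID) returns the file text. Save it under a name of your choice, check it, and export the existing keys as a backup before importing it on the server.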
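For step 3, it is easy to end up with the Reader folder listed twice in the system path. Here is a small Python sketch of the idea: it takes the current Path value and returns the new one, appending the folder only if it is missing. The helper name add_to_path is an assumption for illustration, and the function only computes the string; you still paste the result into the Environment Variables dialog as described in step 3.

```python
def add_to_path(path_value, directory):
    """Return path_value with directory appended, unless it is already listed."""
    entries = [entry for entry in path_value.split(";") if entry.strip()]

    def normalize(entry):
        # Windows paths are case-insensitive; also ignore surrounding
        # quotes and a trailing backslash when comparing entries.
        return entry.strip().strip('"').rstrip("\\").lower()

    if normalize(directory) in [normalize(entry) for entry in entries]:
        return ";".join(entries)
    return ";".join(entries + [directory])
```

For example, add_to_path(current_path, r"C:\Program Files\Adobe\Reader 8.0\Reader") leaves the value alone when the Reader folder is already present in any letter case.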
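Step 5 doesn't show what the DOCICON.xml entry looks like. Icons for file extensions live as Mapping elements under the ByExtension element, each with a Key (the extension) and a Value (the icon file name). Here is a hedged Python sketch that adds or updates the pdf entry. The function name add_pdf_mapping and the icon name pdf16.gif are assumptions for this example; use the name of the .gif you copied in step 4.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(docicon_xml, icon_file):
    """Return DOCICON.xml text with a pdf Mapping under <ByExtension>."""
    root = ET.fromstring(docicon_xml)
    by_extension = root.find("ByExtension")
    if by_extension is None:
        raise ValueError("no <ByExtension> element found")
    for mapping in by_extension.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            mapping.set("Value", icon_file)  # update an existing pdf entry
            break
    else:
        # No pdf entry yet, so append a new one.
        ET.SubElement(by_extension, "Mapping", {"Key": "pdf", "Value": icon_file})
    return ET.tostring(root, encoding="unicode")


# Tiny stand-in for the real file; icon_file here is a hypothetical name.
sample = '<DocIcons><ByExtension><Mapping Key="doc" Value="icdoc.gif" /></ByExtension></DocIcons>'
updated = add_pdf_mapping(sample, "pdf16.gif")
```

Note that ElementTree drops comments and the XML declaration when it parses a file, so keep a backup copy of the original DOCICON.xml. Adding the Mapping line by hand in a text editor works just as well.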
