Wednesday, May 14, 2008

PDF Icon Not Showing Up in MOSS

Purpose

Out of the box, SharePoint will index many types of content, including a lot of popular file formats (.ppt, .docx, .doc, .xlsx, .xls, etc.). A list of these file types can be found here. You'll notice that a pretty popular file type (.pdf) is NOT in this list. For SharePoint search to be able to index .pdfs, you need to install a PDF IFilter (Index Filter), which helps the indexing service process PDF files when building a search index. Several PDF IFilters are available from different vendors, but the most popular is probably the one from Adobe. You can download the Adobe PDF IFilter here.

Walkthrough

Stop the IIS Admin Service: Start->Run->Services.msc->Locate the IIS Admin Service and stop it.
Download the Adobe PDF IFilter and install it on your indexing server.
Copy a PDF icon GIF, or any icon of your choosing, to "C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\IMAGES".
Edit the DOCICON.XML file at "C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\Template\Xml\DOCICON.XML" and add a mapping entry for the pdf extension that points to the icon file you copied in the previous step.


Recycle the application pool of the Shared Service Provider OR do an IISRESET if you're lazy.
Open the SSP Admin site (Central Administration->SharedServices1). Click Search Settings->File Types->New File Type and add a pdf file type.


Perform a full crawl on your content sources. Go to Search Settings->Content Sources and Crawl Schedules and click on the Content Source you want to crawl. Check the Start Full Crawl check box at the bottom and then click OK. Wait for the crawl to finish.

You're done! PDF content should start showing up in searches now!
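
If you'd rather script the DOCICON.XML edit from the walkthrough than do it by hand, here is a rough Python sketch of the idea. It is only an illustration and makes some assumptions: that the file has a ByExtension element holding Mapping entries with Key and Value attributes (which is how a stock MOSS 2007 DOCICON.XML looks as far as I know), and that your icon is called pdficon.gif, which is a made-up name. The sample XML is a trimmed stand-in, not the real file. Back up the real file before touching it.

```python
import xml.etree.ElementTree as ET


def add_icon_mapping(xml_text, extension, icon_file):
    """Return DOCICON.XML text with a Mapping for `extension` under ByExtension.

    Assumes the layout <DocIcons><ByExtension><Mapping Key=.. Value=../>.
    """
    root = ET.fromstring(xml_text)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element found")

    # Update an existing mapping for this extension if there is one...
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            mapping.set("Value", icon_file)
            break
    else:
        # ...otherwise append a new one.
        ET.SubElement(by_ext, "Mapping", {"Key": extension, "Value": icon_file})

    return ET.tostring(root, encoding="unicode")


# Trimmed stand-in for DOCICON.XML (assumption: not the full shipped file).
sample = """<DocIcons>
  <ByProgID/>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif"/>
  </ByExtension>
  <Default>
    <Mapping Value="icgen.gif"/>
  </Default>
</DocIcons>"""

# "pdficon.gif" is a hypothetical name; use whatever you copied to IMAGES.
updated = add_icon_mapping(sample, "pdf", "pdficon.gif")
print(updated)
```

Running it a second time with the same extension just updates the existing entry instead of adding a duplicate, so it is safe to re-run. You still need to recycle the application pool (or IISRESET) afterwards for the icon to show up.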


Troubleshooting

There have been a couple of machines I've done this on where I had to manually register the PDF IFilter dll (regsvr32 "C:\Program Files\Adobe\PDF IFilter 6.0\PDFFILT.dll") and then recycle the SSP site before the icon would show up. Usually in these cases I also had to do another full crawl after manually registering PDFFILT.dll. It's probably a good idea to define a new content source with a small set of content (a couple of small pdfs in a document library) so that each test crawl finishes quickly.