Indexing Legacy Data with NHibernate.Search
While configuring NHibernate.Search, I ran into a problem batch-processing a million or so legacy records. When I built the index directly with Lucene.Net, indexing was fast and worked as expected. When I built it through NHibernate.Search, the indexer generated far too many index files, numbering in the hundreds of thousands. As a result, the number of file operations grew drastically with each iteration of the indexer, to the point that the FullTextSession.Index call would never finish.
I spent a long time experimenting with different merge factors and max file parameters for Lucene.Net, but I was never able to make it behave as I expected. The solution ended up being to force an optimize on the index after a certain number of records. Optimizing a Lucene index is analogous to defragmenting a hard drive: it orders and compacts the thousands of splintered .cfs files into one big file, so the indexer no longer has to scan a growing number of files before each write.
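The "optimize every N records" idea can be sketched roughly as follows. This is a minimal illustration, not the exact code from this project: the BatchIndexer class, the IndexInBatches method, and the optimizeEvery parameter are all hypothetical names, and the indexing and optimize steps are passed in as delegates so the sketch compiles without NHibernate or Lucene.Net references.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: not part of NHibernate.Search or Lucene.Net.
public static class BatchIndexer
{
    // Indexes every record, calling `optimize` after each full batch of
    // `optimizeEvery` records and once more for a trailing partial batch.
    // Returns the number of records indexed.
    public static int IndexInBatches<T>(
        IEnumerable<T> records,
        Action<T> index,
        Action optimize,
        int optimizeEvery)
    {
        if (records == null) throw new ArgumentNullException(nameof(records));
        if (index == null) throw new ArgumentNullException(nameof(index));
        if (optimize == null) throw new ArgumentNullException(nameof(optimize));
        if (optimizeEvery <= 0) throw new ArgumentOutOfRangeException(nameof(optimizeEvery));

        int count = 0;
        foreach (T record in records)
        {
            index(record);
            count++;

            // Compact the index before the pile of small files grows too large.
            if (count % optimizeEvery == 0)
            {
                optimize();
            }
        }

        // Compact whatever the last partial batch left behind.
        if (count % optimizeEvery != 0)
        {
            optimize();
        }

        return count;
    }
}
```

With NHibernate.Search, the index delegate would presumably wrap FullTextSession.Index, and the optimize delegate would call whatever optimize entry point your version exposes. I believe that is Optimize on the session's SearchFactory, or IndexWriter.Optimize if you hold a Lucene.Net writer directly, but treat both as assumptions and check them against your release. The right batch size is also something to find by experiment; a few thousand records is only a guess at a starting point.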