<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>nhibernate.search &#187; Brandon M. West</title>
	<atom:link href="http://brandonmwest.com/tag/nhibernate-search/feed" rel="self" type="application/rss+xml" />
	<link>http://brandonmwest.com</link>
	<description></description>
	<lastBuildDate>Sat, 18 Jun 2011 03:48:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>Indexing Legacy Data with NHibernate.Search</title>
		<link>http://brandonmwest.com/dev/indexing-legacy-data-with-nhibernate-search</link>
		<comments>http://brandonmwest.com/dev/indexing-legacy-data-with-nhibernate-search#comments</comments>
		<pubDate>Tue, 22 Dec 2009 00:36:52 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[NHibernate]]></category>
		<category><![CDATA[c#]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[lucene.net]]></category>
		<category><![CDATA[nhibernate]]></category>
		<category><![CDATA[nhibernate.search]]></category>
		<category><![CDATA[nhsearch]]></category>

		<guid isPermaLink="false">http://brandonmwest.com/?p=8</guid>
		<description><![CDATA[While configuring NHibernate.Search, I ran into a problem batch-indexing a million or so legacy records. When I built the index directly with Lucene.Net, indexing was fast and worked as expected. When I built it through NHibernate.Search, the indexer generated far too many index files, numbering in the hundreds of [...]]]></description>
			<content:encoded><![CDATA[<p>While configuring NHibernate.Search, I ran into a problem batch-indexing a million or so legacy records. When I built the index directly with <a href="http://incubator.apache.org/lucene.net/">Lucene.Net</a>, indexing was fast and worked as expected. When I built it through NHibernate.Search, the indexer generated far too many index files, numbering in the hundreds of thousands. As a result, the number of file operations grew drastically with each iteration of the indexer, to the point that the FullTextSession.Index call would never finish.</p>
<p>I spent a long time experimenting with different merge factors and max file parameters for Lucene.Net, but I was never able to make it behave as I expected. The solution ended up being to force an optimize on the index after a set number of records. Optimizing a Lucene index is analogous to defragmenting a hard drive: it merges the thousands of splintered .cfs files into one big file, so the indexer no longer has to scan a growing number of files before each write.<br />
<span id="more-8"></span><br />
Here is my generic CreateIndex method that includes periodic optimization. This ended up solving the problem and allowed me to index 1.5 million legacy records in about 3 hours. This code depends on a specific finder implementation, as well as a generic method for optimizing an index, but it should be enough to get the idea across.</p>
<p>[cc lang="csharp"]<br />
public static void CreateIndex<T>(int batchSize)<br />
{<br />
    Type type = typeof(T);</p>
<p>    //Get the query object for the type to be indexed<br />
    object finder = Find.Factory.ResolveFinderFor(type);<br />
    var method = finder.GetType().GetMethod("get_All");<br />
    var objectQuery = method.Invoke(finder, null) as IQueryable<T>;</p>
<p>    IFullTextSession fullTextSession =<br />
        Search.CreateFullTextSession(NH.CurrentSession);</p>
<p>    var total = objectQuery.Count();<br />
    //Round up so a final partial batch is not skipped<br />
    var iterations = (total + batchSize - 1) / batchSize;</p>
<p>    const int optimizeThreshold = 10000;<br />
    var optimizeThresholdCounter = 0;</p>
<p>    //Find the generic optimize method<br />
    MethodInfo optimizeMethod =<br />
        typeof(IndexHelper).GetMethod("OptimizeIndex");</p>
<p>    //Make it generic for the type in question<br />
    MethodInfo genericOptimizeMethod =<br />
        optimizeMethod.MakeGenericMethod(type);</p>
<p>    for (var i = 0; i < iterations; i++)<br />
    {<br />
        var subset = objectQuery.Skip(i * batchSize).Take(batchSize).ToList();</p>
<p>        int startCount = (i*batchSize);<br />
        int endCount = startCount + batchSize;</p>
<p>        optimizeThresholdCounter += batchSize;</p>
<p>        //Dispose the transaction even if indexing throws<br />
        using (var tx = fullTextSession.BeginTransaction())<br />
        {<br />
            foreach (T instance in subset)<br />
            {<br />
                fullTextSession.Index(instance);<br />
            }<br />
            tx.Commit();<br />
        }</p>
<p>        fullTextSession.Flush();<br />
        fullTextSession.Clear();</p>
<p>        //If we've hit the threshold, optimize<br />
        if(optimizeThreshold != 0 &#038;&#038;<br />
            optimizeThresholdCounter >= optimizeThreshold)<br />
        {<br />
            genericOptimizeMethod.Invoke(null, null);<br />
            optimizeThresholdCounter = 0;<br />
        }<br />
    }</p>
<p>    //optimize the index one final time<br />
    genericOptimizeMethod.Invoke(null, null);<br />
}<br />
[/cc]</p>
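<p>To see the batching and optimize cadence on their own, here is a minimal sketch of just the loop logic with NHibernate and Lucene.Net stripped out. BatchPlanner, Run, indexBatch and optimize are hypothetical names used for illustration; they are not part of either library. The sketch assumes the batch count should be rounded up so that a trailing partial batch is visited, and it skips the final optimize when nothing has been indexed since the last one.</p>
<p>[cc lang="csharp"]<br />
using System;<br />
using System.Collections.Generic;</p>
<p>//Hypothetical helper for illustration only<br />
public static class BatchPlanner<br />
{<br />
    //Calls indexBatch(skip, take) for each window of records and<br />
    //optimize() once at least optimizeThreshold records have been<br />
    //indexed since the previous optimize<br />
    public static void Run(int total, int batchSize, int optimizeThreshold,<br />
        Action<int, int> indexBatch, Action optimize)<br />
    {<br />
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");</p>
<p>        //Round up so a trailing partial batch is still visited<br />
        int iterations = (total + batchSize - 1) / batchSize;<br />
        int sinceOptimize = 0;</p>
<p>        for (int i = 0; i < iterations; i++)<br />
        {<br />
            int skip = i * batchSize;<br />
            int take = Math.Min(batchSize, total - skip);</p>
<p>            indexBatch(skip, take);<br />
            sinceOptimize += take;</p>
<p>            //A threshold of zero or less optimizes after every batch<br />
            if (sinceOptimize >= optimizeThreshold)<br />
            {<br />
                optimize();<br />
                sinceOptimize = 0;<br />
            }<br />
        }</p>
<p>        //Final optimize, skipped if the last batch already triggered one<br />
        if (sinceOptimize > 0) optimize();<br />
    }<br />
}</p>
<p>public static class Program<br />
{<br />
    public static void Main()<br />
    {<br />
        var batches = new List<string>();<br />
        int optimizeCalls = 0;</p>
<p>        //Stand-ins for the real index and optimize calls<br />
        BatchPlanner.Run(25, 10, 20,<br />
            (skip, take) => batches.Add(skip + "+" + take),<br />
            () => optimizeCalls++);</p>
<p>        Console.WriteLine(string.Join(", ", batches));<br />
        Console.WriteLine("optimize calls: " + optimizeCalls);<br />
    }<br />
}<br />
[/cc]</p>
<p>With 25 records, a batch size of 10 and a threshold of 20, this prints the windows 0+10, 10+10, 20+5 followed by "optimize calls: 2": one optimize after the second batch and one at the end to cover the trailing five records. In real code the two delegates would wrap the Skip/Take query plus FullTextSession.Index calls and the optimize helper, respectively.</p>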
<p>I hope this saves someone some headaches &#8211; I know I wasted a lot of time finding this solution.</p>
]]></content:encoded>
			<wfw:commentRss>http://brandonmwest.com/dev/indexing-legacy-data-with-nhibernate-search/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

