
Open Source Search Engines Every Developer Should Know About.


Search is a crucial feature of any website. Even if your navigation is crystal clear, it still doesn’t cater for power users and return visitors who remember a particular piece of content, or who want to find collections of information on your site in the same topical area.
Search is typically one of the most poorly implemented pieces of technology on a site, with developers opting for the standard out-of-the-box solution that comes with most modern content management systems, which in many cases doesn’t do justice to your content. Here we take a look at the enterprise-level and open source search engines out there that can find and index the information on your site faster, and provide users with a deeper, more relevant result set.

Constellio

Constellio is an open source search solution suitable for enterprise-level search. It is built on the Apache Solr project, which uses the Lucene project as its main engine, and it indexes both web pages and documents via its web-based interface. You can select which types of document to index, including folders and wildcard filenames, and Constellio provides both the search interface and granular control over what gets indexed. It also supports indexing via technologies such as the Sitemap protocol and RSS.

SearchBlox

Another open source search solution built around Lucene, SearchBlox offers a number of advantages over its nearest comparable product, Google Mini, and is again based on cross-platform technology (Java). Its main advantage is that it provides a level of abstraction from Lucene for developers, with a simpler API to interface with, so you can quickly deploy a solution without having to understand all the underlying complexities of Lucene. It also offers indexing across third-party websites.

Apache Solr

Apache Solr is an open source enterprise-level search solution whose features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world’s largest internet sites. At its core, Apache Lucene (a well-respected Java search library used in many of the aforementioned products) is used as the underlying engine, with both technologies going hand in hand.
There are also a number of hosted implementations of Solr available for those developers who don’t want the hassle of deploying it themselves. Websolr, for example, provides a platform-independent offering of Solr in the cloud, as does PowCloud (still in beta), which also offers WordPress integration.
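As a rough illustration of the features listed above, the sketch below builds a query URL for Solr’s select handler with hit highlighting and faceting switched on. The host, the core name "articles" and the field names "content" and "category" are assumptions made up for this example, not part of any real deployment.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, text, highlight_field, facet_field, rows=10):
    """Build a Solr select URL with highlighting and a single facet field."""
    params = [
        ("q", text),                    # the user's search terms
        ("rows", rows),                 # number of results to return
        ("wt", "json"),                 # ask for a JSON response
        ("hl", "true"),                 # switch hit highlighting on
        ("hl.fl", highlight_field),     # field to generate highlights from
        ("facet", "true"),              # switch faceting on
        ("facet.field", facet_field),   # field to count facet values for
    ]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance and schema.
    print(build_solr_query_url(
        "http://localhost:8983/solr", "articles",
        "open source search", "content", "category",
    ))
```

Fetching the resulting URL from a running Solr instance should return matching documents alongside highlight snippets and facet counts, provided the schema actually contains the assumed fields.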

Sphinx

Powering top sites such as Craigslist and Dailymotion, Sphinx is a cross-platform open source search server written in C++ that lets you search across various systems, including database servers, NoSQL storage and flat files. A variety of text processing options make it easy to index documents, and you can fine-tune how the relevance algorithm works. Once deployed and set up, searching via SphinxAPI is as simple as three lines of code, and querying via SphinxQL is even simpler, with search queries expressed in good old SQL.
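To give a feel for what a SphinxQL query looks like, here is a minimal sketch that assembles a full-text SELECT statement. The index name "articles_idx" is hypothetical, and the "%s" placeholder assumes you send the query through a Python MySQL driver that substitutes parameters on the client side; the connection itself is left out because it needs a running Sphinx server.

```python
def build_sphinxql_search(index_name, text, limit=10):
    """Return a SphinxQL statement and its parameters for a full-text search."""
    # Index names cannot be passed as parameters, so accept only plain identifiers.
    if not index_name.isidentifier():
        raise ValueError(f"unsafe index name: {index_name!r}")
    sql = f"SELECT id FROM {index_name} WHERE MATCH(%s) LIMIT {int(limit)}"
    return sql, (text,)


if __name__ == "__main__":
    sql, params = build_sphinxql_search("articles_idx", "open source search", limit=5)
    print(sql)     # statement with a placeholder for the search text
    print(params)  # the search text, to be escaped by the driver
```

Keeping the search text as a parameter rather than pasting it into the string leaves quoting and escaping to the driver.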

Alternative WordPress Search Plugins

Let’s face it, WordPress search is pretty naff, and for anything other than basic searching it is inadequate. So, what’s wrong with the implementation of WordPress search?
1) It automatically assumes that the freshest content for a particular keyword search is the most relevant.
2) The algorithm doesn’t place any weighting on results based on links.
3) The algorithm doesn’t rate the title as more important than the content.
4) The terms searched for are not highlighted in the result set, so users can’t quickly determine relevance.
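To make points 1 and 3 concrete, here is a toy ranking sketch that scores a post by how often the search terms appear, counts a title match more heavily than a body match, and ignores the publication date altogether. The weights, the Post structure and the sample posts are all invented for illustration; this is not how WordPress or any of the plugins below work internally.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    body: str
    year: int


def score(post, query, title_weight=3.0, body_weight=1.0):
    """Count query-term occurrences, weighting title hits above body hits."""
    terms = query.lower().split()
    title_words = post.title.lower().split()
    body_words = post.body.lower().split()
    title_hits = sum(title_words.count(t) for t in terms)
    body_hits = sum(body_words.count(t) for t in terms)
    return title_weight * title_hits + body_weight * body_hits


def rank(posts, query):
    """Return matching posts, best score first; the date plays no part."""
    scored = [(score(p, query), p) for p in posts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for s, p in scored if s > 0]


if __name__ == "__main__":
    posts = [
        Post("Holiday photos", "we tried a search plugin once", 2011),
        Post("Search plugin roundup", "a look at several options", 2009),
        Post("About", "contact details", 2011),
    ]
    for post in rank(posts, "search plugin"):
        print(post.title)
```

With these sample posts the older roundup ranks above the newer post that only mentions the terms in passing, which is the behaviour a date-ordered search cannot give you.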
There’s definitely scope for improvement, and a number of solutions have popped up to fill that particular gap in its implementation and feature set. Here are some of them:
Google Search for WordPress
If you are looking to get up and running quickly with the Google search API, this plugin for WordPress offers an API implementation, with scope for phrase highlighting all provided in an AJAX interface.
More from Google
Augments and extends the existing WordPress search interface with additional posts from others that Google have found on the topic.
Search API
If you are a developer looking to extend the existing WordPress functionality, this plugin gives you a good framework on which to build your solution, with some ready-made function calls to enhance things. It supports advanced search capabilities such as boolean search, multiple content searches (posts, tags, pages, authors and any available metadata) and flags (finding posts with A string in category C).
Relevanssi
Relevanssi has both free and premium options available, and solves a number of the problems highlighted above. There is support for both boolean searches and fuzzy logic out of the box, along with the following additional features:
  • Search results sorted in the order of relevance, not by date.
  • Fuzzy matching: match partial words, if complete words don’t match.
  • Find documents matching either just one search term (OR query) or require all words to appear (AND query).
  • Search for phrases with quotes, for example “search phrase”.
  • Create custom excerpts that show where the hit was made, with the search terms highlighted.
  • Highlight search terms in the documents when user clicks through search results.
  • Search comments, tags, categories and custom fields.
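Two of the features in the list above, AND/OR queries and highlighted excerpts, are simple enough to sketch in a few lines. The code below is only an illustration of the ideas, not Relevanssi’s implementation, and the "<strong>" markers are an assumed choice of highlight markup.

```python
import re


def matches(text, terms, require_all=False):
    """OR query by default; AND query when require_all is True."""
    words = set(re.findall(r"\w+", text.lower()))
    found = [t.lower() in words for t in terms]
    return all(found) if require_all else any(found)


def highlight(text, terms, before="<strong>", after="</strong>"):
    """Wrap whole-word occurrences of any term in highlight markers."""
    if not terms:
        return text
    pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.sub(pattern, lambda m: before + m.group(1) + after,
                  text, flags=re.IGNORECASE)


if __name__ == "__main__":
    text = "Sphinx is a search server; Solr is a search platform."
    print(matches(text, ["solr", "xapian"]))                    # OR query
    print(matches(text, ["solr", "xapian"], require_all=True))  # AND query
    print(highlight(text, ["search"]))
```

Matching is case-insensitive, and the highlighted text keeps the capitalisation of the original document.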
Search Everything
Search Everything is perfect for you if you want to search custom data stored within WordPress, such as custom post types or fields. It also supports searching across attachments.
Search Unleashed
Search Unleashed is an extensible plugin with support for a number of search engines, including Apache Lucene. It comes with the standard implementation, MySQL fulltext and Lucene engines all ready to be deployed, and provides a neat “priority based” search capability that determines relevance from where phrases occur inside the WordPress post. Incoming searches from third-party engines can also be given a CSS style to show the searcher how they found the page.

Notable Mentions

http://www.coveo.com/en/products/coveo-expresso – Free for up to 50 users and 100,000 documents. Might be useful for small enterprises.
http://sna-projects.com/zoie/ – Real-time search indexing built on top of Lucene.
http://xapian.org/ – A search library written in C++.
http://www.indextank.com/ – Powers many of the larger social sites (Reddit etc.).
http://www.kneobase.com – An open source solution that indexes zip files, Microsoft Office documents and more, before turning them into HTML representations and delivering results.

Google SiteSearch

Site Search is aimed primarily at websites and, unlike Google Mini, wouldn’t be appropriate for an intranet scenario. It is a fully hosted solution, and offers a number of cool features to site owners looking to enhance the existing search functionality on their site. Pricing for Site Search is based on the number of queries per year. Starting at $100 for 20,000 queries a year, it’s an inexpensive option for those with lower traffic, but the irony is that you’ll probably not need it until your content gets unmanageable. The more content you have, the more traffic you are likely to have, and that’s going to push the cost up. It is, however, worth considering, as the technology behind the scenes is, as you can imagine, pretty top notch. For those of you who would rather just tap into the technology, Google Custom Search offers an AdSense-supported option (on which you can receive a revenue share) that lets you use Google tech for free. Customisation from a look-and-feel perspective is, however, limited.

Google Mini

Google Mini is a server-based solution that offers an easy way to deploy Google technology inside your website. Once deployed, Mini crawls your websites, file systems and internal databases, indexing and caching the contents as it goes, and finally delivers search results through a uniform interface that can be tweaked and designed how you want through its APIs. Costs start at $1,995 (direct) plus a $995 yearly fee after the first year for indexing of 50,000 documents, and scale upwards. For example, for a 300,000-document license the initial cost is $8,995. Google also has another step up from that, but it is in many cases outside the budget of small businesses: the Google Search Appliance offers all the bells and whistles of Google technology with unlimited indexing for a cool $30,000.
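For budgeting purposes, the entry-level Mini pricing quoted above works out as follows. This is a small sketch using only the 50,000-document figures from this article; the function name is made up, and the yearly fee for the larger tiers isn’t stated here, so it is not modelled.

```python
def mini_total_cost(years, initial=1995, yearly_fee=995):
    """Initial price plus the yearly fee for every year after the first."""
    if years < 1:
        raise ValueError("years must be at least 1")
    return initial + yearly_fee * (years - 1)


if __name__ == "__main__":
    for years in (1, 2, 3):
        print(years, mini_total_cost(years))
```

Over three years, for example, that comes to $1,995 + 2 × $995 = $3,985.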
http://www.dataparksearch.org/ – A full-featured open source web-based search engine released under the GNU General Public License and designed to organize search within a website, group of websites, intranet or local system.
http://www.open-search-server.com/ – A modern and robust search engine and a suite of high-powered full-text search algorithms. Built using the best available open source technologies, OpenSearchServer is high-performance software that you can embed in all your applications for better information access.
http://openfts.sourceforge.net/ – OpenFTS (Open Source Full Text Search engine) is an advanced PostgreSQL-based search engine that provides online indexing of data and relevance ranking for database searching. Close integration with the database allows metadata to be used to restrict search results.
http://www.elasticsearch.org/ – Built with the cloud in mind, Elasticsearch has a very advanced distributed model, speaks JSON natively, and exposes many advanced search features, all seamlessly expressed through a JSON layer.
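As an illustration of what “expressed through a JSON layer” means for Elasticsearch, the sketch below builds the JSON body of a simple full-text match query. The field name "title" is an assumption for the example; in practice the body would be sent to the search endpoint of an index on a running cluster, which is left out here.

```python
import json


def build_match_query(field, text, size=10):
    """Return the JSON body for a simple full-text match query."""
    body = {
        "query": {"match": {field: text}},  # full-text match on one field
        "size": size,                       # number of hits to return
    }
    return json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical field name.
    print(build_match_query("title", "open source search", size=5))
```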
