News
Posted over 14 years ago by Shalin Shekhar Mangar
From the official announcement:

Apache Solr 1.4 has been released and is now available for public download! http://www.apache.org/dyn/closer.cgi/lucene/solr/

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture for when more advanced customization is required.

New Solr 1.4 features include:

- Major performance enhancements in indexing, searching, and faceting
- Revamped all-Java index replication that's simple to configure and can replicate configuration files
- Greatly improved database integration via the DataImportHandler
- Rich document processing (Word, PDF, HTML) via Apache Tika
- Dynamic search results clustering via Carrot2
- Multi-select faceting (support for multiple items in a single category to be selected)
- Many powerful query enhancements, including ranges over arbitrary functions and nested queries of different syntaxes
- Many other plugins, including Terms for auto-suggest, Statistics, TermVectors, and Deduplication

Performance Enhancements

- A simple FieldCache load test
- Filtered query performance increases
- Solr scalability improvements
- Solr faceted search performance improvements
- Improvements in Solr Faceting Search

Revamped All-Java Replication

- SolrReplication wiki page
- Works on Microsoft Windows platforms too!

DataImportHandler improvements

- What's new in DataImportHandler in Solr
- DataImportHandler wiki page

Rich document processing

- ExtractingRequestHandler wiki page
- Posting Rich Documents to Apache Solr using SolrJ and Solr Cell

Dynamic Search Results Clustering

- ClusteringComponent wiki page
- Solr's new Clustering Capabilities

Multi-select Faceting

- Local params for faceting
- Tagging and excluding filters

Query Enhancements

- Ranges over functions
- Nested query support for any type of query parser (via QParserPlugin). Quotes will often be necessary to encapsulate the nested query if it contains reserved characters. Example: _query_:"{!dismax qf=myfield}how now brown cow"

New Plugins

- TermsComponent (can be used for auto-suggest)
- TermVectorComponent
- Statistics
- Deduplication

SolrJ - Java client

- Faster, more efficient binary update format
- Javabean (POJO) binding support
- Fast multi-threaded updates through StreamingUpdateSolrServer
- Simple round-robin load balancing client - LBHttpSolrServer
- Stream documents through an Iterator API
- Many performance optimizations

Miscellaneous

- Rollback command in UpdateHandler
- More configurable logging through the use of the SLF4J library
- 'commitWithin' parameter on the add document command allows setting a per-request auto-commit time limit
- TokenFilter factories for the Arabic language
- Improved Thai language tokenization (SOLR-1078)
- Merge multiple indexes
- Expunge Deletes command

Upgrade instructions

Although Solr 1.4 is backwards-compatible with previous releases, users are encouraged to read the upgrading notes in the Solr Change Log.

There are so many more new features, optimizations, bug fixes and refactorings that it is not possible to cover them all in a single blog post. A large amount of effort has gone into this release. Many congratulations to the entire Solr community for making this happen! Great things are planned for the next release and it is a great time to get involved.
See http://wiki.apache.org/solr/HowToContribute for how to get started. Enjoy Solr 1.4 and let us know on the mailing lists if you have any questions!
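The revamped all-Java replication mentioned above is configured entirely in solrconfig.xml. The following sketch is based on the SolrReplication wiki page referenced in the post; the master URL, poll interval, and configuration file names are illustrative:

```xml
<!-- On the master: serve the index and replicate config files after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On a slave: poll the master every 60 seconds for index changes -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Because replication is plain HTTP, this is what allows it to work on Windows platforms as well, with no rsync scripts required.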
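The multi-select faceting described above (tagging and excluding filters via local params) can be illustrated with a request like the following sketch; the field names and values are illustrative:

```text
q=mainquery
&fq={!tag=dt}doctype:pdf
&facet=on
&facet.field={!ex=dt}doctype
```

Here the filter on doctype is tagged "dt", and the facet over the doctype field excludes that filter with {!ex=dt}, so counts for all document types are still returned even though only PDFs match the query. This is what lets a UI keep showing the other selectable options in a category after one has been chosen.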
Posted over 14 years ago by Shalin Shekhar Mangar
Note: The following material and presentation was prepared for students of the Indian Institute of Information Technology (IIIT), Allahabad. The aim was to get them excited about contributing to open source projects, and in particular about Apache Lucene, Solr and Hadoop. The first talk, titled "Why you should contribute to Open Source", was aimed at freshmen and has no technical content. The second, titled "Get involved with the Apache Software Foundation", was given to sophomore, junior and senior students and goes into some basic technical information on the Apache Lucene, Solr and Hadoop projects. The following post comprises some notes that I put together for the talks.

Work on what you like, when you like

Everybody wants to work on "cool" products. However, the reality is that most of you will get stuck in a job which, although it may pay well, will hardly be about the things you wanted to work on. In your course, you will learn about algorithms, distributed systems, natural language processing, information retrieval, bio-informatics and other areas of computer science and its applications, but in real life, the majority of the work done in software companies will have little direct application of the things you learn in your course. Most of the time you will be using things built by others and writing glue code to build the things your company's business needs. This is not to say that all that knowledge will go to waste; it will definitely help you become a better programmer and you should learn it, but there's a fair chance that it may not be used directly in your job.

Open Source projects offer you a chance to work on something that you want rather than something that others want you to work on. It is a great opportunity to work on something that is both cool and useful, as well as to associate with a well known brand and all the publicity and goodwill it brings. You are free to pick and choose between the thousands of open source projects out there. Moreover, you are free to decide how much you want to contribute. You won't have a boss and you won't have the pressure of deadlines and schedules.

Development in the "real" world

Academic projects are insufficient to impart many of the skills that you'd need once you start developing software full-time. Many of these skills are "social" rather than technical in nature, but they are at least as important. Most academic projects are "toy" projects. By that, I mean that their whole life cycle revolves around you. You are the designer, developer, tester and also the user. As a result, there are a few key things missing in those projects:

- No build system - Makefiles? Ant? Maven? Very few students are familiar with using them. Don't even ask about creating a build from scratch. "Hey! Just open those files in a text editor or an IDE and hack away" is not an unusual thing to hear.
- No source control - CVS? SVN? Git? A single person writing all the code, or more than 80% of it, is very common.
- No bug tracker - "It is never going to be used after we demo it to the professors."
- No user documentation - maybe you will write a research paper detailing your findings, but there is little or no documentation written for "other" people.
- No mailing lists or forums for support - nobody but you is going to use it.

Moreover, under these circumstances, you never learn how to:

- Discuss technical design or issues in writing
- Resolve conflicts in matters of design, architecture and a project's road map
- Build usable interfaces (whether command line options, a GUI or an API)
- Write proper error handling and logging code
- Identify hooks for monitoring systems in production
- Think about backup and recovery
- Identify components which can be extended or replaced to add or modify functionality of the system

Open source projects are the real deal. If you are involved for long enough, you will either see or be a part of many such discussions and conflicts. All of the above skills are things you will need when you get around to software development in the real world.

Learn from the best

How many great developers do you know about? How many of them work or have worked on an open source project? I bet there are many names common to both lists. Open Source development will help you observe how experienced developers work and their various ways of designing, coding and discussing solutions. You will learn new ideas and new ways of solving problems. The second and probably more important part is that many smart programmers will be looking over your code and will provide review comments which will help you improve. You will learn more efficient or shorter (or both) ways to solve the same problem. That kind of feedback is invaluable to a budding programmer. I know that I've learned a great deal since I got involved in Apache Solr.

Build a publicly verifiable resume

What you put in your resume are things like contact information, performance in academia, programming languages you know, projects you've worked on and other such stuff. There is very little in this document which can be verified easily. This is a problem for you as well as for the prospective employer because:

- It may not represent you, your skills and your hard work sufficiently
- It makes hiring a game of chance for the prospective employer and prevents them from making more informed decisions

The best thing about contributing to an open source project is that everything you do is public. So you can say things like the following:

- I have worked on this project for the last two years
- I wrote features X, Y and Z on Project P
- I have over two hundred posts on the user forum or mailing list
- I have commit access to the project
- I am the expert because "I wrote it"

And your prospective employer can search and verify such things easily. Congratulations, you have just landed on top of the stack of resumes!

Companies will find you

When a company determines that an open source Project X can save them a lot of money, it is likely that they will hire a few people who have experience with Project X and can support its use internally. Many such companies also allow their developers to work on the project either part-time or full-time. And who is more qualified to work on the project than you - an existing contributor! More and more companies are starting up around providing training, consulting and support for open source projects. Many such companies exclusively hire existing contributors. Even if an open source project is not used directly inside the company, many tech companies hire open source contributors because:

- Hiring popular open source developers makes them cooler in the eyes of other developers
- Developers who contribute to open source projects are good programmers

I'm sure there are many more reasons beyond the ones I've given here. In the end, contributing to an open source project is a good investment of your time and it may well be your big ticket to finding that great job. Good luck!
Posted over 14 years ago by Shalin Shekhar Mangar
DataImportHandler is an Apache Solr module that provides a configuration-driven way to import data from databases, XML and other sources into Solr, in both "full builds" and incremental delta imports. A large number of new features have been introduced since it first appeared in Solr 1.3.0. Here's a quick look at the major new features:

Error Handling & Rollback

The ability to control behavior on errors was an oft-requested feature in DataImportHandler. With Solr 1.4, DataImportHandler provides configurable error handling options for each entity. You can specify the following as an attribute on the "entity" tag:

- onError="abort" - Aborts the import process
- onError="skip" - Skips the current document
- onError="continue" - Continues as if the error never occurred

All errors are still logged regardless of the selected option. When an import aborts, either due to an error or a user command, all changes to the index since the last commit are rolled back.

Event Listeners

An API is exposed to write listeners for import start and end. A new interface called EventListener has been introduced which has a single method:

public void onEvent(Context ctx);

For example, the listeners can be specified as:

<document onImportStart="com.foo.StartListener" onImportEnd="com.foo.EndListener">

Push data to Solr through DataImportHandler

In Solr 1.3, DataImportHandler was pull based only. If you wanted to push data to Solr, e.g. through an HTTP POST request, you had no choice but to convert it to Solr's update XML format or CSV format. That meant that all the DataImportHandler goodness was not available. With Solr 1.4, a new DataSource named ContentStreamDataSource allows one to push data to Solr through a regular POST request.

Suppose one wants to push the following XML to Solr and use DataImportHandler to parse and index it:

<root>
  <b><id>1</id><c>Hello C1</c></b>
  <b><id>2</id><c>Hello C2</c></b>
</root>

We can use ContentStreamDataSource to read the XML pushed to Solr through HTTP POST:

<dataConfig>
  <dataSource type="ContentStreamDataSource" name="c"/>
  <document>
    <entity name="b" dataSource="c" processor="XPathEntityProcessor" forEach="/root/b">
      <field column="desc" xpath="/root/b/c"/>
      <field column="id" xpath="/root/b/id"/>
    </entity>
  </document>
</dataConfig>

More Power to Transformers

New flag variables have been added which can be emitted by custom Transformers to skip rows, delete documents or stop further transforms.

New DataSources

- FieldReaderDataSource - Reads data from an entity's field. This can be used, for example, to read XML stored in databases.
- ContentStreamDataSource - Accepts HTTP POST data in a content stream (described above)

New EntityProcessors

- PlainTextEntityProcessor - Reads from any DataSource and outputs a String
- MailEntityProcessor (experimental) - Indexes mail from POP/IMAP sources into a Solr index. Since it requires extra dependencies, it is available as a separate package called "solr-dataimporthandler-extras".
- LineEntityProcessor - Streams lines of text from a given file to be indexed directly or for processing with transformers and child entities.

New Transformers

- HTMLStripTransformer - Strips HTML tags from input text using Solr's HTMLStripCharFilter
- ClobTransformer - Reads strings from Clob types in databases
- LogTransformer - Logs data in a given template format. Very useful for debugging.

Apart from the above new features, there have been numerous bug fixes, optimizations and refactorings. In particular:

- Optimized defaults for database imports
- Delta imports consume less memory
- A 'deltaImportQuery' attribute has been introduced, which is used for delta imports along with 'deltaQuery', instead of DataImportHandler manipulating the SQL itself (which was error-prone for complex queries). Using only 'deltaQuery' without a 'deltaImportQuery' is deprecated and will be removed in future releases.
- The 'where' attribute has been deprecated in favor of 'cacheKey' and 'cacheLookup' attributes, making CachedSqlEntityProcessor easier to understand and use
- Variables placed in DataSource, EntityProcessor and Transformer attributes are now resolved, making very dynamic configurations possible
- JdbcDataSource can look up a javax.sql.DataSource using JNDI
- A revamped EntityProcessor API for ease in creating custom EntityProcessors

There are many more changes; see the changelog for the complete list. There's a new DIHQuickStart wiki page which can help you get started faster by providing cheat-sheet solutions. Frequently asked questions, along with their answers, are recorded in the new DataImportHandlerFaq wiki page.

A big THANKS to all the contributors and users who have helped us by giving patches, suggestions and bug reports!

Future Roadmap

Once Solr 1.4 is released, there is a slew of features targeted for Solr 1.5, including:

- Multi-threaded indexing
- Integration with Solr Cell to import binary and/or structured documents such as Office, Word, PDF and other proprietary formats
- DataImportHandler as an API which can be used for creating Lucene indexes (independent of Solr) and as a companion to Solrj (for true push support). It will also be possible to extend it for other document-oriented, de-normalized data stores such as CouchDB.
- Support for reading gzipped files
- Support for scheduling imports
- Support for callable statements (stored procedures)

If you have any feature requests or contributions in mind, do let us know on the solr-user mailing list.
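Putting the error handling and event listener options above together, a data-config.xml for a database import might look like the following sketch; the JDBC settings, entity name, query, and listener class names are illustrative:

```xml
<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/tmp/example" user="sa"/>
  <!-- Listeners fire at the start and end of every import run -->
  <document onImportStart="com.foo.StartListener" onImportEnd="com.foo.EndListener">
    <!-- onError="skip" drops a failing document instead of aborting the whole import -->
    <entity name="item" query="select * from item" onError="skip">
      <field column="NAME" name="name"/>
    </entity>
  </document>
</dataConfig>
```

With onError="abort" instead, a single bad row would stop the import and roll back all uncommitted index changes, as described above.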
Posted over 14 years ago by Shalin Shekhar Mangar
Apache Lucene 2.9 has been released. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.

From the official announcement email, Lucene 2.9 comes with a bevy of new features, including:

- Per-segment searching and caching (can lead to much faster reopen, among other things)
- Near real-time search capabilities added to IndexWriter
- New Query types
- Smarter, more scalable multi-term queries (wildcard, range, etc.)
- A freshly optimized Collector/Scorer API
- Improved Unicode support and the addition of a Collation contrib
- A new Attribute-based TokenStream API
- A new QueryParser framework in contrib, with a core QueryParser replacement impl included
- Scoring is now optional when sorting by Field or using a custom Collector, gaining sizable performance when scores are not required
- New analyzers (PersianAnalyzer, ArabicAnalyzer, SmartChineseAnalyzer)
- A new fast-vector-highlighter for large documents
- High-performance handling of numeric fields. Such fields are indexed with a trie structure, enabling simple-to-use and much faster numeric range searching without having to externally pre-process numeric values into textual values.

And many, many more features, bug fixes, optimizations, and various improvements. Look at the release announcement for more details.

Congratulations to the Lucene team! Great work as always.

This is also the last minor release which supports the Java 1.4 platform. The next release will be 3.0, with which deprecated APIs will be removed and Lucene will officially move to Java 5.0 as the minimum requirement.

Solr 1.4 is not far behind and we hope to release it within two weeks.
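The trie-based numeric fields above are exposed through the new NumericField and NumericRangeQuery classes. The following is a sketch only, assuming lucene-core 2.9 on the classpath; the field name and values are illustrative:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.NumericRangeQuery;

public class NumericFieldSketch {
    public static void main(String[] args) {
        // Index a numeric value with the trie-encoded NumericField:
        // stored, indexed, and encoded at the default precision step.
        Document doc = new Document();
        doc.add(new NumericField("price", Field.Store.YES, true).setDoubleValue(9.99));

        // Range search runs directly over the trie terms; no zero-padded
        // textual representation of the number is needed any more.
        NumericRangeQuery query =
            NumericRangeQuery.newDoubleRange("price", Double.valueOf(5.0),
                                             Double.valueOf(15.0), true, true);
        System.out.println(query);
    }
}
```

Before 2.9, numeric range searching typically required padding numbers into sortable strings; the trie encoding removes that preprocessing step and makes range queries substantially faster.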