Introduction
Large libraries often contain multiple catalogs, digital repositories, and other data sources. Generally, each of these must be searched independently or through a federated search system. The goal of Meercat is to provide a metadata harvesting and management system that maintains up-to-date copies of the metadata in one location, so that the metadata can either be harvested by another service or used directly for discovery and retrieval of detailed resource metadata. Solr, the open-source search server built on Lucene, is used to facilitate discovery, and a REST interface provides access to more detailed information on resources.
Technologies
Languages
The core system is written in Python. Harvesters and storage are independent of any runtime system, but the jobs and scheduling system requires Twisted. XSLT is used to transform chunks of metadata from one format to another.
Harvesters
Harvesters are currently implemented for Voyager ILS catalogs, SFX electronic resources, and MetaLib databases. We plan to add a harvester for OAI-PMH servers. All harvesters implement a common API, and more can be added and integrated easily as additional Python modules.
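To make the idea of a shared harvester API concrete, here is a minimal sketch of what such an interface could look like. The names `Harvester`, `source_name`, `harvest`, `InMemoryHarvester`, and `harvest_all` are hypothetical and chosen for illustration only; Meercat's actual API may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List


class Harvester(ABC):
    """Hypothetical base interface; Meercat's real API may differ."""

    @abstractmethod
    def source_name(self) -> str:
        """Return a short identifier for the data source."""

    @abstractmethod
    def harvest(self) -> Iterator[Dict[str, str]]:
        """Yield one metadata record (as a dict) per resource."""


class InMemoryHarvester(Harvester):
    """Toy harvester that serves records from a list, for illustration."""

    def __init__(self, name: str, records: List[Dict[str, str]]) -> None:
        self._name = name
        self._records = records

    def source_name(self) -> str:
        return self._name

    def harvest(self) -> Iterator[Dict[str, str]]:
        for record in self._records:
            # Tag each record with its source so that downstream storage
            # can tell the originating catalogs apart.
            yield {**record, "source": self._name}


def harvest_all(harvesters: List[Harvester]) -> List[Dict[str, str]]:
    """Collect records from every harvester, in the order given."""
    collected: List[Dict[str, str]] = []
    for harvester in harvesters:
        collected.extend(harvester.harvest())
    return collected
```

Under this sketch, a new source would only need to subclass `Harvester` in its own module; the code that drives the harvest would not have to change.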
Queriable Harvesters
Queriable harvesters for data sources such as the Voyager ILS allow Meercat to stay current with circulation information about physical resources. They extend the base harvester API with the ability to harvest resources incrementally and to harvest only those resources that were modified within a given time frame.
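The incremental behavior described above can be sketched as follows. This is an illustration under stated assumptions, not Meercat's implementation: the class name `QueryableHarvester`, the method `harvest_modified`, and the `modified` record key are all hypothetical, and the records are held in memory rather than fetched from a real ILS.

```python
from datetime import datetime
from typing import Any, Dict, Iterator, List, Optional


class QueryableHarvester:
    """Hypothetical harvester that supports time-bounded harvests."""

    def __init__(self, records: List[Dict[str, Any]]) -> None:
        # Each record is assumed to carry a "modified" datetime. A real
        # source would be queried over the network or from a database.
        self._records = records

    def harvest(self) -> Iterator[Dict[str, Any]]:
        """Full harvest: yield every record."""
        yield from self._records

    def harvest_modified(
        self, since: datetime, until: Optional[datetime] = None
    ) -> Iterator[Dict[str, Any]]:
        """Yield only records modified in the half-open range [since, until)."""
        for record in self._records:
            modified = record["modified"]
            if modified < since:
                continue
            if until is not None and modified >= until:
                continue
            yield record
```

A scheduler could remember the time of its last successful run and pass it as `since`, so that each pass touches only the records that changed in the meantime.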
Solr (Search Indexing)
Apache Solr is used to facilitate discovery. Simple metadata such as title, creator, and description are indexed directly in Solr, while some complex data, such as location and status, are reduced to simple fields that Solr can index for faceting and filtering of search results. The metadata is transformed using a MapReduce framework built on Twisted, an asynchronous, event-driven Python networking library.
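As an illustration of reducing complex data to simple indexable fields, the sketch below flattens a record's holdings into a list of locations and a single availability flag. It uses plain Python `map` and `functools.reduce` rather than Twisted, and the record layout and the field names `location_facet` and `is_available` are assumptions made for this example, not Meercat's actual schema.

```python
from functools import reduce
from typing import Any, Dict


def map_holding(holding: Dict[str, Any]) -> Dict[str, Any]:
    """Map step: flatten one holding into simple, indexable values."""
    return {
        "location": holding["location"]["name"],
        "available": holding["status"] == "available",
    }


def reduce_holdings(acc: Dict[str, Any], item: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce step: merge one flattened holding into the running fields."""
    locations = acc["location_facet"]
    if item["location"] not in locations:
        # Keep each location once, in first-seen order.
        locations = locations + [item["location"]]
    return {
        "location_facet": locations,
        "is_available": acc["is_available"] or item["available"],
    }


def to_solr_doc(record: Dict[str, Any]) -> Dict[str, Any]:
    """Build a flat document: simple fields pass through, holdings are reduced."""
    initial = {"location_facet": [], "is_available": False}
    mapped = map(map_holding, record.get("holdings", []))
    reduced = reduce(reduce_holdings, mapped, initial)
    return {"id": record["id"], "title": record["title"], **reduced}
```

The resulting dictionary contains only strings, booleans, and lists of strings, which is the kind of flat shape that suits faceting and filtering.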
Top Level Components
Meercat is composed of reusable Python packages that can be replaced or upgraded independently of the rest of the system. The core packages are:
meercat
meercat.harvester
meercat.job
meercat.server
meercat.solr
meercat.storage
meercat.ui
Related Projects
Other projects that we are aware of that are looking at library resource discovery are:
Extensible Catalog
Blacklight
VuFind