News

Posted over 17 years ago
Here at DL Consulting we’re continuing to make improvements to Greenstone’s support for importing and displaying METS/ALTO data. METS/ALTO is an XML schema published by the Library of Congress and used by the US National Digital Newspaper Program (NDNP), by many other newspaper digitization projects, and by some collections of books, journals, and other textual resources. In addition to extracting machine-readable text from each page, a process that produces METS/ALTO also records information about the individual articles within a page. This allows a user interface to be built in which newspaper articles can be displayed on their own as well as within the pages on which they were printed.

The Papers Past site we built last year with the National Library of New Zealand (which uses METS/ALTO) continues to grow. There are now over 600,000 searchable pages (that’s about 6.5 million newspaper articles!) in the system. We’re happy with how well the system is scaling, but we continue to work on further improvements, with the eventual goal of infinite scalability: large collections distributed across multiple computers. We’re making good progress towards that goal thanks to a research grant from the Foundation for Research, Science, and Technology.

In addition to the Papers Past collection, we’ve built two further METS/ALTO-based newspaper collections over recent months. Unfortunately, neither of these sites is accessible to the public yet, but we’ll post links on www.dlconsulting.com once they are.

Cornell University: The Cornell Daily Sun Digitization Project. This project has been using a basic Greenstone system for some time (which is still online now), but we implemented a major upgrade so the system can now import METS/ALTO data (which Cornell has switched to for the digitization of all remaining newspaper issues) as well as the older (proprietary) data format used in the earlier digitization work.
METS/ALTO is more flexible than the older format, but the system was implemented so that all the data (in both the old and new formats) is displayed in a very similar way. The Cornell Daily Sun project also switched to generating web-accessible images on demand with image server software, similar to the way Papers Past does.

National Library Board of Singapore: We’ve also been working for many months on a large newspaper collection for the National Library of Singapore, building on the software written for Papers Past. The Singapore collection will be released later this year, initially with around 600,000 pages of digitized content, growing to around 2 million pages over time. The Singapore project has some added complexity, including integration with a digital rights management system (because some of the digitized newspapers are still in copyright) and with automated concept (i.e. subject heading) extraction software. In addition, the Singapore project uses large grayscale JPEG2000 source images, as opposed to the black-and-white TIFF images used by Papers Past. We had to redevelop our image server software significantly to get good performance when processing these JPEG2000 images.

We’ve been asked several times whether the code written to import and display METS/ALTO data is open source, and whether it has been committed back to Greenstone. The answer is yes, of course it’s open source, but no, it hasn’t yet been committed back to the Greenstone code base. The reasons for not committing it back are as follows.

It’s a lot of highly specialized code, and is only useful to those with METS/ALTO data. My personal belief is that at times we have added too much highly specialized functionality to Greenstone, and that Greenstone2 isn’t currently modular enough to make it easy to add these sorts of major changes.

We’ve worked with a number of METS/ALTO-based projects, and the data itself is always subtly different.
That is, the code always needs to be modified to suit the particular METS/ALTO schema used, so it is only useful as a starting point.

Having said all of the above, we are of course happy to make the code available to those who want it. Please contact us at [email protected] if you’re planning on building a METS/ALTO-based Greenstone collection.
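The post describes METS/ALTO as recording both the page text and which parts of a page belong to which article, and notes that every project's data differs subtly. DL Consulting's importer code is not shown here, so the following is only a minimal sketch, under stated assumptions, of how article text might be assembled from the two files. It assumes ALTO text is held in `TextBlock` > `TextLine` > `String` elements with a `CONTENT` attribute, and that the METS logical `structMap` marks articles as `div` elements with `TYPE="ARTICLE"` whose `area` elements point at ALTO block IDs through `BEGIN`. The function names and sample data are hypothetical. Namespaces are stripped rather than hard-coded, because different projects use different schema versions.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop any '{namespace}' prefix so the same code accepts different ALTO/METS versions."""
    return tag.rsplit("}", 1)[-1]


def alto_blocks(alto_xml):
    """Map each ALTO TextBlock ID to its text: words joined by spaces, lines by newlines."""
    root = ET.fromstring(alto_xml)
    blocks = {}
    for block in root.iter():
        if local(block.tag) != "TextBlock":
            continue
        lines = []
        for line in block.iter():
            if local(line.tag) == "TextLine":
                words = [s.get("CONTENT", "") for s in line.iter() if local(s.tag) == "String"]
                lines.append(" ".join(words))
        blocks[block.get("ID")] = "\n".join(lines)
    return blocks


def mets_articles(mets_xml):
    """Map each METS div with TYPE="ARTICLE" (by LABEL, else ID) to the ALTO block IDs it references."""
    root = ET.fromstring(mets_xml)
    articles = {}
    for div in root.iter():
        if local(div.tag) != "div" or div.get("TYPE", "").upper() != "ARTICLE":
            continue
        ids = [a.get("BEGIN") for a in div.iter() if local(a.tag) == "area" and a.get("BEGIN")]
        articles[div.get("LABEL") or div.get("ID")] = ids
    return articles


def article_texts(mets_xml, alto_xml):
    """Assemble each article's text from the ALTO blocks its METS div points at, in order."""
    blocks = alto_blocks(alto_xml)
    return {
        label: "\n\n".join(blocks[b] for b in ids if b in blocks)
        for label, ids in mets_articles(mets_xml).items()
    }


# Hypothetical sample data: one page with three text blocks, one article spanning two of them.
ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="TB1"><TextLine><String CONTENT="Harbour"/><String CONTENT="reopens"/></TextLine></TextBlock>
    <TextBlock ID="TB2">
      <TextLine><String CONTENT="The"/><String CONTENT="harbour"/><String CONTENT="reopened"/></TextLine>
      <TextLine><String CONTENT="on"/><String CONTENT="Monday."/></TextLine>
    </TextBlock>
    <TextBlock ID="TB3"><TextLine><String CONTENT="Advertisement"/></TextLine></TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

METS = """<mets xmlns="http://www.loc.gov/METS/">
  <structMap TYPE="LOGICAL">
    <div TYPE="ISSUE">
      <div TYPE="ARTICLE" ID="ART1" LABEL="Harbour reopens">
        <fptr><seq>
          <area FILEID="ALTO1" BEGIN="TB1" BETYPE="IDREF"/>
          <area FILEID="ALTO1" BEGIN="TB2" BETYPE="IDREF"/>
        </seq></fptr>
      </div>
    </div>
  </structMap>
</mets>"""

if __name__ == "__main__":
    for label, text in article_texts(METS, ALTO).items():
        print(f"== {label} ==\n{text}")
```

Because the page text and the article structure are kept separate, the same ALTO blocks can be rendered either as a whole page or as a standalone article. A real profile would need further adaptation, for example handling articles that continue across pages or ALTO files that use `ComposedBlock` groupings. That adaptation work is the kind of per-project modification the post describes.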
Posted over 17 years ago
Starting now, nightly “snapshot” releases of Greenstone3 will be constructed and made available on our snapshots page.
Posted over 17 years ago
Upcoming workshop in Malaysia and photos from a recent workshop in Zimbabwe
Posted over 17 years ago
I have been a volunteer research associate in the Greenstone team for more than two years, and was very pleased to be able to visit the University of Waikato, at the invitation of Prof. Ian Witten, from 5 to 19 March 2008 (this was also my first visit to New Zealand). I live in France and […]
Posted over 17 years ago
We have just started the Greenstone Workshop Map, which shows the locations of all the Greenstone workshops and tutorials which have been conducted around the world.
Posted over 17 years ago
A prototype OAI metadata analysis tool - producing statistics and visualisations of repository metadata - is now online.
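The announcement does not say which statistics the prototype tool computes or how it computes them. As a hedged illustration of one basic measure such a tool might report, the sketch below calculates field completeness: the share of records that use each Dublin Core element. It assumes an OAI-PMH `ListRecords` response in the standard `oai_dc` format, and it reads a single response page only. A full harvest would also need to follow `resumptionToken`s. The function names and sample response are hypothetical, not the tool's own.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}
DC = "{http://purl.org/dc/elements/1.1/}"  # Clark-notation prefix of Dublin Core element tags


def dc_field_usage(list_records_xml):
    """Count the records carrying oai_dc metadata, and how many of them use each DC element."""
    root = ET.fromstring(list_records_xml)
    analysed = 0
    usage = Counter()
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # e.g. deleted records, which carry only a header
            continue
        analysed += 1
        # A set, so a record with several dc:creator elements counts once for "creator".
        usage.update({child.tag[len(DC):] for child in dc if child.tag.startswith(DC)})
    return analysed, usage


def completeness(list_records_xml):
    """Percentage of analysed records that use each DC element at least once."""
    analysed, usage = dc_field_usage(list_records_xml)
    if analysed == 0:
        return {}
    return {field: 100.0 * count / analysed for field, count in usage.items()}


# Hypothetical single-page ListRecords response with two records.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First item</dc:title>
          <dc:creator>A. Author</dc:creator>
          <dc:creator>B. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second item</dc:title>
          <dc:date>2008</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    for field, pct in sorted(completeness(SAMPLE).items()):
        print(f"{field}: {pct:.0f}% of records")
```

On the sample, `title` appears in every record while `creator` and `date` each appear in half of them. Per-field counts like these are a natural input for the kind of visualisations the announcement mentions.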