openhub.net
Black Duck Software, Inc.
Black Duck Open Hub
Web Archive Access Utilities
Inactive
Commits: Listings
Analyzed about 8 hours ago, based on code collected about 12 hours ago.
Apr 19, 2023 — Apr 19, 2024
Showing page 1 of 48
Commit Message
Contributor
Date
Use nutch:tstamp as date in wera Re: [Archive-access-discuss] Nutchwax0.10 and WERA0.4.2: Date field missing from documentLocator->Resultset
sverreb
over 16 years ago
*** empty log message ***
stack-sf
about 17 years ago
* src/java/org/archive/access/nutch/ImportArcs.java Refactor reporter. Add new methods that will report if a long time has elapsed since last report, otherwise, will stay silent. Also fix a bug where we didn't always log the ARC name just opened if happened in same millisecond as Reporter construction. Made an ImportArcReporter out of an anonymous Reporter.
stack-sf
about 17 years ago
BUGFIX: 1639135 [wayback] ArcProxy does not close connections If output stream to machine running HTTP11ResourceStore closes, as is the usual case, when reading a range from an ARC file, and the end of record is encountered, the inputstream from the ARC source was not being closed. On installations where the HTTP11ResourceStore and the ArcProxy are running on the same host, the connections time out more quickly, but in distributed installations, this was quickly becoming a problem.
bradtofel
about 17 years ago
* project.xml Move version past release.
stack-sf
about 17 years ago
* src/articles/releasenotes.xml Add in 0.10.0 bugs.
stack-sf
over 17 years ago
* bin/importArcsLogReporter.py
* bin/nutchwaxLogReporter.py
* bin/util.py
These don't currently work. Remove for now.
stack-sf
over 17 years ago
* xdocs/index.xml Add link to release notes.
stack-sf
over 17 years ago
Readying for 0.10.0 release.
* conf/hadoop-site.template.xml Edit.
* conf/wax-default.xml Add to wax.index.all description.
* src/articles/releasenotes.xml Ready release notes for 0.10.0 release.
* src/java/overview.html Edit to match 0.10.0.
* xdocs/index.xml News of 0.10.0.
* project.xml Set version to 0.10.0.
stack-sf
over 17 years ago
Implement '[ 1288990 ] Configurable collection name in search.jsp':
* conf/hadoop-site.xml.template Add note on how to override collection from search result.
* conf/wax-default.xml (wax.host): Added.
* src/plugin/parse-waxext/src/java/org/apache/nutch/parse/ext/WaxExtParser.java Remove debugging statement.
* src/web/search.jsp If path on wax.host, don't add archiveCollection to result URL path.
stack-sf
over 17 years ago
Implement '[ 1503045 ] PDFs have URL for title'
* src/plugin/parse-waxext/src/java/org/apache/nutch/parse/ext/WaxExtParser.java Added looking for title at head of returned text from xpdf. (main): Added.
stack-sf
over 17 years ago
* src/articles/releasenotes.xml Add TODO.
stack-sf
over 17 years ago
Trying pdfinfo getting title from pdf.
* src/plugin/parse-waxext/bin/parse-pdf.sh Get pdf metainfo.
* src/plugin/parse-waxext/src/java/org/apache/nutch/parse/ext/WaxExtParser.java Try parsing a title from returned parse stream.
stack-sf
over 17 years ago
removed -- replaced by index-client
bradtofel
over 17 years ago
* src/java/org/archive/access/nutch/Nutchwax.java Fix classcastexception.
* src/java/org/archive/access/nutch/NutchwaxCrawlDb.java Output the list of segment directories we're using to update.
stack-sf
over 17 years ago
* src/java/org/archive/access/nutch/ImportArcs.java Add set status every 5 minutes if reading is taking a long time (We seem to be blocking on S3).
stack-sf
over 17 years ago
Part of [ 1632531 ] [nutchwax] Use parse-pdf in place of xpdf
* conf/wax-default.xml
* conf/wax-parse-plugins.xml
Use parse-pdf in place of parse-waxext. It finds the title which is an advantage (rather than use the URL) and it doesn't spawn an external process. Downsides are it seems to take longer to complete parse and it used to hang. Lets try it for a while to see if it works (Max Schoeffman tried it and its working for him).
stack-sf
over 17 years ago
Apply '[ 1636313 ] [nutchwax-wayback] If exact date passed, use it'
Contributed by Max Schoeffman. Reviewed by St.Ack
* src/java/org/archive/wayback/resourceindex/NutchResourceIndex.java
From Max: I looked a bit closer at waybacks ReplayFilter today and noticed that the current NutchResourceIndex (as a result of my last patch) doesn't behave absolutely correct. The ReplayFilter sets the date supplied with the URL as EXACT_DATE and the current timestamp as END_DATE. The NutchResourceIndex now constructs the date range for nutch from START_DATE to END_DATE, which seems wrong as this will always just return a version between 1996 and "today". The attached patch changes this to give precedence to EXACT_DATE over END_DATE if the former is specified.
stack-sf
over 17 years ago
TWEAK: Upped revision to 0.9.0
bradtofel
over 17 years ago
* xdocs/user_manual.xml Point explicitly at the nutchwax-wayback bridge doc.
stack-sf
over 17 years ago
RELEASE: 0.8.0
bradtofel
over 17 years ago
TWEAK: removed link to status line.
bradtofel
over 17 years ago
TWEAK: changes in this derived file reflect changes in src/config/*.xml
bradtofel
over 17 years ago
FEATURE: added two new regex removals for .NET session IDs embedded in URLs.
bradtofel
over 17 years ago
FEATURE: added bdb-client and bin-search to command line tool exports
bradtofel
over 17 years ago
TWEAK: changed PipelineFilter name to RemoteSubmitFilter. changed filter path from /pipeline/ to /index-incoming/ -- this filter no longer provides status on the pipeline and configuration.
bradtofel
over 17 years ago
BUGFIX: proxy.redirect needs to be an absolute URL -- otherwise it ends up being relative to the server being viewed.
bradtofel
over 17 years ago
TWEAK: changed default ArcProxy context from locationdb to arc-proxy, and filter directory to proxied arcs from /arc-proxy/ to /arcs/.
bradtofel
over 17 years ago
TWEAK: changed filter export from /arc-proxy/ to /arcs/
bradtofel
over 17 years ago
TWEAK: whitespace and documentation
bradtofel
over 17 years ago