Inactive

Commits : Listings

Analyzed about 19 hours ago, based on code collected 1 day ago.
Apr 18, 2023 — Apr 18, 2024
Commit Message | Date
last commit ever to "archive-commons"! | about 11 years ago
ia-tools: add AccessControlAllowCapture(url, timestamp) udf which can be used to query an access oracle to see if a given url should be included/excluded. Adding wayback-access-control/access-control lib as a dependency to ia-tools (seems to be best/simplest way to do this) | about 11 years ago
archive-commons: extend TimestampDedupIterator with TimestampCustomDedupIterator which also checks an additional field (default status field) when doing dedup (will probably be refactored more). ia-tools: fix bug in HttpZipNumDerefLineRecordReader, init cluster before use | about 11 years ago
getURLString(boolean,boolean,boolean) - omit opening paren when includeScheme is false(); new convenience method getSURTString(boolean includeScheme) | about 11 years ago
remove archive-surt dependency | about 11 years ago
update guava library to latest 14.0.1 | about 11 years ago
remove unneeded(?) obsolete(?) archive-surt | about 11 years ago
Rename GoogleURLCanonicalizer* to BasicURLCanonicalizer* and DefaultIA*Canonicalizer* to AggressiveIA*Canonicalizer* to better reflect their roles, deprecating the old class names. Elaborate on javadoc for BasicURLCanonicalizer. Remove scheme-lowercasing from BasicURLCanonicalizer. Add rule to IAURLCanonicalizer to support scheme-lowercasing, and add the rule to AggressiveIACanonicalizerRules. Add new OrdinaryIAURLCanonicalizer for non-aggressive canonicalization and a few tests in OrdinaryIAURLCanonicalizerTest. | about 11 years ago
add other unicode line terminators to STRAY_SPACING regex | about 11 years ago
Merge github.com:internetarchive/archive-commons | about 11 years ago
archive-commons: Fix bug in SummaryBlockIterator that would reinit block and over again without use! Not actually leaking, but inefficient nevertheless! | about 11 years ago
treat pct-encoded strings as encoded utf-8 bytes; encode unicode as pct-encoded utf-8; when decoding pct-encoded, if not valid utf-8, leave undecoded | about 11 years ago
avoid NPE when url scheme is null, such as with "opaque" dns urls | about 11 years ago
handle urls with uppercase letters in scheme | about 11 years ago
archive-commons: Abstracted out HTTPSeekableLineReaders into different possible implementations, currently supporting Apache 3.1 and Java URLConnection.. possible to add (HttpClient 4.x) as well. HTTPSeekableLineReader.getHttpFactory() returns the actual instance, default is HttpClient 3.1 as before. | about 11 years ago
HTTPSeekableLineReader: log connection pool use if FINER logging setting set, also set timeout for manager getting new connections. DateFilter: Support an empty filter (accept all). CDXMapper: Fix url->surt cdx conversion for cdxs that have hostname as 3rd field, if so, treat http:// + cdx key as original url | about 11 years ago
ack! accidentally not setting maxTotalConnections on apache!! Huge fix | about 11 years ago
slr: more fixes to slr classes, null out streams on close | about 11 years ago
zip blockloading: various fixes, support for turning off stale checking, nio stream improvements | about 11 years ago
turn off mmap for NIO for now, minor fixes to line readers, null out raf | about 11 years ago
archive-commons: refactored some archive-commons, blockloader has ThreadLocal storage of all readers, closed at end by wayback | about 11 years ago
fix typo | about 11 years ago
additional exception capturing leak detection, close SLR when ioexception occurs, then rethrow | about 11 years ago
more exception handling in HTTPSeekableLineReader | about 11 years ago
add get header function | about 11 years ago
bufferFully support in SeekableLineReader | about 11 years ago
add buffering support to all SeekableLineReaders | about 11 years ago
fixes: add catch around multi-iterators so that errors in one don't necessarily disable the whole iterator | about 11 years ago
slr: add custom input stream to httpseekablelinereader to abort on close if not fully read | about 11 years ago
archive-commons: Some refactoring of the SeekableLineReader classes (to be renamed) to support generic reading from inputstream, moved common classes to base. When using line buffering iterator, buffer on load | about 11 years ago