News

 
Posted 4 months ago by Jan
Recently when indexing I got this message: preg_replace(): The /e modifier is deprecated, use preg_replace_callback instead in D:\xampp\htdocs\xampp\sphider\admin\spiderfuncs.php on line 611
and the same for line 612.
As I am not much of a programmer, my question is how to solve this problem. The indexing completes, but I would prefer to fix this.
All the best,
Jan
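For the /e deprecation reported in this post, the usual migration is to move the replacement expression into a callback passed to preg_replace_callback. The post does not show what lines 611 and 612 of spiderfuncs.php contain, so the pattern and the function name decode_numeric_entities below are hypothetical; this is only a sketch of the general shape of the change, not the actual Sphider fix.

```php
<?php
// Hypothetical example: the real patterns on lines 611-612 are not shown in the post.
// Old style (the /e modifier is deprecated in PHP 5.5 and removed in PHP 7):
//   $text = preg_replace('/&#(\d+);/e', "chr('\\1')", $text);

function decode_numeric_entities($text) {
    // The callback receives the match array; $m[1] is the first capture group.
    return preg_replace_callback(
        '/&#(\d+);/',
        function ($m) {
            return chr((int) $m[1]);
        },
        $text
    );
}

echo decode_numeric_entities('Sp&#104;ider'), "\n"; // prints "Sphider"
```

The same approach should apply to any /e call: drop the e from the modifiers and return from the callback whatever the old replacement string evaluated to.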
Posted 4 months ago by OopsHacked
Currently we need to submit links to "Sphider". How can it be made more similar to "Google"?
"Automatic indexing"
Posted 4 months ago by elsiebuck
Can Sphider index local files and documents?
I haven't "discovered" it on my own, so, I assume by default, it does not...
Posted 4 months ago by m-pcwebdevelopment
I have an existing site with a database consisting of one table. Can I use this same database for Sphider, or do I need to create a new one? If I can use the same one, what is the risk of Sphider overwriting the data I have today?
Posted 5 months ago by Kevin
Please Help Me! What should I do?

= 2) {
$command_line = 1;
$ac = 1; //argument counter
while ($ac < (count($_SERVER['argv']))) {
$arg = $_SERVER['argv'][$ac];

if ($arg == '-all') {
$all = 1;
break;
} else if ($arg == '-u') {
$url = $_SERVER['argv'][$ac+1];
$ac= $ac+2;
} else if ($arg == '-f') {
$soption = 'full';
$ac++;
} else if ($arg == '-d') {
$soption = 'level';
$maxlevel = $_SERVER['argv'][$ac+1];;
$ac= $ac+2;
} else if ($arg == '-l') {
$domaincb = 1;
$ac++;
} else if ($arg == '-r') {
$reindex = 1;
$ac++;
} else if ($arg == '-m') {
$in = str_replace("\\n", chr(10), $_SERVER['argv'][$ac+1]);
$ac= $ac+2;
} else if ($arg == '-n') {
$out = str_replace("\\n", chr(10), $_SERVER['argv'][$ac+1]);
$ac= $ac+2;
} else {
commandline_help();
die();
}

}
}

if (isset($soption) && $soption == 'full') {
$maxlevel = -1;

}

if (!isset($domaincb)) {
$domaincb = 0;

}

if(!isset($reindex)) {
$reindex=0;
}

if(!isset($maxlevel)) {
$maxlevel=0;
}

if ($keep_log) {
if ($log_format=="html") {
$log_file = $log_dir."/".Date("ymdHi").".html";
} else {
$log_file = $log_dir."/".Date("ymdHi").".log";
}

if (!$log_handle = fopen($log_file, 'w')) {
die ("Logging option is set, but cannot open file for logging.");
}
}

if ($all == 1) {
index_all();
} else {

if ($reindex == 1 && $command_line == 1) {
$result=mysql_query("select url, spider_depth, required, disallowed, can_leave_domain from ".$mysql_table_prefix."sites where url='$url'");
echo mysql_error();
if($row=mysql_fetch_row($result)) {
$url = $row[0];
$maxlevel = $row[1];
$in= $row[2];
$out = $row[3];
$domaincb = $row[4];
if ($domaincb=='') {
$domaincb=0;
}
if ($maxlevel == -1) {
$soption = 'full';
} else {
$soption = 'level';
}
}

}
if (!isset($in)) {
$in = "";
}
if (!isset($out)) {
$out = "";
}

index_site($url, $reindex, $maxlevel, $soption, $in, $out, $domaincb);

}

$tmp_urls = Array();

function microtime_float(){
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}

function index_url($url, $level, $site_id, $md5sum, $domain, $indexdate, $sessid, $can_leave_domain, $reindex) {
global $entities, $min_delay;
global $command_line;
global $min_words_per_page;
global $supdomain;
global $mysql_table_prefix, $user_agent, $tmp_urls, $delay_time, $domain_arr;
$needsReindex = 1;
$deletable = 0;

$url_status = url_status($url);
$thislevel = $level - 1;

if (strstr($url_status['state'], "Relocation")) {
$url = preg_replace("/ /", "", url_purify($url_status['path'], $url, $can_leave_domain));

if ($url <> '') {
$result = mysql_query("select link from ".$mysql_table_prefix."temp where link='$url' && id = '$sessid'");
echo mysql_error();
$rows = mysql_numrows($result);
if ($rows == 0) {
mysql_query ("insert into ".$mysql_table_prefix."temp (link, level, id) values ('$url', '$level', '$sessid')");
echo mysql_error();
}
}

$url_status['state'] == "redirected";
}

/*
if ($indexdate <> '' && $url_status['date'] <> '') {
if ($indexdate > $url_status['date']) {
$url_status['state'] = "Date checked. Page contents not changed";
$needsReindex = 0;
}
}*/
ini_set("user_agent", $user_agent);
if ($url_status['state'] == 'ok') {
$OKtoIndex = 1;
$file_read_error = 0;

if (time() - $delay_time < $min_delay) {
sleep ($min_delay- (time() - $delay_time));
}
$delay_time = time();
if (!fst_lt_snd(phpversion(), "4.3.0")) {
$file = file_get_contents($url);
if ($file === FALSE) {
$file_read_error = 1;
}
} else {
$fl = @fopen($url, "r");
if ($fl) {
while ($buffer = @fgets($fl, 4096)) {
$file .= $buffer;
}
} else {
$file_read_error = 1;
}

fclose ($fl);
}
if ($file_read_error) {
$contents = getFileContents($url);
$file = $contents['file'];
}

$pageSize = number_format(strlen($file)/1024, 2, ".", "");
printPageSizeReport($pageSize);

if ($url_status['content'] != 'text') {
$file = extract_text($file, $url_status['content']);
}

printStandardReport('starting', $command_line);

$newmd5sum = md5($file);

if ($md5sum == $newmd5sum) {
printStandardReport('md5notChanged',$command_line);
$OKtoIndex = 0;
} else if (isDuplicateMD5($newmd5sum)) {
$OKtoIndex = 0;
printStandardReport('duplicate',$command_line);
}

if (($md5sum != $newmd5sum || $reindex ==1) && $OKtoIndex == 1) {
$urlparts = parse_url($url);
$newdomain = $urlparts['host'];
$type = 0;

/* if ($newdomain <> $domain)
$domainChanged = 1;

if ($domaincb==1) {
$start = strlen($newdomain) - strlen($supdomain);
if (substr($newdomain, $start) == $supdomain) {
$domainChanged = 0;
}
}*/

// remove link to css file
//get all links from file
$data = clean_file($file, $url, $url_status['content']);

if ($data['noindex'] == 1) {
$OKtoIndex = 0;
$deletable = 1;
printStandardReport('metaNoindex',$command_line);
}

$wordarray = unique_array(explode(" ", $data['content']));

if ($data['nofollow'] != 1) {
$links = get_links($file, $url, $can_leave_domain, $data['base']);
$links = distinct_array($links);
$all_links = count($links);
$numoflinks = 0;
//if there are any, add to the temp table, but only if there isnt such url already
if (is_array($links)) {
reset ($links);

while ($thislink = each($links)) {
if ($tmp_urls[$thislink[1]] != 1) {
$tmp_urls[$thislink[1]] = 1;
$numoflinks++;
mysql_query ("insert into ".$mysql_table_prefix."temp (link, level, id) values ('$thislink[1]', '$level', '$sessid')");
echo mysql_error();
}
}
}
} else {
printStandardReport('noFollow',$command_line);
}

if ($OKtoIndex == 1) {

$title = $data['title'];
$host = $data['host'];
$path = $data['path'];
$fulltxt = $data['fulltext'];
$desc = substr($data['description'], 0,254);
$url_parts = parse_url($url);
$domain_for_db = $url_parts['host'];

if (isset($domain_arr[$domain_for_db])) {
$dom_id = $domain_arr[$domain_for_db];
} else {
mysql_query("insert into ".$mysql_table_prefix."domains (domain) values ('$domain_for_db')");
$dom_id = mysql_insert_id();
$domain_arr[$domain_for_db] = $dom_id;
}

$wordarray = calc_weights ($wordarray, $title, $host, $path, $data['keywords']);

//if there are words to index, add the link to the database, get its id, and add the word + their relation
if (is_array($wordarray) && count($wordarray) > $min_words_per_page) {
if ($md5sum == '') {
mysql_query ("insert into ".$mysql_table_prefix."links (site_id, url, title, description, fulltxt, indexdate, size, md5sum, level) values ('$site_id', '$url', '$title', '$desc', '$fulltxt', curdate(), '$pageSize', '$newmd5sum', $thislevel)");
echo mysql_error();
$result = mysql_query("select link_id from ".$mysql_table_prefix."links where url='$url'");
echo mysql_error();
$row = mysql_fetch_row($result);
$link_id = $row[0];

save_keywords($wordarray, $link_id, $dom_id);

printStandardReport('indexed', $command_line);
}else if (($md5sum <> '') && ($md5sum <> $newmd5sum)) { //if page has changed, start updating

$result = mysql_query("select link_id from ".$mysql_table_prefix."links where url='$url'");
echo mysql_error();
$row = mysql_fetch_row($result);
$link_id = $row[0];
for ($i=0;$i< count($links)) {
$num++;
$thislink = $links[$count];
$urlparts = parse_url($thislink);
reset ($omit);
$forbidden = 0;
foreach ($omit as $omiturl) {
$omiturl = trim($omiturl);

$omiturl_parts = parse_url($omiturl);
if ($omiturl_parts['scheme'] == '') {
$check_omit = $urlparts['host'] . $omiturl;
} else {
$check_omit = $omiturl;
}

if (strpos($thislink, $check_omit)) {
printRobotsReport($num, $thislink, $command_line);
check_for_removal($thislink);
$forbidden = 1;
break;
}
}

if (!check_include($thislink, $url_inc, $url_not_inc )) {
printUrlStringReport($num, $thislink, $command_line);
check_for_removal($thislink);
$forbidden = 1;
}

if ($forbidden == 0) {
printRetrieving($num, $thislink, $command_line);
$query = "select md5sum, indexdate from ".$mysql_table_prefix."links where url='$thislink'";
$result = mysql_query($query);
echo mysql_error();
$rows = mysql_num_rows($result);
if ($rows == 0) {
index_url($thislink, $level+1, $site_id, '', $domain, '', $sessid, $can_leave_domain, $reindex);

mysql_query("update ".$mysql_table_prefix."pending set level = $level, count=$count, num=$num where site_id=$site_id");
echo mysql_error();
}else if ($rows <> 0 && $reindex == 1) {
$row = mysql_fetch_array($result);
$md5sum = $row['md5sum'];
$indexdate = $row['indexdate'];
index_url($thislink, $level+1, $site_id, $md5sum, $domain, $indexdate, $sessid, $can_leave_domain, $reindex);
mysql_query("update ".$mysql_table_prefix."pending set level = $level, count=$count, num=$num where site_id=$site_id");
echo mysql_error();
}else {
printStandardReport('inDatabase',$command_line);
}

}
$count++;
}
$level++;
}

mysql_query ("delete from ".$mysql_table_prefix."temp where id = '$sessid'");
echo mysql_error();
mysql_query ("delete from ".$mysql_table_prefix."pending where site_id = '$site_id'");
echo mysql_error();
printStandardReport('completed',$command_line);

}

function index_all() {
global $mysql_table_prefix;
$result=mysql_query("select url, spider_depth, required, disallowed, can_leave_domain from ".$mysql_table_prefix."sites");
echo mysql_error();
while ($row=mysql_fetch_row($result)) {
$url = $row[0];
$depth = $row[1];
$include = $row[2];
$not_include = $row[3];
$can_leave_domain = $row[4];
if ($can_leave_domain=='') {
$can_leave_domain=0;
}
if ($depth == -1) {
$soption = 'full';
} else {
$soption = 'level';
}
index_site($url, 1, $depth, $soption, $include, $not_include, $can_leave_domain);
}
}

function get_temp_urls ($sessid) {
global $mysql_table_prefix;
$result = mysql_query("select link from ".$mysql_table_prefix."temp where id='$sessid'");
echo mysql_error();
$tmp_urls = Array();
while ($row=mysql_fetch_row($result)) {
$tmp_urls[$row[0]] = 1;
}
return $tmp_urls;

}

function get_domains () {
global $mysql_table_prefix;
$result = mysql_query("select domain_id, domain from ".$mysql_table_prefix."domains");
echo mysql_error();
$domains = Array();
while ($row=mysql_fetch_row($result)) {
$domains[$row[1]] = $row[0];
}
return $domains;

}

function commandline_help() {
print "Usage: php spider.php \n\n";
print "Options:\n";
print " -all\t\t Reindex everything in the database\n";
print " -u \t Set url to index\n";
print " -f\t\t Set indexing depth to full (unlimited depth)\n";
print " -d \t Set indexing depth to \n";
print " -l\t\t Allow spider to leave the initial domain\n";
print " -r\t\t Set spider to reindex a site\n";
print " -m \t Set the string(s) that an url must include (use \\n as a delimiter between multiple strings)\n";
print " -n \t Set the string(s) that an url must not include (use \\n as a delimiter between multiple strings)\n";
}

printStandardReport('quit',$command_line);
if ($email_log) {
$indexed = ($all==1) ? 'ALL' : $url;
$log_report = "";
if ($log_handle) {
$log_report = "Log saved into $log_file";
}
mail($admin_email, "Sphider indexing report", "Sphider has finished indexing $indexed at ".date("y-m-d H:i:s").". ".$log_report);
}
if ( $log_handle) {
fclose($log_handle);
}

?>
Posted 5 months ago by ukamalo
Hi,
When I preview my search engine results page (.sphider.) in the browser, I get this message:

Deprecated: mysql_pconnect(): The mysql extension is deprecated and will be removed in the future: use mysqli or PDO instead in /home/whatsup/public_html/sphider-1.3.5/settings/database.php on line 10

my database.php (sphider-1.3.5)

I have no idea what I need to change, and I don't know how to change it.
Could you please help me solve this problem?

Best regards
kami
Posted 5 months ago by soaringeagle
I have a large social network with 1.5 million pages in the current sitemap, and I'm moving to a different system.
I need a search engine that can index the entire site, including media like videos, MP3s, photos, etc.,
both embedded and uploaded.
It also needs to search sections like blogs, forums, profiles, videos, photos, etc. and return results just from those sections,
and be able to update daily by cron job,
or better yet incrementally every hour, tracking only new or changed items rather than reindexing the entire site.

I need a solution that will do what I need ASAP.

Also, the results need to look integrated into the site, preferably returning the results on the page or within the module being searched.
Posted 5 months ago by jhay
When I try to search for a keyword, it shows: Table 'sphider.query_log' doesn't exist
How can I fix this?
Posted 5 months ago by kreator
Hi fellas,

My site www gamblesearch com is not returning any results. I have just one site indexed, with: Currently in database: 1 sites, 23 links, 0 categories and 836 keywords.

So, can anyone suggest something helpful? I'm a web developer and I will understand everything you tell me.

I've fixed two errors on the site: the missing .link_keyword and .temp tables (I imported those two tables from the tables.sql that I got from the latest version of a fresh Sphider).

Note: This site was built 7-8 years ago and it worked well until a few years ago. Then it suddenly broke!

Does anyone know what can cause this problem?
Posted 5 months ago by searchse
Hi,

I would like to know the server requirements and features.
Can I use the Sphider script to build a big search engine with 100K to 100 million sites / pages?

Requirements:
1) How small a server would be needed to index 100,000 websites?
2) How big a server would be needed to index 100 million pages?
3) What are the PHP and MySQL requirements?
4) Are there any server requirements that we did not ask about?

Features:
5) Can we use Google AdSense at the top and in the sidebar of the search results?
6) How can we earn income by starting a search engine using the Sphider script?
7) Do you offer any plugins or mods that work like advertiser (adword) and publisher (adsense), or only for advertisers to advertise?
8) Can you add features if we request them?
9) How often do you update or add features?
10) Can I make the search engine multilingual?
11) Are search results the same for every country, or are they based on country or IP?
12) Can we block some keywords so that they return zero results?

I am willing to buy but need to know more. Thanks