


Creative Commons License

CC-GNU GPL


About

Swish-e PHP Tools are still in development; consider them alpha-stage software.

Rationale

Swish-e PHP Tools are being created to fill a specific need at the Fitzwilliam Museum: we wanted to develop our own 'on-server' search engine for both our intranet and internet sites. After some research we decided to use Swish-e as our indexing and search engine. We also chose to stick with a single web development language, PHP, where possible.

Swish-e is very versatile, but the majority of the tools and utilities surrounding it are Perl based (which is fine).
Our decision was to use Swish-e and build some PHP-based tools to complement it. With these factors in mind, we felt we needed:

  • a PHP-based HTTP 'spider' capable of crawling our content over HTTP and 'feeding' that content to Swish-e for indexing via its '-S prog' option (this has become SPT_spider.php), and
  • a PHP module to assist in deploying search form functionality on top of the Swish-e search engine (this hasn't got very far yet)

SPT_spider.php

This command-line PHP tool is designed to crawl an entire HTTP site structure from a starting page and can be used as an input source for the swish-e.exe indexer. It provides (internal) configuration options for included/excluded URLs, allowed file types, disallowed URL types, etc.
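
As an illustration of the kind of include/exclude filtering described above, here is a minimal sketch. The option names ($config keys), the function name spt_should_crawl and the example URLs are all hypothetical; the actual settings inside SPT_spider.php are not shown on this page and may look quite different.

```php
<?php
// Hypothetical settings, for illustration only; not the real SPT_spider.php options.
$config = [
    'include_prefixes'   => ['http://www.example.org/'],  // only crawl URLs under these
    'exclude_substrings' => ['/private/', '?print='],     // skip URLs containing these
    'allowed_extensions' => ['html', 'htm', 'php', ''],   // '' allows directory-style URLs
];

// Decide whether a discovered URL should be crawled (hypothetical helper).
function spt_should_crawl(string $url, array $config): bool
{
    // 1. The URL must start with one of the included prefixes.
    $included = false;
    foreach ($config['include_prefixes'] as $prefix) {
        if (strncmp($url, $prefix, strlen($prefix)) === 0) {
            $included = true;
            break;
        }
    }
    if (!$included) {
        return false;
    }

    // 2. The URL must not contain any excluded fragment.
    foreach ($config['exclude_substrings'] as $needle) {
        if (strpos($url, $needle) !== false) {
            return false;
        }
    }

    // 3. The file extension of the URL path must be in the allowed list.
    $path = (string) parse_url($url, PHP_URL_PATH);
    $ext  = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $config['allowed_extensions'], true);
}
```

A URL that passes all three checks would go on the 'valid' list; anything else would be recorded as invalid.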

Usage

1) It can simply be run from the command line, where it does the spidering and generates lists of valid and invalid spider URLs (example output).

2) It can be run as the source for the swish-e.exe indexer. In this case it is specified through the swish-e -S prog option and a configuration file, e.g.

swish-e -S prog -c example.cfg

swish-e then indexes the content supplied by SPT_spider.php as it spiders through valid URLs (as determined by the SPT_spider settings).
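
For readers unfamiliar with the -S prog interface: the external program writes each document to standard output as a small block of headers (Path-Name, Content-Length and, optionally, Last-Mtime), followed by a blank line and then the document body. The sketch below shows roughly how a PHP spider might emit one fetched page in that format. The function name spt_emit_document and the example URL are hypothetical, and this is not taken from the SPT_spider.php source.

```php
<?php
// Build one document record in the format swish-e reads from a -S prog source.
// Hypothetical helper, for illustration only.
function spt_emit_document(string $url, string $content, int $mtime): string
{
    $headers = "Path-Name: " . $url . "\n"
             . "Content-Length: " . strlen($content) . "\n"  // length in bytes
             . "Last-Mtime: " . $mtime . "\n"                // seconds since the Epoch
             . "\n";                                         // blank line ends the headers
    return $headers . $content;
}

// Example: emit a single (made-up) page on standard output.
echo spt_emit_document('http://www.example.org/about.html', "<html><body>About</body></html>", time());
```

Content-Length has to match the number of bytes in the body exactly, since swish-e uses it to find where one document ends and the next begins.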

 

 



The Fitzwilliam Museum, University of Cambridge.
