Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is written in C but has APIs for Perl, PHP and other languages.
This page is about a Python API. Note that this is not the only such API; others can be found on the Python Package Index website.
Please note: this package is not very well tested and may be out of date (it was originally written for Swish-e 2.4). No support is provided.
Assuming you have a SWISH-E index file called 't/swish.idx' and you want to search the indexed files for occurrences of 'madrid', the following will do:
$ python
Python 2.2.3 (#1, Jul 15 2003, 15:44:20)
[GCC 2.95.3 20010125 (prerelease, propolice)] on openbsd3
Type "help", "copyright", "credits" or "license" for more information.
>>> # load the module
>>> import SwishE
>>> # get a SWISH-E handle on 't/swish.idx'
>>> handle = SwishE.new('t/swish.idx')
>>> # get a search object
>>> search = handle.search('')
>>> # search for 'madrid'
>>> results = search.execute('madrid')
>>> # tell the world how many results we have
>>> print results.hits()
>>> # iterate on the results
>>> for r in results:
...     print r.getproperty('swishtitle')
...
Argentina Centro de Medios Independientes
Indymedia Barcelona: home
San Francisco Bay Area Independent Media Center
Independent Media Center -
>>> # now looking for 'lluita', we want to sort by title
>>> search.setSort('swishtitle')
>>> again = search.execute('lluita')
>>> for r in again:
...     print r.getproperty('swishdocpath')
...
1.html
>>> # figure out that sorting isn't of much use with a single match
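The session above can be packaged into a small reusable function. This is a minimal sketch, assuming a handle object that exposes only the methods used in the session (search(''), setSort(), execute() and getproperty()). The helper name collect_property is hypothetical and is not part of the SwishE module. The function avoids print so it runs under both Python 2 and Python 3. It takes the handle as an argument, so you can open the index however you like, e.g. with SwishE.new('t/swish.idx').

```python
# Minimal sketch: collect one property from every hit of a Swish-e query.
# Assumes a handle with the methods shown in the session above; the helper
# name collect_property is hypothetical, not part of the SwishE module.

def collect_property(handle, query, prop='swishtitle', sort_by=None):
    """Run `query` against `handle` and return `prop` for each result."""
    search = handle.search('')       # empty string, as in the session above
    if sort_by is not None:
        search.setSort(sort_by)      # e.g. 'swishtitle'
    results = search.execute(query)
    # Iterate over the result set, reading the requested property of each hit.
    return [r.getproperty(prop) for r in results]
```

With a real index this might be called as collect_property(SwishE.new('t/swish.idx'), 'lluita', prop='swishdocpath', sort_by='swishtitle'). Passing the handle in, rather than opening it inside the function, lets one handle be reused across several queries.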
As of version 0.5, Jean-François Piéronne has added a "query" method, which lets you pass a search string directly to the SwishE.Handle object. For example:
>>> for r in SwishE.new('index.swish-e').query('tags'):
...     print r.getproperty('swishtitle')
...
On SGML and HTML
HTML 4 Changes
Tables in HTML documents
HTML 4 Specification References
Conformance: requirements and recommendations
Performance, Implementation, and Design Notes
>>>
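Code meant to work with both older and newer releases of the package could check for the query method before using it. The sketch below assumes that handle.query(q) returns the same kind of iterable result set as handle.search('').execute(q), which the text above implies but does not state outright. The helper names run_query and titles are hypothetical.

```python
# Sketch: prefer Handle.query() (added in version 0.5, per the text above) and
# fall back to the two-step search('').execute() route otherwise.
# Assumes both paths yield results supporting getproperty(); run_query and
# titles are hypothetical helper names.

def run_query(handle, query):
    """Return an iterable of results for `query`, whichever API is available."""
    if hasattr(handle, 'query'):
        return handle.query(query)               # one-step call
    return handle.search('').execute(query)      # two-step call


def titles(handle, query):
    """Return the 'swishtitle' property of every hit."""
    return [r.getproperty('swishtitle') for r in run_query(handle, query)]
```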