
Expands FeedItems with the body of the pages they refer to


Feedfiller
==========

.. contents::

What is it?
===========

*Feedfiller* is intended to work alongside Zest Software's FeedFeeder_
package and provides the additional functionality of filling each news feed
item with the clean body content of the page it refers to. Feedfiller can be
taught the structure of a site's pages so that it extracts only the most
interesting parts. If it does not yet 'know' the structure of the target page,
all it can do is include the whole page. We will improve this as the project
develops.

.. _FeedFeeder: http://plone.org/products/feedfeeder

Clearly there are potential copyright issues with the re-publishing of
copyrighted works. But for research and analysis purposes, these may not
be an issue for your organization. Our own purpose is to use collected
text for classification and analysis for internal use. You should
seek your own legal advice on this topic.

Dependencies
============

Feedfiller depends on BeautifulSoup and Products.feedfeeder. If you install
the egg package, these dependencies are managed for you.

How does it work?
=================

Feedfiller subscribes to the event fired after FeedFeeder stores each news
feed item, and fetches the target page of that item. This means that every
item is filled with the content of the page it refers to. Fetched pages are
flayed ("Flay: Verb: to strip off the skin or surface of") by a Flayer, which
is looked up in a FlayerRegistry by URL.
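The per-item flow described above can be sketched as a plain function. This is
a minimal illustration, not Feedfiller's code: ``FeedItem``, ``fill_item`` and
``whole_page`` are hypothetical names, and the real package wires the handler
up as a Zope event subscriber, which is not shown here. Fetching and flayer
lookup are passed in as callables so the flow can be run offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Flayer = Callable[[str], Dict[str, str]]


@dataclass
class FeedItem:
    """Hypothetical stand-in for a FeedFeeder item: a link plus a body."""
    url: str
    body: str = ""


def whole_page(html: str) -> Dict[str, str]:
    """Fallback flayer: the whole page becomes the body."""
    return {"body": html}


def fill_item(item: FeedItem,
              fetch: Callable[[str], str],
              lookup: Callable[[str], Flayer]) -> FeedItem:
    """Sketch of what the subscriber does for one stored item.

    `fetch` retrieves a URL's HTML and `lookup` plays the role of the
    FlayerRegistry; both are injected so no network access is needed.
    """
    html = fetch(item.url)                     # 1. fetch the referred-to page
    flayer = lookup(item.url)                  # 2. pick a flayer by URL
    item.body = flayer(html).get("body", "")   # 3. flay and store the body
    return item


# Usage with an in-memory "web" instead of real HTTP:
pages = {"http://example.com/a": "<p>Hello</p>"}
item = fill_item(FeedItem("http://example.com/a"), pages.__getitem__,
                 lambda url: whole_page)
print(item.body)  # <p>Hello</p>
```

For real pages, ``fetch`` could be something like
``lambda u: urllib.request.urlopen(u).read().decode("utf-8")``.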

Flayers can easily be written to accommodate new pages. Flayers can also be
created and registered for different sections of a site, in case the HTML
structure varies between sub-trees of the site.

If no flayer is registered for the URL, a default flayer is used that
returns the whole body of the page.
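One way a registry could support both per-site and per-section flayers is to
key flayers by URL prefix and let the longest matching prefix win, falling
back to a whole-page default. The sketch below assumes that design; the
``FlayerRegistry`` here, with its ``register`` and ``lookup`` methods, is
hypothetical and not the package's actual class.

```python
from typing import Callable, Dict, List, Tuple

Flayer = Callable[[str], Dict[str, str]]


def default_flayer(html: str) -> Dict[str, str]:
    """Used when no flayer matches: the whole page is the body."""
    return {"body": html}


class FlayerRegistry:
    """Hypothetical registry mapping URL prefixes to flayers."""

    def __init__(self, default: Flayer = default_flayer) -> None:
        self._entries: List[Tuple[str, Flayer]] = []
        self._default = default

    def register(self, prefix: str, flayer: Flayer) -> None:
        self._entries.append((prefix, flayer))

    def lookup(self, url: str) -> Flayer:
        # The longest matching prefix wins, so a flayer registered for a
        # section of a site overrides one registered for the whole site.
        matches = [(p, f) for p, f in self._entries if url.startswith(p)]
        if not matches:
            return self._default
        return max(matches, key=lambda entry: len(entry[0]))[1]


# Usage: one flayer for a whole (made-up) site, another for one section.
registry = FlayerRegistry()
registry.register("http://news.example.com/", lambda html: {"body": "site"})
registry.register("http://news.example.com/sport/", lambda html: {"body": "sport"})

print(registry.lookup("http://news.example.com/sport/1")(""))    # {'body': 'sport'}
print(registry.lookup("http://news.example.com/world/2")(""))    # {'body': 'site'}
print(registry.lookup("http://other.example.org/")("<p>x</p>"))  # {'body': '<p>x</p>'}
```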

Currently, site-specific flayers try to reveal the author, copyright, and
body, but the default flayer only returns the whole body of the page.

The flayer base class currently stores the original page fetched from
the server, to facilitate further development and refinement of flayers
without repeatedly fetching content.
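The benefit of keeping the original page is that extraction rules can be
tweaked and re-run against the stored copy. The following is a sketch of that
idea only: ``BaseFlayer``, ``reflay`` and ``MarkerFlayer`` are hypothetical
names, and ``MarkerFlayer`` assumes the target site emits fixed marker strings
around its story text, which real sites may not do.

```python
from typing import Dict, Optional


class BaseFlayer:
    """Hypothetical flayer base class that keeps the fetched page."""

    def __init__(self) -> None:
        self.original: Optional[str] = None

    def flay(self, html: str) -> Dict[str, str]:
        self.original = html               # store the page exactly as fetched
        return self.extract(html)

    def reflay(self) -> Dict[str, str]:
        """Re-run extraction on the stored page; no network access."""
        if self.original is None:
            raise ValueError("nothing has been flayed yet")
        return self.extract(self.original)

    def extract(self, html: str) -> Dict[str, str]:
        return {"body": html}              # default behaviour: whole page


class MarkerFlayer(BaseFlayer):
    """Hypothetical site-specific flayer: keeps the text between two
    marker strings that the target site is assumed to emit."""

    def __init__(self, start: str, end: str) -> None:
        super().__init__()
        self.start, self.end = start, end

    def extract(self, html: str) -> Dict[str, str]:
        i = html.find(self.start)
        j = html.find(self.end, i + len(self.start)) if i != -1 else -1
        if i == -1 or j == -1:
            return super().extract(html)   # markers missing: whole page
        return {"body": html[i + len(self.start):j].strip()}


page = "<html><!--s--> Story text <!--e--><p>footer</p></html>"
flayer = MarkerFlayer("<!--s-->", "<p>")   # first attempt: end marker too loose
print(flayer.flay(page))                   # {'body': 'Story text <!--e-->'}
flayer.end = "<!--e-->"                    # refine the rule...
print(flayer.reflay())                     # {'body': 'Story text'}, no refetch
```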

TODO
====

The next step is to develop a table-driven flayer, whose table entries can
be generated interactively by clicking on an enhanced version of the default
flayer's output, a bit like a basic Firebug view of the structure of a page,
with buttons to manually select the body area of a page. This will require a
new view for this purpose, available to managers.
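To make the table-driven idea concrete, here is a sketch under stated
assumptions: each table row maps a URL prefix to the ids of the elements that
hold each field, and one generic flayer reads the table instead of needing a
class per site. ``TABLE``, ``_IdTextCollector`` and ``table_flay`` are
hypothetical, the ids are made up, and the sketch uses the standard library's
``html.parser`` rather than BeautifulSoup only so that it runs without extra
packages.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Hypothetical table: URL prefix -> {field name: id of the element holding it}.
# In the planned tool, rows like this would come from a manager's clicks.
TABLE: Dict[str, Dict[str, str]] = {
    "http://news.example.com/": {"body": "story", "author": "byline"},
}

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _IdTextCollector(HTMLParser):
    """Accumulates the text found inside elements with the wanted ids."""

    def __init__(self, wanted: List[str]) -> None:
        super().__init__()
        self.wanted = set(wanted)
        self.found: Dict[str, str] = {}
        self._open: List[Tuple[str, Optional[str]]] = []  # (tag, id) stack

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:            # void elements never get closed
            self._open.append((tag, dict(attrs).get("id")))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        for _, elem_id in self._open:
            if elem_id in self.wanted:
                self.found[elem_id] = self.found.get(elem_id, "") + " " + data


def table_flay(url: str, html: str) -> Dict[str, str]:
    """Flay `html` using the longest TABLE prefix that matches `url`."""
    prefixes = [p for p in TABLE if url.startswith(p)]
    if not prefixes:
        return {"body": html}               # no row: whole page, as today
    fields = TABLE[max(prefixes, key=len)]
    parser = _IdTextCollector(list(fields.values()))
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace in each extracted field.
    return {name: " ".join(parser.found.get(elem_id, "").split())
            for name, elem_id in fields.items()}


page = ('<html><body><div id="nav">Home</div>'
        '<span id="byline">Jo Bloggs</span>'
        '<div id="story"><p>First.</p><p>Second.<br>End</p></div>'
        '</body></html>')
print(table_flay("http://news.example.com/a/1", page))
# {'body': 'First. Second. End', 'author': 'Jo Bloggs'}
```

Keying rows by URL prefix mirrors how flayers are registered per site section,
so a table row could in principle sit alongside the existing custom classes.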

There is no reason why the table-driven flayer should not be able to handle
the complexity of the BBC news page, leaving only the trickiest pages to the
custom-class approach used currently.

Table items should eventually be replicated across all other feedfiller users,
perhaps using bi-directional rsync using a central repository, or perhaps
using svn.

CREDITS
=======

The project was initiated by Russ Ferriday, Topia Systems Ltd, in
November, 2008.

Thanks to `Business Across Borders`__ for sponsorship of this work.

__ http://businessacrossborders.com/

Thanks to Zest Software and the van Rees brothers for FeedFeeder.


Contributions are welcome, and contributors are listed below:









Changelog
=========

0.1 - Unreleased
----------------

* Initial release
