Overview
revolution/laravel-fullfeed is a Laravel package that extracts the main content from web pages for feed readers. It uses site-specific JSON rules so you can reliably pull article content from different domains.
This package was extracted from a private feed reader app and released as a standalone package.
Requirements
- PHP >= 8.4 (uses Dom\HTMLDocument)
- Laravel >= 12.x
Installation
Publish config and site rule files
- config/fullfeed.php
- resources/fullfeed
Auto update site rules via composer post-update-cmd
To republish site rules automatically after composer update, add this to composer.json:
Basic usage
Testing
Use FullFeed::expects() to fake facade behavior in tests.
Site rule files
items_all.json
items_all.json is based on the LDRFullFeed (wedata) rule format, which is widely used for full-text extraction. Livedoor Reader, once a very popular feed reader service in Japan, has shut down, but its rule data is still available and remains useful today.
plus.json
plus.json is a sample file for adding your own rules.
Rule fields
- url: Regular expression for target URLs
- selector: CSS selector (takes priority over xpath)
- xpath: XPath expression
- enc: Character encoding for non-UTF-8 pages
- callable: Custom extractor class(es) executed before built-in extraction
- after_callable: Custom extractor class(es) executed at the end
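As a rough illustration of how these fields fit together, the sketch below decodes one rule entry in plain PHP and reads its fields. The entry itself (name, URL pattern, selector, XPath) is invented for the example, and the name/data layout is an assumption based on the wedata format; check the shipped plus.json for the actual shape.

```php
<?php

// Hypothetical rule entry: every value below is made up for illustration.
$json = <<<'JSON'
[
    {
        "name": "example.com articles",
        "data": {
            "url": "^https?://example\\.com/articles/",
            "selector": "article .entry-content",
            "xpath": "//div[@id=\"main\"]",
            "enc": "UTF-8"
        }
    }
]
JSON;

$rules = json_decode($json, true, flags: JSON_THROW_ON_ERROR);
$rule = $rules[0]['data'];

// url is a regular expression, so wrap it in delimiters before matching.
// Assumes the pattern itself does not contain "~".
$pattern = '~' . $rule['url'] . '~';
$matches = preg_match($pattern, 'https://example.com/articles/42') === 1;

// selector takes priority over xpath when both are present.
$strategy = isset($rule['selector']) ? 'selector' : 'xpath';

var_dump($matches, $strategy);
```

Note that backslashes in the url pattern have to be doubled inside JSON strings.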
Extractor order as a Pipeline pattern example
FullFeed is a practical example of Laravel’s Pipeline pattern. Extractors run in this order:
- Classes in callable
- XPathExtractor
- SelectorExtractor
- Classes in after_callable
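The ordering above can be sketched with a small pipeline written in plain PHP. This is not the package's code: runPipeline() and recordingStage() are hypothetical helpers that only mimic the (state, next) shape of a Laravel Pipeline pipe, and each stage just records its label so the execution order is visible.

```php
<?php

// Builds a chain of closures so that the first stage in $stages runs first.
function runPipeline(array $state, array $stages): array
{
    $pipeline = array_reduce(
        array_reverse($stages),
        fn (callable $next, callable $stage) => fn (array $state) => $stage($state, $next),
        fn (array $state) => $state, // end of the chain: return the state unchanged
    );

    return $pipeline($state);
}

// Hypothetical stand-in for an extractor: appends its label, then passes the state on.
function recordingStage(string $label): callable
{
    return function (array $state, callable $next) use ($label): array {
        $state['log'][] = $label;

        return $next($state);
    };
}

$result = runPipeline(['log' => []], [
    recordingStage('callable'),
    recordingStage('XPathExtractor'),
    recordingStage('SelectorExtractor'),
    recordingStage('after_callable'),
]);

echo implode(' -> ', $result['log']), PHP_EOL;
// prints "callable -> XPathExtractor -> SelectorExtractor -> after_callable"
```

Because every stage decides when to call the next one, a custom extractor can change the state before the built-in extraction or clean up its result afterwards.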
Built-in extractors
RemoveElements
Removes elements matched by selectors.
ReplaceMatches
Replaces text matched by regular expressions (processed as an HTML string).
StripTags
Removes tags using behavior equivalent to strip_tags().
Squish
Removes extra whitespace with Str::squish().
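To give a feel for what the last two extractors do to a string, here is a plain-PHP approximation. stripTagsLike() and squishLike() are hypothetical helpers, not the package's classes, and squishLike() is a simplified stand-in for Str::squish() that only collapses ordinary whitespace.

```php
<?php

// Approximates StripTags: removes HTML tags and keeps the text content.
function stripTagsLike(string $html): string
{
    return strip_tags($html);
}

// Approximates Squish: collapses whitespace runs to one space and trims the ends.
function squishLike(string $text): string
{
    return trim(preg_replace('/\s+/u', ' ', $text));
}

$html = "<p>Hello,   <b>world</b></p>\n\n<p>  again </p>";
$text = squishLike(stripTagsLike($html));

echo $text, PHP_EOL; // prints "Hello, world again"
```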
Adding custom rules
- Create a JSON file in resources/fullfeed
- Add that file to paths in config/fullfeed.php
The first matching data.url rule is used, so put custom files near the beginning of paths.
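A minimal sketch of why file order matters, assuming rule sets are searched in the order they are listed and the first match wins. findRule() and both rule sets are hypothetical and only illustrate the lookup idea; they do not reproduce the package's internals.

```php
<?php

// Returns the data of the first rule whose url pattern matches, or null.
// Assumes patterns do not contain the "~" delimiter.
function findRule(array $ruleSets, string $url): ?array
{
    foreach ($ruleSets as $rules) { // rule sets in the same order as paths
        foreach ($rules as $rule) {
            if (preg_match('~' . $rule['data']['url'] . '~', $url) === 1) {
                return $rule['data'];
            }
        }
    }

    return null;
}

// Made-up rules: a specific custom rule and a broader shared rule for the same site.
$custom = [['data' => ['url' => '^https://example\.com/blog/', 'selector' => 'main article']]];
$shared = [['data' => ['url' => '^https://example\.com/', 'xpath' => '//div[@class="body"]']]];

$url = 'https://example.com/blog/post-1';

$customFirst = findRule([$custom, $shared], $url); // the custom rule wins
$sharedFirst = findRule([$shared, $custom], $url); // the broader rule shadows the custom one

var_dump($customFirst, $sharedFirst);
```

With the custom set listed first, the more specific rule is picked; with the order reversed, the broader rule matches first and the custom rule is never reached.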