Last Update: Tue Aug 22 19:48:56 BST 2006
Ariel is a library that allows you to extract information from semi-structured documents (such as websites). It differs from existing tools in that, rather than expecting the developer to write extraction rules by hand, Ariel uses a small number of labeled examples to generate and learn effective extraction rules. It is developed by Alex Bradbury and released under the MIT license. Ariel was started in 2006 as a Google Summer of Code project mentored by Austin Ziegler.
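To make the idea of learning a rule from labeled examples concrete, here is a deliberately tiny sketch. It is not Ariel's algorithm or API: the names LabeledExample, learn_landmark and apply_rule are hypothetical, and the "rule" is assumed to be nothing more than the longest piece of text that immediately precedes the wanted value in every labeled example.

```ruby
# Toy illustration only: learns a "skip to this landmark" rule from labeled
# examples. An assumed simplification, NOT Ariel's learning algorithm or API.

# Each example pairs a document with the index where the wanted text starts.
LabeledExample = Struct.new(:text, :start_index)

# Return the longest string that immediately precedes the target in every example.
def learn_landmark(examples)
  prefixes = examples.map { |ex| ex.text[0...ex.start_index] }
  shortest = prefixes.map(&:length).min
  landmark = ""
  1.upto(shortest) do |len|
    suffixes = prefixes.map { |prefix| prefix[-len, len] }
    break unless suffixes.uniq.length == 1
    landmark = suffixes.first
  end
  landmark
end

# Apply the rule: find the landmark and return the position just after it,
# or nil if the landmark does not occur in the text.
def apply_rule(landmark, text)
  position = text.index(landmark)
  position && position + landmark.length
end

# Two hand-labeled documents (made up for this sketch).
labeled = [
  ["<p>Title: <b>Ruby rocks</b></p>", "Ruby rocks"],
  ["<div>Title: <b>Hello world</b></div>", "Hello world"]
]
examples = labeled.map { |text, target| LabeledExample.new(text, text.index(target)) }

landmark  = learn_landmark(examples)
unlabeled = "<span>Title: <b>Ariel 0.1.0</b></span>"
start     = apply_rule(landmark, unlabeled)
puts unlabeled[start..-1] if start
```

The point of the sketch is only that the developer supplies labeled positions and the rule is derived from them. A real learner has to cope with examples that share no common preceding text, which this toy version does not attempt.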
gem install ariel
I’m happy to announce the release of Ariel 0.1.0, the result of my Summer of Code work. This release should be easy to use, very functional, and hopefully useful, so it’s worth trying out. I’ve put a lot of effort into writing clear and straightforward documentation to get you started, so take a look at the docs available at ariel.rubyforge.org. In particular, flick through the tutorial and quick start guide. If you’re interested, you may also want to take a look at the theory page, where I’ve made a good start on describing the method Ariel uses to learn extraction rules. If you have any problems or find any bugs, just send me an email or add them to the issue tracker (see link below). Enjoy. See the FAQ for a vim snippet to make labeling examples a little easier.
  structure = Ariel::Node::Structure.new do |r|
    r.item :title
    r.item :body
    r.list :comments do |c|
      c.list_item :comment do |d|
        d.item :author
        d.item :body
      end
    end
  end
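The nested blocks above describe a tree: a document has a title, a body and a list of comments, and each comment has an author and a body. The sketch below mimics that shape with a hypothetical stand-in class, SketchNode, purely to show how such a block-based definition can build a tree. It is an assumption for illustration and says nothing about how Ariel::Node::Structure is actually implemented.

```ruby
# Hypothetical stand-in for illustration only; this is NOT Ariel's implementation.
class SketchNode
  attr_reader :name, :kind, :children

  def initialize(name = :root, kind = :root)
    @name = name
    @kind = kind
    @children = []
    yield self if block_given?
  end

  # Each method adds a child node; a block, if given, describes that child's own children.
  def item(name, &block)
    add(name, :item, &block)
  end

  def list(name, &block)
    add(name, :list, &block)
  end

  def list_item(name, &block)
    add(name, :list_item, &block)
  end

  # Return the tree as an array of indented lines, one per node.
  def outline(depth = 0)
    lines = ["#{'  ' * depth}#{name} (#{kind})"]
    children.each { |child| lines.concat(child.outline(depth + 1)) }
    lines
  end

  private

  def add(name, kind, &block)
    node = SketchNode.new(name, kind, &block)
    @children << node
    node
  end
end

# Same shape as the structure definition above, built with the stand-in class.
structure = SketchNode.new do |r|
  r.item :title
  r.item :body
  r.list :comments do |c|
    c.list_item :comment do |d|
      d.item :author
      d.item :body
    end
  end
end

puts structure.outline
```

Printing the outline shows one line per node, indented by depth, which makes it easy to check that the definition matches the document layout you have in mind.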
unlabeled_file2
"Great stuff, loving it", "I love life", .....
returns the first result rather than an array of matches).
Ariel is developed by Alex Bradbury as a Google Summer of Code project, mentored by Austin Ziegler.
SVN Repository: rubyforge.org/projects/ariel
Issue tracker: code.google.com/p/ariel/issues/
Documentation/homepage: ariel.rubyforge.org
RDoc: ariel.rubyforge.org/rdoc/