New AlchemyAPI Release: ‘Visual’ Web Content Mining (Structured Data!)
Posted by: admin on September 10th, 2009
We’re announcing another significant update to the AlchemyAPI content analysis service: Visual Constraints Web Content Mining.
This is an entirely new AlchemyAPI capability that enables extraction of structured data (product information, pricing, descriptions, etc.) from any web page. Visual constraints enable content extraction using simple ‘natural language’ queries, such as: “all links after product details”
Pictures speak louder than words, so here are some query examples:



AlchemyAPI’s visual constraint query engine is a powerful tool for extracting structured data from any web page. Constraints enable content to be identified using visual characteristics such as text labels & patterns, positioning within a web page, structural encapsulation, and more. Mining structured data via visual constraints is robust against changes in underlying HTML document / tag structure, CSS, etc.
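To make the idea of a label-based constraint concrete, here is a minimal Python sketch of what a query like “all links after product details” might select. This is not AlchemyAPI's engine or its API: the class `LinksAfterLabel` and the helper `links_after` are hypothetical names invented for illustration, and the matching rule (a case-insensitive substring match on visible text) is an assumption. It only shows why anchoring on a visible text label can survive changes to the surrounding tags.

```python
from html.parser import HTMLParser


class LinksAfterLabel(HTMLParser):
    """Hypothetical helper: collect hrefs of <a> tags that follow a visible text label."""

    def __init__(self, label):
        super().__init__()
        self.label = label.lower()
        self.seen_label = False
        self.links = []

    def handle_data(self, data):
        # Anchor on the rendered text, not on tag names or CSS classes.
        if self.label in data.lower():
            self.seen_label = True

    def handle_starttag(self, tag, attrs):
        # Only links that start after the label has been seen are kept.
        if tag == "a" and self.seen_label:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_after(html, label):
    parser = LinksAfterLabel(label)
    parser.feed(html)
    parser.close()
    return parser.links


# Two made-up pages with the same visible content but different markup.
page_v1 = ('<a href="/home">Home</a><h2>Product Details</h2>'
           '<ul><li><a href="/specs">Specs</a></li></ul>')
page_v2 = ('<div><a href="/home">Home</a></div><div class="x">'
           '<b>Product Details</b><p><a href="/specs">Specs</a></p></div>')

print(links_after(page_v1, "product details"))
print(links_after(page_v2, "product details"))
```

Both pages yield the same link even though the label moves from an `h2` to a `b` inside a restyled `div`, which is the kind of markup change a label-based query is meant to tolerate.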
Something else we’re really excited about: Visual constraints are fully integrated into AlchemyAPI’s other content analysis capabilities, enabling the targeted execution of named entity recognition, text categorization, language detection, or other NLP tasks on specific portions of a web page. AlchemyAPI is unique in the industry in its ability to perform highly targeted NLP operations on web pages.
AlchemyAPI also now fully supports XPath, for the W3C / XSLT fans out there.
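For readers less familiar with XPath, the sketch below shows the kind of selection an XPath expression performs, using Python's standard `xml.etree.ElementTree` module rather than AlchemyAPI itself. The sample document and its class names are invented for illustration, and ElementTree implements only a subset of XPath and requires well-formed markup, so treat this as an illustration of the query style and not of how AlchemyAPI evaluates XPath.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample markup (real-world HTML is often not well-formed).
doc = """<html><body>
<div class="product"><span class="name">Widget</span><span class="price">$9.99</span></div>
<div class="product"><span class="name">Gadget</span><span class="price">$19.99</span></div>
</body></html>"""

root = ET.fromstring(doc)

# Select every price span that is a direct child of a product div.
query = ".//div[@class='product']/span[@class='price']"
prices = [el.text for el in root.findall(query)]

print(prices)  # ['$9.99', '$19.99']
```

Unlike a visual constraint, this query is tied to the tag and attribute structure: renaming the `product` class or changing the `span` elements to another tag would require rewriting the expression.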
Here’s an example of targeted named entity extraction operations:

In the coming weeks, we’ll explore how to use AlchemyAPI’s visual constraints engine to perform targeted named entity and keyword extraction, topic categorization, language detection, and more.
Entry Filed under: AlchemyAPI, Company, NLP, Releases