Posts filed under 'NLP'


New AlchemyAPI Release: Keyword Relevancy Scores

Posted by: admin on March 30th, 2010

Today we’re announcing a new AlchemyAPI release, containing significant enhancements to our Keyword Extraction API.

A new “GetRankedKeywords” API is now available, exposing relevance scores for extracted keywords.  These scores represent the overall importance of a given keyword to a document.
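As a rough illustration of how relevance scores might be consumed, the sketch below filters and sorts keywords by score. The response shape (a list of records with `text` and `relevance` fields), the sample values, and the `top_keywords` helper are all assumptions made for this example. They are not the documented GetRankedKeywords output format.

```python
from typing import Dict, List

# Hypothetical, hand-written sample; not real GetRankedKeywords output.
SAMPLE_KEYWORDS: List[Dict[str, str]] = [
    {"text": "space telescope", "relevance": "0.91"},
    {"text": "press release", "relevance": "0.22"},
    {"text": "dark matter", "relevance": "0.78"},
]


def top_keywords(keywords, min_relevance=0.5):
    """Return keyword texts scoring at least min_relevance, highest first."""
    scored = [(float(k["relevance"]), k["text"]) for k in keywords]
    kept = [pair for pair in scored if pair[0] >= min_relevance]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept]


if __name__ == "__main__":
    print(top_keywords(SAMPLE_KEYWORDS))
```

A threshold like this is one simple way to keep only the keywords that matter most to a document; the cutoff of 0.5 is arbitrary.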

AlchemyAPI’s Keyword Extraction API also contains a number of under-the-hood enhancements, which result in even better & more relevant keywords for your content.

We’ve released updates to the AlchemyAPI SDKs as well, exposing our new Keyword API for quick integration into your application.

Stay tuned for more updates & AlchemyAPI enhancements.

P.S. Do you love NLP, text analytics, and the semantic web as much as we do?  Join our team!  AlchemyAPI is now hiring for a number of positions, including linguistic annotators, QA engineers, and core NLP developers.

New AlchemyAPI Release: Quotations Extraction & Coreferences

Posted by: admin on October 27th, 2009

Today we’re announcing a second AlchemyAPI release for the month of October.  This update includes several new features & enhancements:

Quotations Extraction - AlchemyAPI now identifies quotations in any unstructured text, such as newswire articles or blog postings.  Using quotations extraction, AlchemyAPI exposes both what is being said and who is saying it.

Coreference Resolution - AlchemyAPI now resolves he/she/his/her/etc coreferences into named entities, providing a more comprehensive view of processed texts.

This latest AlchemyAPI release also contains a number of under-the-hood enhancements to Terminology Extraction and other APIs.  New functionality is available effective immediately to all existing AlchemyAPI users.

New AlchemyAPI Release: Increased Precision, Disambiguation, and Web Page cleaning Updates

Posted by: admin on October 5th, 2009

Another AlchemyAPI release is upon us.  This maintenance release contains a variety of enhancements:

Increased Precision - Named Entity Extraction now features increased precision for all English-language content.  This means fewer false positives & more accurate results.  Recall has also been increased, meaning you’ll get more named entities when submitting content.  These updates will roll out to our other supported languages over the next two weeks.

Disambiguation - Named entity disambiguation coverage has been greatly expanded; AlchemyAPI’s disambiguation database has more than doubled in size, providing much greater coverage for non-USA locations, persons, organizations, and more.  We’ve also increased disambiguation accuracy, meaning more accurate results when processing ambiguous texts.

Web Page Cleaning - AlchemyAPI’s text extraction / web page cleaning APIs have been updated; Text extraction now operates with increased precision, especially for Blogs and other non-News content types.
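The precision and recall terms used under Increased Precision can be made concrete with a small calculation. The sketch below uses the standard definitions (precision is the share of returned entities that are correct, recall is the share of correct entities that were returned). The `precision_recall` helper and the entity sets are made up for illustration and say nothing about AlchemyAPI's actual scores.

```python
def precision_recall(predicted, expected):
    """Compute precision and recall for two collections of entity strings."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example data for illustration only.
    expected = {"NASA", "Chandra", "Hubble", "Columbia"}
    predicted = {"NASA", "Chandra", "Tuesday"}
    precision, recall = precision_recall(predicted, expected)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example one of three returned entities is a false positive ("Tuesday"), so precision is 2/3, and two of the four expected entities were missed, so recall is 1/2.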

AlchemyAPI is among the most accurate and highest performance content analysis APIs in the industry.  This release is part of our continued commitment to advancing text analysis precision, recall, and performance. We have more exciting AlchemyAPI news & feature enhancements planned for the month of October.  Stay tuned!

New AlchemyAPI Release: ‘Visual’ Web Content Mining (Structured Data!)

Posted by: admin on September 10th, 2009

We’re announcing another significant update to the AlchemyAPI content analysis service: Visual Constraints Web Content Mining

This is an entirely new AlchemyAPI capability that enables extraction of structured data (product information, pricing, descriptions, etc.) from any web page.  Visual constraints enable content extraction using simple ‘natural language’ queries, such as: “all links after product details”

Pictures speak louder than words, so here are some query examples:

AlchemyAPI’s visual constraint query engine is a powerful tool for extracting structured data from any web page. Constraints enable content to be identified using visual characteristics such as text labels & patterns, positioning within a web page, structural encapsulation, and more. Mining structured data via visual constraints is robust against changes in underlying HTML document / tag structure, CSS, etc.

Something else we’re really excited about: Visual constraints are fully integrated into AlchemyAPI’s other content analysis capabilities, enabling the targeted execution of named entity recognition, text categorization, language detection, or other NLP tasks on specific portions of a web page.  AlchemyAPI is unique in the industry with this capability to perform highly-targeted NLP operations on web pages.
AlchemyAPI also now fully supports XPath, for the W3C / XSLT fans out there.
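For readers less familiar with XPath, here is a minimal sketch of XPath-style extraction using Python's standard library, which supports only a limited XPath subset. It does not call AlchemyAPI; the sample page, the `product-details` id, and the `links_in_section` helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed XHTML fragment used only for this example.
PAGE = """
<html>
  <body>
    <div id="nav"><a href="/home">Home</a></div>
    <div id="product-details">
      <h2>Product Details</h2>
      <a href="/specs">Specifications</a>
      <a href="/reviews">Reviews</a>
    </div>
  </body>
</html>
""".strip()


def links_in_section(xhtml, section_id):
    """Return (href, text) pairs for links inside the div with the given id."""
    root = ET.fromstring(xhtml)
    # Attribute predicates like [@id='...'] are part of ElementTree's XPath subset.
    section = root.find(f".//div[@id='{section_id}']")
    if section is None:
        return []
    return [(a.get("href"), a.text) for a in section.iter("a")]


if __name__ == "__main__":
    for href, text in links_in_section(PAGE, "product-details"):
        print(href, text)
```

Note that a query like this depends on the `id` attribute and tag structure of the page, which is exactly the kind of coupling to HTML structure that visual constraints are described as avoiding.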

Here’s an example of targeted named entity extraction operations:

In the coming weeks, we’ll explore how to use AlchemyAPI’s visual constraints engine to perform targeted named entity & keyword extraction, topic categorization, language detection operations, and more.

New AlchemyAPI Release: Relevancy Ranking & Increased Precision

Posted by: admin on August 20th, 2009

AlchemyAPI has been growing at an amazing pace over the past year since our first public release.  We’re processing a massive number of API calls each day, for a variety of customers in multiple industry verticals.

We love engaging with our customers and user community, gathering feedback regarding our service and suggestions for improvement.  This community feedback has a direct impact on our product planning and the general direction of AlchemyAPI.

This new AlchemyAPI release brings a new feature requested by a number of you in our community:

Relevancy Ranking

Relevancy ranking expands upon AlchemyAPI’s sophisticated named entity extraction capability, applying a numeric ‘relevancy score’ to every item we detect.  These scores convey the importance of a given entity (Person, Company, etc.) to the document being processed as a whole.

Relevancy ranking enables one to easily sift through the named entities within a given news article or other piece of content, identifying what’s important and what isn’t.

Our relevancy functionality uses a sophisticated statistical ranking algorithm that draws on over two dozen different signals & cues, as well as advanced probability modeling. It provides far superior results to frequency/count-based relevancy ranking approaches.

This AlchemyAPI release also offers increased named entity extraction precision. This means fewer false positives and better extraction results.  Give our system a whirl and you’ll find it’s among the most accurate in the industry!

Stay tuned for more updates!

New AlchemyAPI Release: Identify content in 97 languages!

Posted by: admin on August 6th, 2009

Another AlchemyAPI release is upon us, providing a significant update to our “automatic language identification” capability, new performance and character encoding enhancements, and more!

AlchemyAPI utilizes statistical and lexical techniques to automatically determine the language of any content it processes.  Our natural language processing core leverages this information to provide increased accuracy when categorizing text, extracting keywords and named entities, and so on.

Our new automatic language identification capability represents a huge leap in functionality; AlchemyAPI is now capable of identifying content written in 97 different languages!  This includes nearly all of the world’s major languages, in addition to less widely spoken languages such as Cherokee, Dakota, and Macedonian.

AlchemyAPI’s language detector is extremely robust, capable of generating a match from only a few words of text.  It operates at a high level of precision, and is extremely fast.  We detect more languages, with better accuracy, than any other language detection service in the industry today.

An interactive demo of our language detector is available here.

Our language detection API now also returns additional data: ISO-639 language codes, links to language information on Ethnologue and Wikipedia, the number of native speakers for each language, and more!
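To give a feel for what statistical language identification can look like, here is a toy sketch based on character trigram overlap, a generic textbook technique. It is not AlchemyAPI's algorithm. The training sentences, the `trigrams` and `guess_language` helpers, and the three-language coverage are assumptions made for this example; the dictionary keys are ISO 639-1 codes.

```python
from collections import Counter

# Tiny made-up training samples; a real detector would need far more text.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the cat",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
}


def trigrams(text):
    """Count overlapping character trigrams, padding so word edges count too."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# One trigram profile per language, keyed by ISO 639-1 code.
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def guess_language(text):
    """Return the code of the profile sharing the most trigrams with text."""
    grams = trigrams(text)

    def overlap(profile):
        return sum(min(count, profile[g]) for g, count in grams.items())

    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))


if __name__ == "__main__":
    print(guess_language("the dog and the cat"))
    print(guess_language("der hund und die katze"))
```

With profiles this small the guess is only reliable for text that closely resembles the samples; it is meant to show the idea, not to be used as a detector.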

If you are currently doing language / information-extraction research, are a linguist, or are working with a language that we do not currently support, we’d love to hear from you.

New AlchemyAPI Release: ‘Concept’ Tagging / Phrase Extraction, in 8 languages!

Posted by: admin on July 23rd, 2009

We’re back this month with another big AlchemyAPI service update!  This release includes a number of under-the-hood improvements that further enhance the performance and usability of AlchemyAPI.  Also now available is a significant new text analysis capability:

Automated ‘Concept’ Tagging / Phrase Extraction

Concept tagging is a text analysis technology that works in conjunction with AlchemyAPI’s Named Entity Recognition (NER) capability to discover tags, phrases, and specific terminology that relate to the “about-ness” of a piece of content.

This new tagging capability is the result of months of behind-the-scenes engineering effort, and employs some relatively sophisticated statistical analysis and language modeling techniques.

Our new tagging system also works in 8 different languages, more languages than any other automated tagging service, commercial or otherwise.  Feel free to push English, French, German, Russian, Italian, Spanish, Portuguese, or Swedish content through the system.

So what kind of tags can this system extract?  Here’s an example:

Article: “NASA celebrates Chandra X-Ray Observatory’s 10th anniversary”

Extracted Tags / Phrases: chandra x-ray image, chandra data, chandra project, hubble space telescope, science mission directorate, nasa headquarters, space shuttle columbia, dark matter, …

It’s worth noting that our Concept Tagging system is robust when processing specialized content (such as scientific publications) as well as more general content (news & blogs).

To try an interactive demo of Concept Tagging, click here.

New Release, New Tools, Open Registration

Posted by: admin on May 5th, 2009

A new release of AlchemyAPI is upon us!

Notable additions in this release include: updates to the AlchemyAPI programmer SDKs, secure SSL access to AlchemyAPI for subscription users, content mining / Named Entity Extraction from photographs of printed documents (OCR+Entity Extraction), and more.

We’ve also opened up AlchemyAPI registration to the general public: Register today

This release brings with it several new tools built around the AlchemyAPI content analysis services.  These include:

AlchemyTagger - Semantic-powered Tag Suggestions for WordPress Blogs

AlchemyTagger automatically works in the background as you’re blogging, analyzing your writing and suggesting useful tags for your posts. Tags make your posts easier to navigate, better-ranked by search engines, and can increase flows of relevant website traffic.

AlchemySEO - Semantic-powered Search Engine Optimization

AlchemySEO detects when a search engine is accessing your website, returning a semantically-marked up version of your content. Specifically, AlchemySEO annotates your web page content with REL-TAG Microformats and HTML META “keyword” tags.  By exposing this semantic meta-data to search engines such as Google and Yahoo, AlchemySEO improves your search engine rankings and increases flows of relevant traffic.
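For reference, the two kinds of markup mentioned above look roughly like the output of the sketch below. This is not AlchemySEO's code; the `rel_tag_links` and `meta_keywords` helpers and the `example.com` tag URL are assumptions made for this example. In the rel-tag microformat, the tag is conveyed by the last path segment of the link's `href`.

```python
from html import escape
from urllib.parse import quote


def rel_tag_links(tags, tag_base_url):
    """Render each tag as a rel-tag microformat link."""
    links = []
    for tag in tags:
        # The last path segment of the href carries the tag itself.
        href = f"{tag_base_url.rstrip('/')}/{quote(tag)}"
        links.append(f'<a href="{escape(href)}" rel="tag">{escape(tag)}</a>')
    return links


def meta_keywords(tags):
    """Render an HTML META keywords element listing all tags."""
    content = escape(", ".join(tags))
    return f'<meta name="keywords" content="{content}">'


if __name__ == "__main__":
    tags = ["nasa", "dark matter"]
    print(meta_keywords(tags))
    for link in rel_tag_links(tags, "http://example.com/tags/"):
        print(link)
```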

Orchestr8 will be exhibiting at Gluecon next week, May 12-13.  If you’re attending and would like to learn more about AlchemyAPI, please stop by our booth!

Orchestr8 presenting at BDNT on March 23rd

Posted by: eturner on March 20th, 2009

Orchestr8 will be demoing our AlchemyAPI service at the Boulder-Denver New Technology event on March 23rd.

Come and join us to find out more information on AlchemyAPI and its capabilities.

We have a surprise for this BDNT event: Something special and entirely new in the world of NLP text mining.  We’ve built a fun interactive demo to illustrate this new capability and will be giving everyone a chance to get “hands on” at the BDNT.

Named Entity Disambiguation

Posted by: eturner on March 17th, 2009

We’re back with another big update to our AlchemyAPI content analysis / text mining service!

What’s new in this release?  Named Entity Disambiguation

Human language is not exact. Text referring to the city “Roanoke” can mean “Roanoke, Virginia” or “Roanoke, Texas”, depending on the surrounding context. Organizations and companies often have multiple nicknames, name variations, or common misspellings. Famous persons (“Michael Jackson”) often share a name with many non-famous individuals.

Named Entity Disambiguation works to solve these and other text ambiguity problems.

So how does it work?

Our disambiguation engine employs tens of millions of contextual hints describing traits of the world’s objects, individuals, and locations. We employ a variety of public and non-public data-sets.

Hints vary depending on the specific type of entity being disambiguated. For example, when disambiguating people, we utilize information on a person’s career, where they’re located, who they work for, and so on. For companies: key executives, notable products, industry, location, etc.

Whenever an entity is successfully disambiguated, additional information is returned in API responses. This includes the fully resolved, disambiguated entity name, and if available, the entity’s website and geographic coordinates.
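The general idea of hint-based disambiguation can be sketched in a few lines. The example below is a deliberately simplified stand-in, not AlchemyAPI's engine: the candidate records, their hint words, and the `disambiguate` helper are made up for illustration, and the scoring is a plain count of hint words that appear in the surrounding text.

```python
# Made-up candidate records and hints, for illustration only.
CANDIDATES = {
    "Roanoke": [
        {"name": "Roanoke, Virginia",
         "hints": {"virginia", "blue", "ridge", "appalachian"}},
        {"name": "Roanoke, Texas",
         "hints": {"texas", "dallas", "fort", "worth", "denton"}},
    ],
}


def disambiguate(mention, context):
    """Pick the candidate whose hints overlap most with the context words."""
    words = {w.strip(".,;:!?\"'").lower() for w in context.split()}
    best, best_score = None, 0
    for candidate in CANDIDATES.get(mention, []):
        score = len(candidate["hints"] & words)
        if score > best_score:
            best, best_score = candidate["name"], score
    return best  # None when no hint matches the context


if __name__ == "__main__":
    text = "The storm hit Roanoke, north of Fort Worth, Texas."
    print(disambiguate("Roanoke", text))
```

Returning `None` when nothing matches mirrors the behavior described above, where extra information is returned only when an entity is successfully disambiguated.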

AlchemyAPI’s Named Entity Disambiguation system resolves approximately two-dozen entity types, more than any other commercially-available text mining system!

Disambiguation functionality is available to all API preview / beta users.  If you do not currently have an API access key, please apply for one.

Also new in this release:

  1. Source text can now optionally be returned in all named entity and keyword extraction API call results.
  2. Updates to online API documentation.
  3. New developer SDKs for Ruby, C, and C++.
