EpiSPIDER now uses the OpenCalais natural language processing (NLP) web service from Thomson Reuters to annotate specific entities (medical condition and location) found in news reports. Named entity recognition and coreference resolution are classical NLP challenges. OpenCalais exposes an NLP application programming interface (API) whose algorithms perform both tasks on free text from news sources.

Recognizing location and medical condition entities is of particular interest for EpiSPIDER, and “outsourcing” the extraction of critical data from unstructured information takes advantage of the emerging, bottom-up, service-oriented architecture on the web.

Although OpenCalais handles almost any text thrown at it, the following pitfalls were observed for named entity recognition:

  1. Using OpenCalais (OC), EpiSPIDER extracted the location entity Buffalo, Indiana, US from the ProMED Mail report titled “PRO/AH/EDR> Undiagnosed deaths, buffalo – India (Orissa): RFI”, where “buffalo” is the affected animal and the location is India.
  2. OC did not “disambiguate” between Atlanta, New York, United States and Atlanta, GA, United States.
  3. OC georeferencing assigned Norway lat/long values that are located in the Russian heartland.
  4. For reasons that remain unclear, OC identified West Virginia as part of South Korea.
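
Glitches like these can often be caught with cheap sanity checks before a location is plotted. The following is a minimal Python sketch, not EpiSPIDER's actual code and not part of the OpenCalais API: the `GeoEntity` record, the helper functions, and the bounding-box values are all hypothetical and only approximate. It checks that a georeferenced point falls inside a rough box for its country (pitfall 3) and that the extracted country is actually named in the report title (pitfall 1).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Rough (min_lat, max_lat, min_lon, max_lon) boxes. Illustrative values only;
# a real system would use a gazetteer or country polygons.
COUNTRY_BOXES = {
    "Norway": (57.0, 72.0, 4.0, 32.0),
    "India": (6.0, 36.0, 68.0, 98.0),
}


@dataclass
class GeoEntity:
    """Hypothetical record for one location returned by an NER service."""
    name: str
    country: str
    lat: float
    lon: float


def inside_country_box(entity: GeoEntity) -> Optional[bool]:
    """True/False if the point is inside the country's box, None if unknown."""
    box = COUNTRY_BOXES.get(entity.country)
    if box is None:
        return None
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= entity.lat <= max_lat and min_lon <= entity.lon <= max_lon


def country_in_title(entity: GeoEntity, title: str) -> bool:
    """True if the country name appears in the title as a whole word or phrase."""
    pattern = r"\b" + re.escape(entity.country) + r"\b"
    return re.search(pattern, title, flags=re.IGNORECASE) is not None


def flag_suspect(entity: GeoEntity, title: str) -> List[str]:
    """Collect reasons to send an extracted location for manual review."""
    reasons = []
    if inside_country_box(entity) is False:
        reasons.append("coordinates outside country bounding box")
    if not country_in_title(entity, title):
        reasons.append("country not mentioned in report title")
    return reasons


title = "PRO/AH/EDR> Undiagnosed deaths, buffalo – India (Orissa): RFI"
# Coordinates below are approximate and only for illustration.
buffalo = GeoEntity("Buffalo, Indiana", "United States", 40.9, -86.7)
norway = GeoEntity("Norway", "Norway", 60.0, 100.0)  # a point deep inside Russia

print(flag_suspect(buffalo, title))
print(inside_country_box(norway))
```

Checks of this kind only flag candidates for review; they would not resolve the Atlanta ambiguity in pitfall 2, which needs additional context from the report itself.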

In spite of these rare glitches, the OpenCalais web service heralds exciting times, as computing and network power bring down the barriers to an emerging service-oriented architecture on the web.

4 comments until now

  1. Hi,

    Thanks for mentioning OpenCalais in your post.
    We’d like to look into the geo-location errors. Would you be able to send us some samples (articles/texts) that triggered the issues described above?
    In general, feel free to send us your feedback any time to questions@opencalais.com.

    Michal

  2. This is very impressive. It’s often a challenge to communicate the power and potential of Open Calais to folks who are not hip-deep in the Semantic Web waters, or the applications can seem arcane and distant from real-world concerns. Not so, here. The EpiSpider application is quite nice. Thanks for your efforts.
    Regards,
    Fran Sansalone
    Community Manager, Open Calais

  3. Michal,

    I should be able to dig into the database to look for the geolocation errors. Although they may be significant, they are very sparse compared to the impressive geolocation performance of OpenCalais.

    Thanks for making the OpenCalais web service available to the public health community – it certainly contributes to making this world a safer place to live in.

    - Herman

  4. Great discussion, folks, and great post as usual. As informaticians we focus on which technologies could be of help, but perhaps we should also start thinking about how event-based surveillance can assist traditional methods and fill in the gaps.
