I’m back!

Earlier, I described the OpenCalais web service.

The ecosystem of web services

The NASA Earth Observatory Glossary defines an ecosystem as “any natural unit or entity including living and non-living parts that interact to produce a stable system through cyclic exchange of materials” [NASA]. The concept can be applied to Internet-based applications that function as information-consuming or information-producing “organisms” and that interact with each other interdependently through the exchange of information.

The IBM web site, on the other hand, defines “web services” as “self-contained, modular, distributed, dynamic applications that can be described, published, located, or invoked over the network to create products, processes, and supply chains.”

As discrete, possibly autonomous “organisms” in an Internet-based information ecosystem, web services-enabled applications expose data and service endpoints in several ways, including Really Simple Syndication (RSS) feeds and web service Application Programming Interfaces (APIs) built on the Simple Object Access Protocol (SOAP), XML Remote Procedure Call (XML-RPC) or REpresentational State Transfer (REST). Besides using XML to embed data in responses to data or process requests, a growing number of web service applications also respond in JavaScript Object Notation (JSON). OpenCalais and Alchemy respond to API requests using the Resource Description Framework (RDF), a semantic web data model (serialized in XML as RDF/XML) that structures data as triples (subject, predicate, object), and both perform named entity disambiguation by linking to external knowledge bases (e.g., CIA Factbook, Wikipedia, Freebase). These web service applications may even provide machine learning-based services such as natural language processing (specifically named entity extraction and concept annotation), language detection, translation and text classification. Tools that enable semantic processing of content, not just classification, could expose the richer knowledge embedded in unstructured data such as news about outbreaks and disasters.
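To make the triple idea concrete, here is a minimal Python sketch that stores (subject, predicate, object) tuples in a plain set and queries them by pattern. It is an illustration under stated assumptions, not OpenCalais or Alchemy output: the `ex:`, `kb:` identifiers and the `match` helper are hypothetical, and a real application would use an RDF library and SPARQL.

```python
from typing import Optional

# A triple is (subject, predicate, object); here all three are plain strings.
Triple = tuple[str, str, str]

# Hypothetical triples describing one annotated news item (illustrative identifiers only).
TRIPLES: set[Triple] = {
    ("ex:doc42", "ex:mentions", "ex:Cholera"),
    ("ex:doc42", "ex:mentions", "ex:Orissa"),
    ("ex:Cholera", "rdf:type", "ex:MedicalCondition"),
    ("ex:Orissa", "rdf:type", "ex:ProvinceOrState"),
    # Hypothetical disambiguation link to an external knowledge base entry.
    ("ex:Orissa", "owl:sameAs", "kb:Orissa"),
}


def match(triples, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


# Which entities does doc42 mention, and what type is each one?
for _, _, entity in match(TRIPLES, s="ex:doc42", p="ex:mentions"):
    types = [obj for _, _, obj in match(TRIPLES, s=entity, p="rdf:type")]
    print(entity, types)
```

Because every fact has the same three-part shape, the same pattern query answers both "what does this document mention?" and "what kind of thing is this entity?".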


EpiSPIDER now uses the OpenCalais natural language processing (NLP) web service from Thomson Reuters to annotate specific entities (medical conditions and locations) found in news reports. Named entity recognition and coreference resolution are classic NLP challenges. OpenCalais exposes an NLP application programming interface (API) that gives access to algorithms performing named entity recognition and coreference resolution on free text from news sources.

Recognizing location and medical condition entities is of particular interest to EpiSPIDER, and “outsourcing” the extraction of critical data from unstructured information takes advantage of the emerging, bottom-up, service-oriented architecture on the web.
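The kind of post-processing this implies can be sketched as follows: keep only medical-condition and location entities from an entity-extraction response and drop everything else. This is a minimal sketch assuming a simplified, hypothetical JSON shape (a list of objects with `type` and `name` fields) and assumed type labels; the actual OpenCalais response format is different and richer.

```python
import json

# Hypothetical, simplified NER response; NOT the actual OpenCalais format.
SAMPLE_RESPONSE = """
[
  {"type": "MedicalCondition", "name": "avian influenza"},
  {"type": "Country", "name": "Indonesia"},
  {"type": "Organization", "name": "WHO"},
  {"type": "City", "name": "Jakarta"}
]
"""

# Assumed type labels for the two entity classes of interest.
CONDITION_TYPES = {"MedicalCondition"}
LOCATION_TYPES = {"City", "Country", "ProvinceOrState"}


def entities_of_interest(raw_json: str) -> dict[str, list[str]]:
    """Split recognized entities into conditions and locations, ignoring other types."""
    result: dict[str, list[str]] = {"conditions": [], "locations": []}
    for entity in json.loads(raw_json):
        if entity["type"] in CONDITION_TYPES:
            result["conditions"].append(entity["name"])
        elif entity["type"] in LOCATION_TYPES:
            result["locations"].append(entity["name"])
    return result


print(entities_of_interest(SAMPLE_RESPONSE))
# {'conditions': ['avian influenza'], 'locations': ['Indonesia', 'Jakarta']}
```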

Although OpenCalais works on almost any text thrown at it, we observed the following pitfalls in its named entity recognition:

  1. Using OpenCalais (OC), EpiSPIDER extracted the location entity Buffalo, Indiana, US from the ProMED Mail report titled “PRO/AH/EDR> Undiagnosed deaths, buffalo – India (Orissa): RFI”, in which “buffalo” refers to the animal, not a place.
  2. OC did not disambiguate between Atlanta, New York, United States and Atlanta, GA, United States.
  3. OC georeferencing assigned Norway latitude/longitude values located in the Russian heartland.
  4. For unknown reasons, it identified West Virginia as part of South Korea.
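One inexpensive safeguard against georeferencing errors like the Norway case is to check that returned coordinates fall inside a rough bounding box for the country the service named, and to flag the entity for review when they do not. This is a minimal sketch, not something OpenCalais provides: the boxes below are hand-entered approximations for illustration only, and a box check cannot catch every error (it will not, for example, resolve Atlanta, New York versus Atlanta, GA).

```python
from typing import NamedTuple, Optional


class BBox(NamedTuple):
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


# Rough, hand-entered boxes (approximate, illustration only; Norway excludes Svalbard).
COUNTRY_BOXES = {
    "Norway": BBox(57.9, 71.2, 4.5, 31.2),
    "South Korea": BBox(33.1, 38.7, 124.6, 131.9),
}


def plausible(country: str, lat: float, lon: float) -> Optional[bool]:
    """True/False if a box is known for the country; None if we cannot judge."""
    box = COUNTRY_BOXES.get(country)
    if box is None:
        return None
    return box.contains(lat, lon)


print(plausible("Norway", 59.91, 10.75))   # Oslo: True
print(plausible("Norway", 55.75, 37.62))   # Moscow: False, flag for review
print(plausible("Chile", -33.45, -70.67))  # no box on file: None
```

Returning `None` rather than `False` for unknown countries keeps "we could not check" distinct from "this looks wrong".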

Despite these rare glitches, the OpenCalais web service heralds exciting times as computing and network power bring down the barriers to an emerging service-oriented architecture.

An interesting universe of web services that can support event-based health surveillance is emerging out there.

What is event-based surveillance?

As defined by the WHO, event-based surveillance is the organized and rapid capture of information about events that are a potential risk to public health. This information can include rumours and other ad hoc reports transmitted through formal channels (i.e. established routine reporting systems) and informal channels (i.e. media, health workers and reports from nongovernmental organizations), including:

  • Events related to the occurrence of disease in humans, such as clustered cases of a disease or syndromes, unusual disease patterns or unexpected deaths as recognized by health workers and other key informants in the country; and
  • Events related to potential exposure for humans, such as events related to diseases and deaths in animals, contaminated food products or water, and environmental hazards including chemical and radio-nuclear events [1].

Unlike classic surveillance, event-based surveillance does not rely on systematically collecting information about individual humans, animals or plants; instead, it relies on reports about who or what is afflicted with disease. These reports can come from many sources, curated or uncurated, and are usually in free-text (unstructured) format.

Web services

An entry in Wikipedia defines a web service as “a software system designed to support interoperable machine-to-machine interaction over a network,” such as the Internet [2]. In recent years, news sources such as Reuters have popularized the distribution of online content through RSS feeds. RSS stands for Really Simple Syndication. RSS feeds are perhaps the most overlooked, yet most widely deployed, web service on the Internet today; an O’Reilly web site article highlights this often-missed detail in the history of web services [3]. RSS is based on Extensible Markup Language (XML) and has several versions (e.g., 0.92, 1.0 and 2.0). These feeds impose a structured format for distributing content: each item in a feed typically carries a title, a link, a publication date and a short description. Many software platforms provide APIs for extracting feed content from online sites. GeoRSS, an extension for feeds, adds geographic information (latitude and longitude). Currently, a number of public health agencies, including the WHO and the European Surveillance Network, post content about health-related events (outbreaks) on their sites and also make it available as RSS feeds. The other methods by which web services are made available include:
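As a rough illustration of how structured a feed item is, the sketch below parses a small, made-up RSS 2.0 document with Python's standard `xml.etree.ElementTree`, reading the usual item fields and, when present, a GeoRSS Simple `georss:point` element (latitude then longitude, separated by a space). The feed content is invented for the example; real feeds vary, and production code would more likely use a dedicated feed-parsing library.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = {"georss": "http://www.georss.org/georss"}

# Made-up sample feed, for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example outbreak news</title>
    <item>
      <title>Cholera cases reported</title>
      <link>http://example.org/news/1</link>
      <pubDate>Tue, 10 Jun 2008 04:00:00 GMT</pubDate>
      <description>Health officials report new cases.</description>
      <georss:point>20.27 85.84</georss:point>
    </item>
    <item>
      <title>Food recall issued</title>
      <link>http://example.org/news/2</link>
      <description>No location given.</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml: str) -> list[dict]:
    """Extract basic fields and an optional (lat, lon) pair from each <item>."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        point = item.find("georss:point", GEORSS_NS)
        latlon = None
        if point is not None and point.text:
            lat, lon = (float(v) for v in point.text.split())
            latlon = (lat, lon)
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "pubDate": item.findtext("pubDate"),  # optional element; None if absent
            "description": item.findtext("description"),
            "point": latlon,
        })
    return items


for entry in parse_items(SAMPLE_FEED):
    print(entry["title"], entry["point"])
```

Missing elements come back as `None`, so downstream code can treat geotagged and non-geotagged items uniformly.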

SOAP is an XML-based messaging protocol, while REST is an architectural style that typically exchanges XML or JSON over plain HTTP. Many web services support the simpler REST approach.
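To show the difference in style, this sketch builds (but does not send) two requests for the same operation with Python's standard `urllib.request`: a REST-style GET where the parameters go in the URL, and a SOAP 1.1-style POST where they travel in an XML envelope. The endpoint under `http://example.org/`, the `GetOutbreaks` operation, its namespace and its parameters are all hypothetical.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape

BASE = "http://example.org/outbreaks"  # hypothetical service


def rest_request(disease: str, country: str) -> urllib.request.Request:
    """REST style: the resource and its filters are expressed in the URL; GET, no body."""
    query = urllib.parse.urlencode({"disease": disease, "country": country})
    return urllib.request.Request(f"{BASE}?{query}", headers={"Accept": "application/json"})


# Hypothetical operation and namespace inside a SOAP 1.1 envelope.
SOAP_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetOutbreaks xmlns="http://example.org/ns">
      <disease>{disease}</disease>
      <country>{country}</country>
    </GetOutbreaks>
  </soap:Body>
</soap:Envelope>"""


def soap_request(disease: str, country: str) -> urllib.request.Request:
    """SOAP style: one endpoint; operation and arguments are in the XML body (POST)."""
    body = SOAP_TEMPLATE.format(disease=escape(disease), country=escape(country)).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/soap",
        data=body,  # supplying data makes urllib use POST
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": "http://example.org/ns/GetOutbreaks"},
    )


rest = rest_request("cholera", "India")
soap = soap_request("cholera", "India")
print(rest.get_method(), rest.full_url)
print(soap.get_method(), soap.full_url)
# Passing either object to urllib.request.urlopen() would actually send it.
```

The REST request is self-describing from its URL alone, whereas the SOAP request needs the envelope and headers to say what is being asked; that difference is much of why REST is considered the simpler option.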

Web services and event-based surveillance

How can web services be used in event-based surveillance? Aside from RSS feeds, we can find quite a number of both health- and nonhealth-related sources that offer web services. Among these are:

With these web services, developers can build ingenious applications that chain APIs together into a processing pipeline: harvesting unstructured data, transforming it from one format to another, linking it with other information sources and visualizing the results.
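A minimal sketch of that pipeline idea, assuming each stage is a plain Python function that consumes the previous stage's output. All stage names (`harvest`, `extract`, `link`, `to_geojson`) and the gazetteer entry are hypothetical stand-ins with canned behavior so the example runs offline; in a real system each stage would wrap a feed reader, an NLP web service, a knowledge base or geocoder lookup, and a mapping API.

```python
from functools import reduce
from typing import Any, Callable

Stage = Callable[[Any], Any]


def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right: each stage's output is the next stage's input."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


def harvest(_: Any) -> list[str]:
    # Hypothetical stand-in for a feed reader.
    return ["Cholera outbreak reported in Orissa"]


def extract(texts: list[str]) -> list[dict]:
    # Hypothetical stand-in for an NLP service: naive keyword lookup.
    known = {"Cholera": "condition", "Orissa": "location"}
    return [{"text": t, "entities": {w: known[w] for w in t.split() if w in known}} for t in texts]


GAZETTEER = {"Orissa": (20.27, 85.84)}  # hypothetical, approximate (lat, lon)


def link(records: list[dict]) -> list[dict]:
    # Attach coordinates for the first recognized location, if any.
    for r in records:
        places = [w for w, kind in r["entities"].items() if kind == "location"]
        r["coords"] = GAZETTEER.get(places[0]) if places else None
    return records


def to_geojson(records: list[dict]) -> dict:
    # GeoJSON orders coordinates as [longitude, latitude].
    features = [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [r["coords"][1], r["coords"][0]]},
         "properties": {"title": r["text"]}}
        for r in records if r["coords"]
    ]
    return {"type": "FeatureCollection", "features": features}


run = pipeline(harvest, extract, link, to_geojson)
print(run(None))
```

Keeping every stage behind the same "data in, data out" shape means a stage can be swapped for a different web service without touching the others.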

Web services and beyond

Think about weaving these web services into a coherent architecture that serves a particular (useful and practical) purpose. If one begins with the end in mind, the limit is one’s imagination. Weaving these services together to serve a purpose in a domain such as public health is not trivial, however, when one has to combine natural language processing, information retrieval, semantic web standards and data mining in one networked application. Natural language processing, information retrieval and data mining enable the extraction of keywords and concepts from unstructured data. Describing that data in a semantic web language enables linkage to semantically aligned elements from other sources. A welcome development here is the availability of many open-source software products, for a multitude of operating systems and programming/scripting languages, that fulfill nearly every conceivable task involved in building this type of application.
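As a small illustration of why a shared semantic description enables linkage, this sketch merges triples from two hypothetical sources that happen to use the same identifier for a place, then answers a question neither source can answer alone. The identifiers and data are invented; a real application would use RDF URIs, an RDF library and SPARQL rather than Python sets.

```python
# Triples extracted from a news report (hypothetical).
news = {
    ("ex:report7", "ex:mentionsCondition", "ex:Cholera"),
    ("ex:report7", "ex:mentionsPlace", "geo:Orissa"),
}

# Triples from a separate reference source (hypothetical).
reference = {
    ("geo:Orissa", "ex:partOf", "geo:India"),
    ("geo:Orissa", "ex:label", "Orissa"),
}

# Linking works simply because both sources use the same identifier "geo:Orissa".
graph = news | reference


def objects(subject: str, predicate: str) -> set[str]:
    """All objects o such that (subject, predicate, o) is in the merged graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Which country does report7 concern? This joins facts across the two sources.
countries = {c for place in objects("ex:report7", "ex:mentionsPlace")
             for c in objects(place, "ex:partOf")}
print(countries)  # {'geo:India'}
```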

References

[1] Event-based surveillance. URL: http://www.wpro.who.int/NR/rdonlyres/92E766DB-DF19-4F4F-90FD-C80597C0F34F/0/eventbasedsurv.pdf

[2] Web services. Wikipedia. URL: http://en.wikipedia.org/wiki/Web_services

[3] Appnel T. RSS: The web service we already have. 2003 January 22. URL: http://www.oreillynet.com/xml/blog/2003/01/rss_the_web_service_we_already.html.
