I’m back!

Earlier I described the OpenCalais web service.

The ecosystem of web services

The NASA Earth Observatory Glossary defines an ecosystem as “any natural unit or entity including living and non-living parts that interact to produce a stable system through cyclic exchange of materials” [NASA]. The concept can be applied to Internet-based applications that function as information-consuming or information-producing “organisms” and that interact with each other interdependently by exchanging information.

The IBM web site, on the other hand, defines “web services” as “self-contained, modular, distributed, dynamic applications that can be described, published, located, or invoked over the network to create products, processes, and supply chains.”

As discrete, possibly autonomous “organisms” in an Internet-based information ecosystem, web services-enabled applications expose data and/or service end points in multiple ways, including Really Simple Syndication (RSS) feeds and web service Application Programming Interfaces (APIs) built on Simple Object Access Protocol (SOAP), XML Remote Procedure Call (XML-RPC) or REpresentational State Transfer (REST). Besides using XML to embed data in responses to data or process requests, an increasing number of web service applications also provide responses in JavaScript Object Notation (JSON). OpenCalais and Alchemy respond to API requests using Resource Description Framework (RDF), a semantic web format, commonly serialized as XML, that structures data as triples (subject, predicate, object). Both also perform named entity disambiguation by linking to external knowledge bases (e.g., CIA Factbook, Wikipedia, Freebase). These web service applications may even provide machine learning-based services such as natural language processing (specifically named entity extraction and concept annotation), language detection and translation, and text classification. Tools that enable semantic processing of content (not just classification) can potentially expose richer knowledge-based content embedded in unstructured data such as news about outbreaks and disasters.
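To make the triple idea concrete, here is a minimal sketch in plain Python. It does not reproduce the actual output of OpenCalais or Alchemy; every URI below uses the example.org placeholder domain and is an assumption made up for illustration, as is the helper name objects_of.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers (not taken from any real service response).
EVENT = "http://example.org/event/1"
DISEASE = "http://example.org/disease/cholera"
MENTIONS = "http://example.org/vocab/mentionsDisease"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

TRIPLES = [
    Triple(EVENT, MENTIONS, DISEASE),   # the event mentions a disease entity
    Triple(DISEASE, LABEL, "cholera"),  # the disease entity has a readable label
]


def objects_of(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


if __name__ == "__main__":
    # Follow two links: event -> disease entity -> label.
    for disease in objects_of(TRIPLES, EVENT, MENTIONS):
        print(objects_of(TRIPLES, disease, LABEL))
```

Because an object in one triple can be the subject of another, following links from the event to the disease and then to its label is what lets a consumer connect extracted entities to an external knowledge base.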


EpiSPIDER now streams detected public health events to Twitter. You can follow the stream at http://www.twitter.com/epispider.


According to the Daylife API web site, the API provides a technical platform for programmers, news editors and publishers to access the Daylife news aggregation and analysis service. The web site provides useful examples of how to access the API using different scripting languages (it is certainly PHP-friendly), and the authors are very willing to help users get started. The analysis service provides new ways of looking at news information that potentially make this news aggregation web service a “killer application”.

An interesting universe of web services that can support event-based health surveillance is emerging out there.

What is event-based surveillance?

As defined by the WHO, event-based surveillance is the organized and rapid capture of information about events that are a potential risk to public health. This information can be rumours and other ad-hoc reports transmitted through formal channels (i.e. established routine reporting systems) and informal channels (i.e. media, health workers and nongovernmental organizations reports), including:

  • Events related to the occurrence of disease in humans, such as clustered cases of a disease or syndromes, unusual disease patterns or unexpected deaths as recognized by health workers and other key informants in the country; and
  • Events related to potential exposure for humans, such as events related to diseases and deaths in animals, contaminated food products or water, and environmental hazards including chemical and radio-nuclear events [1].

Unlike classic surveillance, event-based surveillance is not based on collecting information about individuals, animals or plants; rather, it is based on reports about who or what is afflicted with disease. These reports can come from many sources, curated or uncurated, and are usually in free text (or unstructured) format.

Web services

An entry in Wikipedia defines a web service as “a software system designed to support interoperable machine-to-machine interaction over a network,” such as the Internet [2]. In recent years, news sources such as Reuters have popularized distribution of online content through RSS feeds. RSS stands for Really Simple Syndication. RSS feeds are the most overlooked but widely deployed web service on the Internet today. An O’Reilly web site article highlights this often missed detail in the history of web services [3]. RSS is based on Extensible Markup Language (XML) and has a few variants (e.g., version 1, version 2 and 0.92). These feeds impose a structured format for distributing content. Each item in a feed has, at a bare minimum, a title, the date when the news item was published and a short description. Many software platforms have developed Application Programming Interfaces (APIs) to extract feed content from online sites. One variant of RSS, called GeoRSS, contains geographic information (latitude and longitude). Currently, a number of public health agencies, including the WHO and European Surveillance Network, post content about health-related events (outbreaks) on their sites and also make it available as RSS feeds. The other methods by which web services are made available include:

SOAP is an XML-based messaging protocol, while REST is an architectural style built on plain HTTP requests whose responses are often, but not necessarily, XML. A majority of web services support the simpler REST method.
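As a rough illustration of the feed structure described in this section, the sketch below reads the title, publication date, description and GeoRSS point of each item using only the Python standard library. The sample feed is made up for this example and does not come from any agency; the function name parse_items is likewise an assumption. It assumes RSS 2.0 with the GeoRSS Simple encoding, where a point is written as "latitude longitude".

```python
import xml.etree.ElementTree as ET

# GeoRSS Simple namespace, needed to look up the <georss:point> element.
NS = {"georss": "http://www.georss.org/georss"}

# Made-up feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example outbreak feed</title>
    <item>
      <title>Example event report</title>
      <pubDate>Mon, 05 Jan 2009 10:00:00 GMT</pubDate>
      <description>Made-up item used only for illustration.</description>
      <georss:point>14.60 120.98</georss:point>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return one dict per <item> with title, date, description and location."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        point = item.findtext("georss:point", default="", namespaces=NS)
        if point:
            lat, lon = (float(value) for value in point.split())
        else:
            lat, lon = None, None  # plain RSS items carry no location
        items.append({
            "title": item.findtext("title", default=""),
            "published": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
            "lat": lat,
            "lon": lon,
        })
    return items


if __name__ == "__main__":
    for entry in parse_items(SAMPLE_FEED):
        print(entry)
```

A real harvester would fetch the feed over HTTP and would also need to handle RSS 1.0 and Atom variants, which use different element names.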

Web services and event-based surveillance

How can web services be used in event-based surveillance? Aside from RSS feeds, we can find quite a number of both health-related and non-health-related sources that offer web services. Among these are:

With these web services, developers can come up with ingenious applications that connect APIs together as a processing pipeline: harvesting unstructured data, transforming it from one format to another, linking it with other information sources and visualizing the result.
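The pipeline idea can be sketched as a chain of small functions, one per stage. This is only a toy under stated assumptions: each stage is a local stand-in for what would really be a call to a web service, and the report text, the place name "Exampleville", its coordinates and all function names are invented for illustration.

```python
import json

# Hypothetical gazetteer mapping place names to (latitude, longitude).
GAZETTEER = {"Exampleville": (10.0, 20.0)}


def harvest():
    """Stage 1: stand-in for fetching unstructured text from feeds or APIs."""
    return ["Cluster of fever cases reported in Exampleville"]


def transform(texts):
    """Stage 2: wrap each raw text in a structured record."""
    return [{"text": text, "places": []} for text in texts]


def link(records, gazetteer):
    """Stage 3: attach the known place names found in each record's text."""
    for record in records:
        record["places"] = [name for name in gazetteer if name in record["text"]]
    return records


def to_geojson(records, gazetteer):
    """Stage 4: emit GeoJSON that a mapping tool could display."""
    features = []
    for record in records:
        for name in record["places"]:
            lat, lon = gazetteer[name]
            features.append({
                "type": "Feature",
                # GeoJSON orders coordinates as [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"place": name, "text": record["text"]},
            })
    return {"type": "FeatureCollection", "features": features}


def run_pipeline():
    """Connect the four stages end to end."""
    return to_geojson(link(transform(harvest()), GAZETTEER), GAZETTEER)


if __name__ == "__main__":
    print(json.dumps(run_pipeline(), indent=2))
```

Keeping each stage behind its own function means any one of them can later be swapped for a real service, such as an entity extraction API in place of the simple substring match, without touching the others.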

Web services and beyond

Think about weaving these web services into a coherent architecture that serves a particular (useful and practical) purpose. If one begins with the end in mind, the only limit is one’s imagination. Weaving these services together to serve a purpose in a domain such as public health is not trivial, because it means combining natural language processing, information retrieval, semantic web standards and data mining in one networked application. Natural language processing, information retrieval and data mining enable extraction of key words and concepts from unstructured data. Describing unstructured data in a semantic web language enables linkage to semantically aligned elements from other sources. A welcome development in this connection is the availability of many open-source software products, for a multitude of operating system platforms and programming/scripting languages, that cover nearly every conceivable task in building this type of application.

References

[1] Event-based surveillance. URL: http://www.wpro.who.int/NR/rdonlyres/92E766DB-DF19-4F4F-90FD-C80597C0F34F/0/eventbasedsurv.pdf

[2] Wikipedia. Web service. URL: http://en.wikipedia.org/wiki/Web_services

[3] Appnel T. RSS: The web service we already have. 2003 January 22. URL: http://www.oreillynet.com/xml/blog/2003/01/rss_the_web_service_we_already.html.

We have just released feeds of WAHID data as GeoRSS and KML. WAHID stands for World Animal Health Information Database. Please visit the FEEDS page to download the WAHID feeds. The KML and GeoRSS feeds contain WAHID reports from the last 90 days.
