Overview report on the SYNTHESYS / EDIT Itinerary Tool and associated web services

Overview

The Itinerary project started as a broad proposal for providing "itinerary services" for "integration, visualisation and quality check" (Meirte et al., 2006; Meirte, 2005; Meirte et al., 2005), within the Network Activity 3.7 (NA-D 3.7) of the SYNTHESYS framework. Itinerary services aim to retrace the most probable pathway taken by scientific explorations (expeditions), based on the available georeferenced data points: specimen collection points and/or named localities of visited places (as found in mission reports and field notes). The idea was to provide a semi-automated assessment of the data set, visualising the conclusions for a human eye to check.

An early report on the setup and development was provided in Meganck et al., 2006, but ideas on the practical implementation have changed substantially along the way. Three important influences helped to give the toolset under development broader usability and better external connections:

* The RMCA participated in a workshop organised by the Taxonomic Databases Working Group (TDWG), creating a workflow similar to the one envisaged for the Itinerary web services. The insights gained in this workshop profoundly influenced the subsequent development of the Itinerary Tool (see http://www.tdwg.org/fileadmin/subgroups/meeting_reports/GIG_BioGeoSDI_Report.pdf).
* RMCA was asked to develop the SYNTHESYS NA-D 1.6 "Generic Query Tool", enabling generic connections to any TAPIR, DiGIR or BioCASe online data server (see Meganck, 2007). This offered a convenient way of importing freely available data points (e.g. from GBIF) into the tool for comparison and analysis.
* RMCA participated in the EU EDIT project, work package WP 5.4, developing the Internet Platform for Cybertaxonomy. The synergies were obvious and the decision was taken to integrate the itinerary toolset into this Platform.
Additionally, closely related topics were explored on the margins of the work, some making it into the toolset (e.g. the phylogeographical module), some not (or not yet), but always providing new and exciting insights and possibilities for further exploration.

Technical setup

The itinerary toolset (ItinTool for short) grew bigger and more intricate than we ever imagined, but the thoroughly modular concept and object-oriented code nonetheless keep its structure easy to survey.

The dispatcher

At the heart of it all is the dispatcher module, which takes its commands through a standard HTTP POST call and coordinates the work to be done. It addresses two control classes as necessary: the FileHandler class, regulating all traffic (input, output and temporary storage), and the ProcessingUnit class, providing the algorithms for data processing (e.g. the Dijkstra algorithm or the code for making a distance matrix). When all is done, the dispatcher module outputs a simple PHP page with a link to the appropriate result page (as there can be many different ways of displaying the results).

The FileHandler control class

The dispatcher calls the functions of the FileHandler class, which have generic, functional names such as "readPointsFromFile" and "writePointsToFile". The FileHandler handles any further subtleties involved, and calls the appropriate module for doing the real work (e.g. the KML_Module for input of a KML file).

The FileHandler modules

Orchestrated by the FileHandler control class, the FileHandler modules do the real hands-on work for input, parsing, output and storage of files. Note that database connections are also considered a form of file access (which, technically, they are), so they are initiated by the FileHandler control class as well. Together, the modules provide a wide range of possible input, storage and output formats.
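
The division of labour between the dispatcher, the FileHandler control class and its format modules can be pictured with a minimal sketch. It is written in Python for brevity, whereas the ItinTool itself is PHP, and it is not taken from the ItinTool code. Apart from the two function names quoted above (rendered here in snake_case), everything is an assumption made for illustration: the Point record, the CSVModule class, its read_points and write_points methods, the "name,lat,lon" column layout and the fields of the request dictionary that stands in for the HTTP POST parameters.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical data point: a named, georeferenced location."""
    name: str
    lat: float
    lon: float


class CSVModule:
    """Hypothetical format module for 'name,lat,lon' text with a header row."""

    def read_points(self, text):
        reader = csv.DictReader(io.StringIO(text))
        return [Point(row["name"], float(row["lat"]), float(row["lon"]))
                for row in reader]

    def write_points(self, points):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["name", "lat", "lon"])
        for p in points:
            writer.writerow([p.name, p.lat, p.lon])
        return out.getvalue()


class FileHandler:
    """Control class: exposes generic functions and delegates to a module."""

    def __init__(self):
        # Further modules (KML, GPX, ...) would be registered here.
        self._modules = {"csv": CSVModule()}

    def _module_for(self, fmt):
        try:
            return self._modules[fmt.lower()]
        except KeyError:
            raise ValueError(f"unsupported format: {fmt}") from None

    def read_points_from_file(self, fmt, text):
        return self._module_for(fmt).read_points(text)

    def write_points_to_file(self, fmt, points):
        return self._module_for(fmt).write_points(points)


def dispatch(request, handler):
    """Toy dispatcher: 'request' stands in for the fields of an HTTP POST."""
    points = handler.read_points_from_file(request["input_format"],
                                           request["data"])
    return handler.write_points_to_file(request["output_format"], points)


if __name__ == "__main__":
    # Illustrative input only; the coordinates are made up.
    request = {"input_format": "csv", "output_format": "csv",
               "data": "name,lat,lon\nSite A,1.5,30.25\n"}
    print(dispatch(request, FileHandler()))
```

In a layout like this, supporting a new format only requires registering one more module with the control class; the dispatcher itself stays unchanged.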
CSV module

Most people will want to upload their own data points in a comma-separated file (.csv file), as it is easy to produce such a file from a spreadsheet, and most primary data storage and analysis is done in spreadsheets. The same goes for data output: a CSV file can easily be imported into a spreadsheet for further processing (e.g. making data graphs).

GML module

Geography Markup Language (GML) is an XML derivative for geographical applications. It is meant to be the "lingua franca" for all things GIS, so many geographical applications can use GML as input or output. However, the full GML schema is very extensive, so many applications only employ a subset of its features (Google Earth KML, for example, is complementary to GML). In its current implementation, the GML module only parses the GML "Point" object from GML input files (not LineString, not Polygon), and no attempt at using GML for output has yet been made.

GPX module

GPX (the GPS Exchange Format) is a lightweight XML data format for the interchange of GPS data (waypoints, routes and tracks). It is widely used as a GPS-device offload format, so we decided that it should be supported. This will allow people to import their locations and itinerary waypoints directly from their GPS, for visualising on a map.

GQT module

The GQT module differs from the others in that it doesn't read or write files as such, but makes connections to TAPIR, DiGIR and BioCASe servers for data queries. The GQT module will provide the connection, submit queries and parse the provider's answer, returning all this information transparently to the ItinTool, just as if it had come from an input file.

ITML module

ITML (Itinerary XML) is a test-case application schema specifically aimed at describing itineraries, following the data model set out in Meganck et al., 2006, dividing a pathway into "expedition", "itineraries", "sections" and other objects.
It was developed to cover the need for a very light, very focused data model, but it is still uncertain whether it will remain necessary in the future: if the itinerary description could be done in an already existing application schema, that would be preferable.

KML module

KML is a (newly adopted) OGC standard (see: http://google-latlong.blogspot.com/2008/04/kml-new-standard-for-sharing-maps.html) for the description of geographical data. Supporting KML was an obvious choice, as Google Earth is a very convenient tool for quickly defining input points and quickly visualising output points.

TRE module

The TRE module was added after itineraries and phylogeography proved to be surprisingly similar. It reads and parses a Nexus tree file, describing the structure of a phylogenetic tree. Once coordinates for the data points are added (in a .csv file), the tree can be visualised in 3D just like itineraries.

Database module

A database is used for temporary storage of the data points, and for connection to a WMS (Web Map Service) for visualisation of the points. The advantage of serving the points through a WMS is that the data layer can be used anywhere, through an online or a local GIS system. For now, the database module is focused on PostGIS, but support for any other SQL database could be added quite easily.

The ProcessingUnit control class

The second helper of the dispatcher module is the ProcessingUnit control class. It presents generic functions to the dispatcher, with names like "findShortestPath" and "simplifyLine", calling the more specialised functions in the ProcessingUnit modules to do the real work.

The ProcessingUnit modules

Incorporating the processing algorithms, these modules are the heart of the data analysis. Each type of processing has its own module.

Dijkstra module

One can't analyse a pathway without resorting, at one time or another, to Dijkstra's algorithm for finding the shortest path.
This code was already available in a free (GPL'ed) PHP implementation on the Web. As it is not our intention to reinvent the wheel, we incorporated this code into the ItinTool.

RDP module

The Ramer-Douglas-Peucker algorithm simplifies the contours of a line according to tolerance parameters, sifting out the data points that do not provide any additional information. This is, of course, very useful for pathway analysis. Like the Dijkstra code, the RDP algorithm was available on the Web, courtesy of ..............; The service of polyline simplification, in combination with the GPX input possibility, should attract the attention of external initiatives such as OpenStreetMap, for processing GPS tracks into community-made maps.

Phylogeo module

RMCA's JEMU molecular unit has been asking about the possibility of visualising phylogeographical data, along the lines of (REF). Looking into this, we concluded that this visualisation is very close to what we were doing for itineraries. Both have georeferenced data points, and one or more possible structures superimposed on these points: the possible pathways or the possible phylogenetic trees. The visualisation is the same: plot the points and connect them with lines to show the structure. Only the altitude factor is different: a phylogenetic tree will have a much stronger Z-axis component, the connections between the points being made in a vertical plane rather than a horizontal one. But as x, y and z components apply to all data points, the difference is only in the input, not in the visualisation technique.

Itinerary module

The itinerary module is to be the central module for the itinerary analysis. It will incorporate some algorithms of its own, but also use the algorithms already defined in other modules where necessary, e.g. the Dijkstra or the Ramer-Douglas-Peucker algorithm.

References

Meganck, B., Meirte, D., Mergen, P. and Theeten, F., 2006.
Providing itinerary related datasets and tools for integration, visualisation and quality check - system specifications. SYNTHESYS report for Network Activity NA-D 3.7.
Meganck, B., 2007. SYNTHESYS Network Activity D Report Deliverable 1.6.1. Report on a generic translator for the communication protocols used in GBIF/SYNTHESYS context. Royal Museum for Central Africa, Tervuren.
Meirte, D., Mergen, P., Meganck, B. and Theeten, F., 2006. SYNTHESYS NA-D 3.7 Providing itinerary related datasets and tools (for integration, visualisation and quality check).
Meirte, D., 2005. Proposed schedule for the RMCA project, entitled Providing itinerary related datasets and tools (for integration, visualisation and quality check). Internal RMCA/SYNTHESYS document.
Mergen, P., Louette, M., Snoeks, J., de Meyer, M. and Meirte, D., 2005. The Royal Museum for Central Africa in the era of biodiversity informatics. Department of Zoology, Royal Museum for Central Africa. Leuvensesteenweg 13, B-3080 Tervuren, Belgium.