Data Sharing (DaSh)

Vision for the Programme

The RDSI project identified two major parts of the DaSh Programme:

  • the DaSh Collaboration Network (DaShNet), which will develop a high performance network between RDSI participants, in conjunction with the NRN project; and
  • the DaSh Technical Architecture, which will develop the data sharing and data movement services that will use DaShNet in providing services for the sector.

The vision for the programme is, therefore, twofold. It encompasses a significantly enhanced network capability to support connection between nodes and between institutions and nodes. This capability is defined in conjunction with the National Research Networks (NRN) project, which funds it as a sub-project.

It also includes the development of a DaSh Technical Architecture which will support the ability to move data between nodes and between nodes and institutions, to enable collaborative and uniform access to that data, to support automated workflow for users of some node services and to identify security barriers which might reduce the effectiveness of these activities.

The large volume of data that the RDSI Nodes will hold introduces a level of complexity for which there are few exemplars elsewhere. As a result, the DaSh Programme is the most complex of the four programmes of RDSI.

Design of the Programme

The DaSh programme was designed to develop the DaShNet Network Architecture and the DaSh Technical Framework in consultation with stakeholders and to propose an approach for RDSI-funded Nodes to use as a basis for developing their infrastructure. Together, DaShNet and the Technical Framework were integrated to become the DaSh Technical Architecture, which describes the network, data mover, security and identity, data access, cloud gateway, test platform and workflow automation capabilities for the RDSI-funded Nodes. Creating support materials for identified services was also considered a priority, and the RDSI project identified that this could be addressed through a contribution to the AeRO Support Project.

Background information

To identify which high performance data sharing and data movement services, as well as other services, the sector needed, this programme ran substantial consultations with relevant research sector stakeholders and researched and evaluated existing services and products. The extensive Node consultation undertaken in 2010 made clear that this programme needed to evolve to accommodate the Nodes' and researchers' requirements.

In 2011 a series of "strawmen" consultations outlined options and characteristics with the specific aim of eliciting feedback to accommodate the Nodes' and researchers' requirements. The RDSI project used websites, email lists and newsletters to publish the strawmen and generate wider feedback. Visits to universities and research organisations, together with a small number of national workshops and regional consultations, supplemented these consultations.

The feedback was collated and a 'Tinman' document outlining the themes and activities proposed under this programme was published.
The feedback received from stakeholders was used to further refine and evolve the approach to developing the DaSh architecture via the themes and subprojects described in both the 2012-2013 and 2013-2014 RDSI Annual Business Plans.

Status of the Programme

The DaSh programme assessed the capability to support the sharing and re-use of research data by the RDSI-funded Nodes and identified the major activities to be undertaken to assist Nodes with options for their infrastructure development. The DaSh programme was refined in the 2013-2014 RDSI Annual Business Plan (ABP) to further support the needs of Nodes in line with the second and third RDSI project objectives:

  • Create and develop a data storage infrastructure accessed through a common infrastructure layer provided by agencies within the sector, or commercial providers, or both.
  • Connect the data storage infrastructure created by the RDSI Project to the Australian Research and Education Network (AREN) by a high bandwidth connection, funded and constructed under the Super Science National Research Networks (NRN) Project. This will include dedicated high speed connections between major Nodes.

The DaSh Technical Architecture was developed to meet these objectives and is shown (diagrammatically) below.

Stakeholders' participation in the implementation and refinement of DaSh themes was crucial. The creation of the Technical Advisory Committee (TAC) assisted in refining the DaSh programme. This group met on a regular basis to discuss technical matters related to Node start-up and what was required of the DaSh programme.

The RDSI Project organised regular meetings to discuss their architecture, implementation and timelines to provide input into the project. The refinement of the DaSh architecture was addressed through the creation of themes to implement the network, security and identity, data access, sharing and movement, testing platform, and workflow automation capabilities commensurate with the importance, volume and complexity of the large amounts of data the RDSI Nodes will hold.

The DaSh themes identified some dependencies between individual components, particularly in the areas of DaShNet and Data Mover, which resulted in some adjustments to the originally anticipated timelines. These adjustments are reflected in the commentary on milestones.

The DaSh theme activities carried out during 2012-2014 to identify components of the technical architecture are grouped by theme and briefly described below. Many of the themes contain sub-projects, and these are generally described separately in more detail. The Data Access theme is currently being further refined through consultation with Nodes and the RDSI board.

DaSh Themes

Identity and security theme

The RDSI project has invested in the provision of storage for nationally significant data collections for Australian researchers. This storage is being delivered and managed through Nodes. Essential to providing this service to the Australian and international research community is the ability to control who can access and manipulate data stored within the RDSI-funded storage infrastructure. To this end, individuals will need identities and a mechanism for assigning access rights to RDSI-funded storage allocations. The RDSI project is funding the development of such a system in conjunction with the Australian Access Federation (AAF). The system, called reX, will also provide a simplified web interface for managing group-based access rights to collections.
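The idea of group-based access rights to collections can be illustrated with a minimal sketch. It is not the reX design or its interface: the class, the method names, the identity and the collection name below are all hypothetical, and the only assumption carried over from the text is that identities belong to groups and that groups, rather than individuals, are granted rights on a collection.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AccessRegistry:
    """Hypothetical in-memory registry of group-based access rights."""

    # group name -> identities that belong to the group
    members: dict[str, set[str]] = field(default_factory=dict)
    # (collection, group) -> rights granted, e.g. "read" or "write"
    grants: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def add_member(self, group: str, identity: str) -> None:
        """Add an identity to a group, creating the group if needed."""
        self.members.setdefault(group, set()).add(identity)

    def grant(self, collection: str, group: str, right: str) -> None:
        """Grant a right on a collection to every member of a group."""
        self.grants.setdefault((collection, group), set()).add(right)

    def is_allowed(self, identity: str, collection: str, right: str) -> bool:
        """Return True if any of the identity's groups holds the right."""
        for (granted_collection, group), rights in self.grants.items():
            if (
                granted_collection == collection
                and right in rights
                and identity in self.members.get(group, set())
            ):
                return True
        return False


# Illustrative data only; neither the identity nor the collection is real.
registry = AccessRegistry()
registry.add_member("reef-survey-team", "alice@example.edu.au")
registry.grant("reef-survey-2013", "reef-survey-team", "read")
```

In this sketch a data custodian would manage membership of a group once, rather than maintaining rights for each researcher on each collection, which is the simplification that group-based management is intended to provide.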

Furthermore, under this theme the RDSI project will use a framework to identify potential security threats, define security policies and provide Nodes with expert advice on security architecture and networking as they relate to RDSI-funded data storage. This theme includes both the Identity and Security components of the DaSh Technical Architecture.

To support this theme the following activities were carried out:

  • The Project Moonshot feasibility study was completed, and Moonshot was deemed unsuitable for implementation by Nodes as an identity management system.
  • A number of meetings and consultations took place with key stakeholders including the Australian Access Federation (AAF), Australian National Data Service (ANDS), National eResearch Collaboration Tools and Resources Project (NECTAR), and Nodes to refine the technical architecture.
  • Developed the Data Storage Identity Layer (DSIL) proof of concept to determine whether it could be used as an identity and registration solution for researchers to access collections. Many lessons were learned from this activity, and the Nodes deemed the solution unsuitable because of the large overhead of maintaining this technology after the RDSI project concluded.
  • Concluded the evaluation of commercial options such as Ping and Empower ID in partnership with the AAF. The evaluation provided further insight on the identity management, authentication and authorisation of the DaSh technical architecture. Offered solutions were deemed unsuitable on their own by the Nodes.
  • After the trials described above it became clear that a hybrid solution to provide the ability to control who can access and manipulate data stored within the RDSI funded storage infrastructure was required. To this end, individuals would need identities and a mechanism for assigning access rights to RDSI funded storage allocations.
  • In partnership with the AAF and Nodes, further investigation progressed towards a sound solution for identity management, authentication and authorisation, culminating in the development of a proof of concept for the researcher's identity and access management (reX IAM) system, which is currently being evaluated by Nodes. The reX project plan describes the approach to take this solution into production.
  • Security aspects of the Node infrastructure have been highlighted through security policy guideline documents. The Security Sub-Project is described here.

DaShNet theme

The DaShNet theme was developed to support the connection of Nodes by a very fast network that is part of Australia's Academic and Research Network (AARNet). The project to establish these connections is also called DaShNet and is funded by the National Research Networks (NRN) project. This theme includes the Networks, Data Mover and Public Cloud components of the DaSh Technical Architecture.

To support this theme the following activities were carried out:

  • Concluded the Pathfinder Network File Sending Performance testing, which used the Aspera and Globus Online products to look at solutions to assist with data movement. These technologies were considered promising and further investigation continued in 2013-2014. This activity crosses over with the Data Access theme.
  • Developed the draft DaShNet proposal and submitted it to NRN for approval. NRN approved the proposal and work has commenced with AARNet as the subcontractor.
  • AARNet commenced work and implemented redundant fibre connectivity to Nodes where required.
  • Links to Intersect, iVEC and VicNode/Monash Nodes are live. Other links will be coming up during Q1 2014.
  • 'The Science DMZ is a portion of the network, built at or near the campus local network perimeter that is designed such that the equipment, configuration, and security policies are optimized for high-performance scientific applications. Developed by ESnet engineers, the Science DMZ model addresses common network performance problems encountered at research institutions by creating an environment that is tailored to the needs of high performance science applications, including high-volume bulk data transfer, remote experiment control, and data visualization'. A short trial of the Science DMZ was carried out with encouraging results, and the Science DMZ will be implemented as part of the DaShNet proposal.
  • The Science DMZ hardware for all sites has been purchased, including the Border Edge Switches (BES), Data Transfer Nodes (DTN) and perfSONAR monitoring systems, and is in various stages of deployment and configuration. This is being supported by Chris Myers at V3 Alliance and the team at AARNet. Most sites will have all of their hardware installed by mid-March. The DTNs also run the Aspera data-transfer software packages, which have been purchased for all Nodes.
  • Developed network access to one or more public cloud infrastructure providers to support collaboration around research data.

Data access theme

The research community has a wide range of existing and emerging mechanisms for making its data visible and, conversely, no simple national framework for authorisation. Tackling all of these issues independently and repeatedly would greatly affect the user experience of the services, so commonality across the Nodes is considered helpful. It helps to deliver quick and easy outcomes for a large fraction of the community, builds a national set of standards, and can reduce costs.

The Data Access theme was created to help Nodes develop an overarching strategy or framework for coordination, as well as to fund some implementation projects where commonality is important.

To support this theme the following activities were carried out:

  • The SME Cloud File Server appliance is designed to unify data from different file stores and SaaS services. The Storage Made Easy (SME) testing, conducted from mid-January to mid-March 2013, has concluded and a report has been circulated to Nodes. This solution was found not to meet enough of the requirements.
  • Concluded the RDSI Web Object (WOS) Trial (aka share.edu), which ran from the beginning of February 2013 until 12 April 2013. A total of 55 users registered to trial the service. Of these, 13 made particularly heavy use of the service. The software was stable during the trial and users provided positive feedback. This trial was successful and a report has been circulated to Nodes.
  • The LiveArc Dimensioning project was carried out to road test a collection management system that would be useful to researchers in managing their ReDS research data at the Nodes. A LiveArc working group was established to participate in this project and it was agreed that Nodes would benefit from using this tool.
  • The RDSI project will fund the LiveArc solution at Nodes and is currently developing the contract variations to accommodate this funding.
  • The RDSI project is assisting Nodes to identify middleware options. Further consultation will take place and a report will be distributed to Nodes.
  • The RDSI project evaluated the CKAN option and provided a report.
  • Facilitated a workshop in February 2014 for Nodes to discuss data service proposals that would support access to RDSI-funded storage and to prioritise funding. Further discussions will occur between Nodes, and the RDSI project intends to fund the Nodes' proposals.

Workflow theme (ARMS)

The workflow theme was created to support Nodes with their reporting requirements and the RDSI project's reporting obligations to the Commonwealth.

To support this theme the following activities were carried out:

  • RDSI released a draft specification outlining functional requirements, including an online application form for ReDS merit-allocated storage and support for a suggested assessment and allocation workflow.
  • The RDSI project funded QCIF to develop a solution to streamline the ReDS workflow. The Allocation Request Management System (ARMS) was developed using ReDBox. Key benefits of this approach included re-use of existing software that was actively used by stakeholders and supported by a Node Operator, metadata handling functionality and support for review workflows. A phased approach is being used for the development of ARMS. The ARMS subproject is described here in further detail.
  • ARMS I went live as a production (beta) system on 24 January. Its status is beta, to reflect the ongoing functional enhancements. Support and maintenance of ARMS I beta and subsequent updates is funded to 31 May 2014.
  • ARMS II will be developed to extend the functionality of ARMS to fully support the common application form and electronic assessment processes required by the ReDS programme.
  • The RDSI Portal has been developed and is accessible at www.rdsi.edu.au.
  • The dashboard project is an integral part of the RDSI portal and currently provides status on allocation and ingest by node as well as general security information. Further data will be added as the outputs of the reporting project are integrated into the dashboard.
  • Nodes are required to report on their performance and various other statistics to the RDSI project on a regular basis. The reporting requirements are detailed in the contracts and amendments between the University of Queensland and each Node. Currently, Nodes are providing information via email on a best-efforts basis as they establish their infrastructure and services. These statistics are manually collated and stored by the RDSI project. A summary of some of the information is currently provided on the RDSI dashboard, with the data being manually updated by RDSI staff. The monitoring and reporting project will support the gathering of information from Nodes and regular reporting on Nodes' infrastructure. An automated reporting mechanism has been proposed and will be developed with support from Nodes to assist them in fulfilling their reporting obligations.

Test Platform (former Node Zero theme)

The test platform theme was created to support meaningful evaluation of RDSI-funded infrastructure and applications in a realistic environment spanning RDSI-funded Nodes.

To support this theme the following activities were carried out:

  • RDSI has established a test platform in conjunction with two Nodes, and this environment is being used for testing RDSI-related components, change management and integration testing. The successful implementation of RDSI infrastructure and applications is dependent on the ability to undertake meaningful evaluation and testing. The test platform principally uses cloud-based resources. Because it spans multiple Nodes, the test platform makes it possible to evaluate infrastructure and applications in a realistic environment.

Creation of Support Materials

The creation of the resources needed for the future support of RDSI services is covered under this theme.

To support this theme the following activities were carried out:

  • The RDSI project funded Nodes to create the resources needed to allow the effective future support of RDSI Node services. This activity is being achieved through the AeRO support project.

Milestones from the 2013/2014 Business Plan

The milestones relating to the DaSh programme in the business plan are shown below together with their current status. Full details on all milestones in this business plan can be found here.

Milestone | Start/Finish | Status | Comments
Propose DaShNet to NRN project | January 2013 / May 2013 | Complete | All contracts signed
Provide DaShNet for Nodes and researcher’s use | July 2013 / December 2014 | In progress and ahead of schedule | First primary node implementation is complete. Implementation scheduled to complete by September 2014
Provide Science DMZ | July 2013 / October 2013 | In progress | The DaShNet project has been extended to incorporate the delivery of the Science DMZ as its effective use is dependent on the network. To be delivered at the same time as DaShNet (Dec-2014)
Interconnect with public cloud | August 2013 / December 2013 | Complete | -
Provide data access infrastructure (data mover) | July 2013 / December 2013 | In progress | The DaShNet Project has been extended to incorporate the delivery of the Data Access Infrastructure as its effective use is dependent on the network. To be delivered at the same time as DaShNet (Dec-2014)
Fund the implementation of high capacity storage Nodes | May 2012 / December 2014 | On target | All nodes have received Node development funding and the first tranche of ReDS funding
Develop security policy guidelines for Nodes and Institutions | July 2013 / February 2014 | On track | -
Implement a reliable and consistent Identity mechanism for researchers | August 2013 / December 2013 | In progress | Identity system prototype completed in Dec 2013. Interim production deployment in April 2014. ReX development complete by July 2014
Implement a reliable and consistent authorisation mechanism for data custodians, researchers and Nodes | August 2013 / March 2014 | In progress | Identity system prototype completed in Dec 2013. Interim production deployment of authorisation in May 2014. ReX development complete by July 2014
Develop Security Policy Model for Nodes | December 2013 / February 2014 | On target | -
Develop Security Policy Model for Institutions | December 2013 / February 2014 | On target | -
Evaluate and dimension cloud middleware options | July 2013 / February 2014 | On target | -
Implement cloud middleware options as required | September 2013 / December 2014 | In progress | Nodes will implement cloud middleware as required
Evaluate and dimension the CKAN open data portal | July 2013 / February 2014 | Complete | Document provided to Nodes
Implement the CKAN open data portal as required | October 2013 / December 2014 | Complete | Nodes implementing CKAN as required
Evaluate and dimension a LiveARC solution | July 2013 / February 2014 | Complete | The board has approved a LiveARC sub-project
Implement the LiveARC solution as required | October 2013 / December 2014 | In progress | All Nodes decided to use the LiveARC solution. RDSI is finalising arrangements with SGI to deliver this service to Nodes
Implement a collections registry accessible Nodes and RDSI project | July 2013 / September 2013 | In progress | Development to be completed by Q2/2014
Support the implementation of widely discoverable data collections | October 2013 / March 2014 | In progress | Development to be completed by Q2/2014
Implement a storage allocation process for Nodes and data custodian | July 2013 / September 2013 | Complete | -
Support the implementation of collection management system at Nodes | July 2013 / October 2013 | In progress | Beta version released. On-going development until Q3/2014
Implement a monitoring and reporting mechanism to report on Node ingest progress, collections hosted and collection performance | November 2013 / May 2014 | In progress | -
Fund and manage the development of RDSI Portal | May 2013 / September 2013 | Complete | RDSI portal is available at rdsi.edu.au
Fund and manage the development of the RDSI DaShBoard | August 2013 / December 2013 | In progress | The Dashboard in the RDSI portal currently provides status on allocation and ingest by node as well as general security information. Further data will be added as the outputs of the reporting project are integrated into the Dashboard.
Supporting the implementation of coordinated/integrated metadata flows | August 2013 / December 2013 | In progress | Additional consultation is being undertaken to determine requirements
Identify RDSI components to be evaluated in Node Zero over the period of the project | July 2013 / September 2013 | Complete | Test platforms are available and in use
Acquire and deploy a stable environment through Nodes to evaluate and dimension tools and resources required to implement and support RDSI project objectives at a production quality level | July 2013 / December 2014 | Complete | Test platforms are available and in use
Fund and manage Node Zero deployment | July 2013 / December 2014 | Complete | Test platforms are available and in use
Enhance the durability and sustainability of nodes by connecting to public clouds | September 2013 / December 2014 | On target | Test connections to a cloud vendor are in place. Others will be added if needed. Testing programme is underway to allow interoperability from nodes
Manage the creation of support materials | July 2013 / June 2014 | On target | This is being achieved through the AeRO support project

Funding for the Programme

The DaSh programme was allocated $7,695,000 from the project budget. Of this amount, $2,103,127 has been expended and $4,651,071 is committed/allocated expense. Principal expenses for the programme are the funding of DaSh themes' subprojects described in this document. A graphic summary of the budget is shown here:

DaSh Programme Budget

Detailed budget breakdown including subprojects is shown below:

Activity | Expended | Committed | Allocated | Unallocated | Total
Staff, Workshops, Travel | $1,033,765 | $637,198 | - | - | $1,670,963
ARCS Data Fabric Development | $317,217 | - | - | - | $317,217
Workflow - ARMS | $125,080 | - | $380,000 | - | $505,080
reX Authentication and Authorisation | $36,265 | $90,429 | $1,128,694 | - | $1,255,388
Test Platforms | $415,000 | - | - | - | $415,000
Monitoring and Reporting | - | - | $250,000 | - | $250,000
DaShNet development | $158,800 | - | - | - | $158,800
LiveARC | - | - | $1,750,000 | - | $1,750,000
Data Access Services | $17,000 | - | - | $940,802 | $957,802
Security Framework | - | - | $30,000 | - | $30,000
Contingency (5%) | - | - | $384,750 | - | $384,750
Total | $2,103,127 | $727,627 | $3,923,444 | $940,802 | $7,695,000

These figures reflect current estimates and may vary to a modest extent as some requirements are further refined.
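The budget figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the totals row and the per-activity totals from the final column of the table; the variable and function names are illustrative, and whole-dollar integer arithmetic is assumed.

```python
# Totals row of the budget table, in whole dollars.
TOTAL_BUDGET = 7_695_000
EXPENDED = 2_103_127
COMMITTED = 727_627
ALLOCATED = 3_923_444
UNALLOCATED = 940_802

# Per-activity totals taken from the final column of the table.
ACTIVITY_TOTALS = {
    "Staff, Workshops, Travel": 1_670_963,
    "ARCS Data Fabric Development": 317_217,
    "Workflow - ARMS": 505_080,
    "reX Authentication and Authorisation": 1_255_388,
    "Test Platforms": 415_000,
    "Monitoring and Reporting": 250_000,
    "DaShNet development": 158_800,
    "LiveARC": 1_750_000,
    "Data Access Services": 957_802,
    "Security Framework": 30_000,
    "Contingency (5%)": 384_750,
}


def committed_or_allocated() -> int:
    """Combined 'committed/allocated' figure quoted in the text."""
    return COMMITTED + ALLOCATED


def contingency(rate_percent: int = 5) -> int:
    """Contingency as a whole-dollar percentage of the total budget."""
    return TOTAL_BUDGET * rate_percent // 100


if __name__ == "__main__":
    print("Committed/allocated:", committed_or_allocated())
    print("Sum of status columns:", EXPENDED + COMMITTED + ALLOCATED + UNALLOCATED)
    print("Sum of activity totals:", sum(ACTIVITY_TOTALS.values()))
    print("Contingency at 5%:", contingency())
```

The committed and allocated columns together give the $4,651,071 quoted above, the four status columns and the activity totals each sum to the $7,695,000 allocation, and the contingency line equals 5% of that allocation.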

Remaining activities for the Programme

The remaining activities for this programme are the definition of the remaining Data Access services to be developed or acquired, the identification of possible additional development on authentication and authorisation, and the delivery of the projects now in train.