Data Sharing (DaSh)

Vision for the Programme

The RDSI project identified two major parts of the DaSh Programme:

  • the DaSh Collaboration Network (DaShNet), which will develop a high performance network between RDSI participants, in conjunction with the NRN project; and
  • the DaSh Technical Architecture, which will develop the data sharing and data movement services that will use DaShNet in providing services for the sector.

The vision for the programme is, therefore, twofold. It encompasses a significantly enhanced network capability to support connections between nodes and between institutions and nodes. This capability is defined in conjunction with the National Research Networks (NRN) project, which funds it as a sub-project.

It also includes the development of a DaSh Technical Architecture which will support the ability to move data between nodes and between nodes and institutions, to enable collaborative and uniform access to that data, to support automated workflow for users of some node services and to identify security barriers which might reduce the effectiveness of these activities.

The large volume of data that Nodes of the RDSI project will hold introduces a level of complexity for which there are few exemplars elsewhere. As a result, the DaSh Programme is the most complex of the four programmes of RDSI.

Design of the Programme

The DaSh programme was designed to develop the DaShNet Network Architecture and the DaSh Technical Framework in consultation with stakeholders, and to propose an approach that RDSI-funded Nodes could use as a basis for developing their infrastructure. DaShNet and the Technical Framework were integrated to become the DaSh Technical Architecture, which describes the network, data mover, security and identity, data access, cloud gateway, test platform and workflow automation capabilities for the RDSI-funded Nodes. Creating support materials for identified services was also considered a priority, and the RDSI project identified that this could be addressed through a contribution to the AeRO Support Project.

Background information

To identify which high performance data sharing, data movement and other services the sector needed, this programme ran substantial consultations with relevant research sector stakeholders, and researched and evaluated existing services and products. The extensive Node consultation undertaken in 2010 made clear that this programme needed to evolve to accommodate the Nodes' and researchers' requirements.

In 2011 a series of "strawmen" consultations outlined options and characteristics with the specific aim of eliciting feedback to accommodate the Nodes' and researchers' requirements. The RDSI project used websites, email lists and newsletters to publish the strawmen and generate wider feedback. Visits to universities and research organisations, together with a small number of national workshops and regional consultations, supplemented these consultations.

The feedback was collated and a 'Tinman' document outlining the themes and activities proposed under this programme was published.
The feedback received from stakeholders was used to further refine and evolve the approach to developing the DaSh architecture via the themes and subprojects described in both the 2012-2013 and 2013-2014 RDSI Annual Business Plans.

Status of the Programme

The DaSh programme assessed the capability to support the sharing and re-use of research data by the RDSI-funded Nodes and identified the major activities to be undertaken to assist Nodes with options for their infrastructure development. The DaSh programme was refined in the 2013-2014 RDSI Annual Business Plan (ABP) to further support the needs of Nodes in line with the second and third RDSI project objectives:

  • Create and develop a data storage infrastructure accessed through a common infrastructure layer provided by agencies within the sector, or commercial providers, or both.
  • Connect the data storage infrastructure created by the RDSI Project to the Australian Research and Education Network (AREN) by a high bandwidth connection, funded and constructed under the Super Science National Research Networks (NRN) Project. This will include dedicated high speed connections between major Nodes.

The DaSh Technical Architecture was developed to meet these objectives and is shown (diagrammatically) below.

Stakeholders' participation in the implementation and refinement of DaSh themes was crucial. The creation of the Technical Advisory Committee (TAC) assisted in refining the DaSh programme. This group met on a regular basis to discuss technical matters related to Node start-up and what was required of the DaSh programme.

The RDSI Project organised regular meetings to discuss their architecture, implementation and timelines to provide input into the project. The refinement of the DaSh architecture was addressed through the creation of themes to implement the network, security and identity, data access, sharing and movement, testing platform, and workflow automation capabilities commensurate with the importance, volume and complexity of the large amounts of data the RDSI Nodes will hold.

The DaSh themes identified some dependencies between individual components, particularly in the areas of DaShNet and Data Mover, which resulted in some adjustments to the originally anticipated timelines. These adjustments are reflected in the commentary on milestones.

The DaSh theme activities carried out during 2012-2014 to identify components of the technical architecture are grouped by theme and are briefly described below. Many of the themes contain sub-projects and these are generally described separately in more detail. The Data Access theme is currently being further refined through consultation with Nodes and the RDSI board.

DaSh Themes

Identity and security theme

The RDSI project has invested in the provision of storage for nationally significant data collections for Australian researchers. This storage is being delivered and managed through Nodes. Essential in providing this service to the Australian and international research community is the ability to control who can access and manipulate data stored within the RDSI-funded storage infrastructure. To this end, individuals will need identities and a mechanism for assigning access rights to RDSI-funded storage allocations. The RDSI project is funding the development of such a system in conjunction with the Australian Access Federation (AAF). The system, called reX, will also provide a simplified web interface for managing group-based access rights to collections.
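
The idea of group-based access rights can be illustrated with a minimal sketch. This is not the reX design or its interface; the class names, group names, rights and identities below are all hypothetical and chosen only to show how identities, groups and per-collection rights might relate.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A data collection held in a storage allocation (hypothetical model)."""
    name: str
    # Maps a group name to the set of rights that group holds on this collection.
    group_rights: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class AccessRegistry:
    """Records which identities belong to which groups (hypothetical model)."""
    memberships: dict[str, set[str]] = field(default_factory=dict)

    def add_member(self, group: str, identity: str) -> None:
        self.memberships.setdefault(group, set()).add(identity)

    def is_allowed(self, identity: str, collection: Collection, right: str) -> bool:
        """Return True if any group containing the identity holds the right."""
        return any(
            identity in self.memberships.get(group, set()) and right in rights
            for group, rights in collection.group_rights.items()
        )


# Illustrative data only: one group with read/write rights on one collection.
registry = AccessRegistry()
registry.add_member("reef-survey-team", "alice@example.edu.au")
reef = Collection("reef-survey", {"reef-survey-team": {"read", "write"}})

can_write = registry.is_allowed("alice@example.edu.au", reef, "write")
can_read_other = registry.is_allowed("bob@example.edu.au", reef, "read")
```

In this sketch rights are attached to groups rather than to individuals, so changing who may use a collection only requires changing group membership.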

Furthermore the RDSI project under this theme will use a framework to identify potential security threats, define security policies and assist Nodes with expert security advice on security architecture and networking as they relate to RDSI-funded data storage. This theme includes both the Identity and Security components of the DaSh Technical Architecture.

To support this theme the following activities were carried out:

  • The Project Moonshot feasibility study was completed, and Moonshot was deemed unsuitable for implementation by Nodes as an identity management system.
  • A number of meetings and consultations took place with key stakeholders including the Australian Access Federation (AAF), Australian National Data Service (ANDS), National eResearch Collaboration Tools and Resources Project (NECTAR), and Nodes to refine the technical architecture.
  • Developed the Data Storage Identity Layer (DSIL) proof of concept to determine whether it could be used as an identity and registration solution for researchers to access collections. Many lessons were learned from this activity, and the Nodes deemed this solution unsuitable because of the large overhead of maintaining the technology after the RDSI project concluded.
  • Concluded the evaluation of commercial options such as Ping and Empower ID in partnership with the AAF. The evaluation provided further insight into the identity management, authentication and authorisation aspects of the DaSh technical architecture. The Nodes deemed the solutions offered unsuitable on their own.
  • After the trials described above it became clear that a hybrid solution was required to provide the ability to control who can access and manipulate data stored within the RDSI-funded storage infrastructure. To this end, individuals would need identities and a mechanism for assigning access rights to RDSI-funded storage allocations.
  • In partnership with the AAF and Nodes, further investigation progressed towards a sound solution for identity management, authentication and authorisation, culminating in the development of a proof of concept for a researcher identity and access management (reX IAM) system, currently being evaluated by Nodes. The reX project plan describes the approach to take this solution into production.
  • Security aspects of the Node infrastructure have been highlighted through security policy guideline documents. The Security Sub-Project is described here.

DaShNet theme

The DaShNet theme was developed to support the connection of Nodes by a very fast network that is part of Australia's Academic and Research Network (AARNet). The project to establish these connections is also called DaShNet and is funded by the National Research Networks (NRN) project. This theme includes the Networks, Data Mover and Public Cloud components of the DaSh Technical Architecture.

To support this theme the following activities were carried out:

  • Concluded the Pathfinder Network File Sending Performance testing using Aspera and Globus Online products looking at solutions to assist with data movement. These technologies were considered promising and further investigation continued in 2013-2014. This activity crosses over with the Data Access theme.
  • Developed the draft DaShNet proposal and submitted it to NRN for approval. NRN approved the proposal and work has commenced with AARNet as the subcontractor.
  • AARNet commenced work and implemented redundant fibre connectivity to Nodes where required.
  • Links to Intersect, iVEC and VicNode/Monash Nodes are live. Other links will be coming up during Q1 2014.
  • 'The Science DMZ is a portion of the network, built at or near the campus local network perimeter that is designed such that the equipment, configuration, and security policies are optimized for high-performance scientific applications. Developed by ESnet engineers, the Science DMZ model addresses common network performance problems encountered at research institutions by creating an environment that is tailored to the needs of high performance science applications, including high-volume bulk data transfer, remote experiment control, and data visualization'. A short trial of the Science DMZ was carried out with encouraging results, and the Science DMZ will be implemented as part of the DaShNet proposal.
  • The Science DMZ hardware for all sites has been purchased, including the Border Edge Switches (BES), Data Transfer Nodes (DTN) and perfSONAR monitoring systems, and is in various stages of deployment and configuration. This is being supported by Chris Myers at V3 Alliance and the team at AARNet. Most sites will have all of their hardware installed by mid-March. The DTNs also run the Aspera data-transfer software packages, which have been purchased for all Nodes.
  • Developed network access to one or more public cloud infrastructure providers to support collaboration around research data.

Data access theme

The research community has a wide range of existing and emerging mechanisms for making their data visible, and conversely an absence of a simple national framework for authorisation. Rather than tackling all of these issues independently and repeatedly, which greatly impacts the user experience of the services, it is considered that commonality across the Nodes is helpful. This helps to deliver quick and easy outcomes for some large fraction of the community, builds a national set of standards, and can reduce costs.

The Data access theme was created to help Nodes to develop an overarching strategy or framework for coordination, as well as funding some implementation projects where commonality is important.

To support this theme the following activities were carried out:

  • The SME Cloud File Server appliance is designed to unify data from different file stores and SaaS services. The Storage Made Easy (SME) testing, conducted from mid-January to mid-March 2013, has concluded and a report has been circulated to Nodes. This solution was found not to meet enough of the requirements.
  • Concluded the RDSI Web Object (WOS) Trial (aka share.edu), which ran from the beginning of February 2013 until 12 April 2013. A total of 55 users registered to trial the service, of whom 13 made particularly heavy use of it. The software was stable during the trial and users provided positive feedback. This trial was successful and a report has been circulated to Nodes.
  • The LiveArc Dimensioning project was carried out to road test a collection management system that would be useful to researchers in managing their ReDS research data at the Nodes. A LiveArc working group was established to participate in this project and it was agreed that Nodes would benefit from using this tool.
  • The RDSI project will fund the LiveArc solution at Nodes and is currently developing the contract variations to accommodate the funding to Nodes.
  • The RDSI project is assisting Nodes to identify middleware options. Further consultation will take place and a report will be distributed to Nodes.
  • The RDSI project evaluated the CKAN option and provided a report.
  • Facilitated a workshop in February 2014 for Nodes to discuss data service proposals that would support access to RDSI-funded storage, and to prioritise funding. Further discussions will occur between Nodes and the RDSI project, which intends to fund the Nodes' proposals.
    • The workshop resulted in four recommendations that were later approved by the RDSI board:
      • Programmatic Data Access Service: Tool to enable remote users to discover and access data collections via network-based data services. This activity will be allocated $399,164 and shall be managed by NCI with appropriate reporting to the RDSI project Office. This will be implemented contractually as a variation to the existing NCI/ANU node contract.
      • Researcher file syncing and sharing system: Software to allow researchers using RDSI storage to sync and share files in a similar fashion to Dropbox. This activity will be allocated $25,000 for the development of a funding proposal. QCIF will manage the project with appropriate reporting to the RDSI project Office. The project will be implemented contractually as a variation to the existing QCIF node contract and the RDSI Project Director will agree milestones and release funds against delivery. A further $350,000 is reserved for this activity; however, the board will decide on the amount released.
      • LiveARC meets Aspera: Integrate Aspera with LiveARC to seamlessly exploit the benefits of both tools. The proposal will be allocated $54,625 and will be delivered by Arcitecta/SGI.
      • LiveARC extended licence: LiveARC is a single platform that enables researchers to manage data. LiveARC is a key component of the DaSh Technical Architecture and will be allocated $170,436. These funds will be used to grant Nodes an extended LiveARC licence which incorporates an additional year of maintenance.
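
The allocations in the four recommendations can be tallied as a quick check. The short sketch below uses only the dollar figures quoted above; the variable names and shortened labels are illustrative.

```python
# Allocations in dollars, copied from the four workshop recommendations.
allocations = {
    "Programmatic Data Access Service": 399_164,
    "Researcher file syncing and sharing (proposal development)": 25_000,
    "LiveARC meets Aspera": 54_625,
    "LiveARC extended licence": 170_436,
}
# Reserved for the syncing and sharing activity; the board decides how much is released.
reserved_for_sync_and_share = 350_000

committed_total = sum(allocations.values())
maximum_total = committed_total + reserved_for_sync_and_share

print(f"Allocated: ${committed_total:,}")
print(f"Maximum if the reserve is fully released: ${maximum_total:,}")
```

The allocated amounts sum to $649,225, rising to at most $999,225 if the full reserve is released.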

Workflow theme (ARMS)

The workflow theme was created to support Nodes with their reporting requirements and the RDSI project's reporting obligations to the Commonwealth.

To support this theme the following activities were carried out:

  • RDSI released a draft specification outlining functional requirements, including an online application form for ReDS merit-allocated storage and support for a suggested assessment and allocation workflow.
  • The RDSI project funded QCIF to develop a solution to streamline the ReDS workflow. The Allocation Request Management System (ARMS) was developed using ReDBox. Key benefits of this approach included re-use of existing software that was actively used by stakeholders and supported by a Node Operator, metadata handling functionality, and support for review workflows. A phased approach is being used for the development of ARMS. The ARMS subproject is described here in further detail.
  • ARMS I went live as a production (beta) system on 24 January. Its status is beta, to reflect the ongoing functional enhancements. Support and maintenance of ARMS I beta and subsequent updates is funded to 31 May 2014.
  • ARMS II will be developed to extend the functionality of ARMS to fully support the common application form and electronic assessment processes required by the ReDS programme.
  • The RDSI Portal has been developed and is accessible at www.rdsi.edu.au.
  • The dashboard project is an integral part of the RDSI portal and currently provides status on allocation and ingest by node as well as general security information. Further data will be added as the outputs of the reporting project are integrated into the dashboard.
  • Nodes are required to report on their performance and various other statistics to the RDSI project on a regular basis. The reporting requirements are detailed in the contracts and amendments between the University of Queensland and each Node. Currently, Nodes are providing information via email on a best-efforts basis as they establish their infrastructure and services. These statistics are manually collated and stored by the RDSI project. A summary of some of the information is currently provided on the RDSI dashboard, with the data being manually updated by RDSI staff. The monitoring and reporting project will support the gathering of information from Nodes and will regularly report on Nodes' infrastructure. An automated reporting mechanism has been proposed and will be developed with support from Nodes to assist them in fulfilling their reporting obligations.
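
As a rough illustration of what automated collation of Node reports might involve, the sketch below summarises allocation and ingest per Node. It is not the proposed reporting mechanism: the record fields, Node names and figures are placeholders invented for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    """One periodic report from a Node (hypothetical fields)."""
    node: str            # placeholder Node name
    allocated_tb: float  # storage allocated to collections, in terabytes
    ingested_tb: float   # data ingested so far, in terabytes


def summarise(reports: list[NodeReport]) -> dict[str, dict[str, float]]:
    """Build a per-Node summary of allocation and ingest for a dashboard."""
    summary: dict[str, dict[str, float]] = {}
    for report in reports:
        # Avoid dividing by zero for Nodes that have not allocated storage yet.
        if report.allocated_tb:
            pct = 100.0 * report.ingested_tb / report.allocated_tb
        else:
            pct = 0.0
        summary[report.node] = {
            "allocated_tb": report.allocated_tb,
            "ingested_tb": report.ingested_tb,
            "ingested_pct": round(pct, 1),
        }
    return summary


# Placeholder values only; these are not real Node statistics.
reports = [NodeReport("node-a", 200.0, 50.0), NodeReport("node-b", 0.0, 0.0)]
dashboard = summarise(reports)
```

A function of this shape could replace the manual collation step once Nodes submit reports in a common structured form.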

Test Platform (former Node Zero theme)

The test platform theme was created to support meaningful evaluation of RDSI-funded infrastructure and applications in a realistic environment spanning RDSI-funded Nodes.

To support this theme the following activities were carried out:

  • RDSI has established a test platform in conjunction with two Nodes, and this environment is being used for testing RDSI-related components, change management and integration testing. The successful implementation of RDSI infrastructure and applications is dependent on the ability to undertake meaningful evaluation and testing. The test platform principally uses cloud-based resources. Because it spans multiple Nodes, the test platform will make it possible to evaluate infrastructure and applications in a realistic environment.

Creation of Support Materials

The creation of resources needed for the future support of RDSI services is covered under this theme.

To support this theme the following activities were carried out:

  • The RDSI project funded Nodes to create the resources needed for the effective future support of RDSI Node services. This activity is being achieved through the AeRO support project.

Milestones from the 2013/2014 Business Plan

The milestones relating to the DaSh programme are available in the business plan together with their current status. Full details on all milestones in this business plan can be found here.