Seamlessly Realizing the Goals of the 2020 Federal Data Strategy
The 2020 U.S. Federal Data Strategy outlines a broad agenda for federal agencies to leverage data as a strategic asset. In support of this effort, the strategy sets out 10 principles, 40 practices, and 20 specific, measurable actions. This guidance ranges from ensuring the relevance of data and using data to guide decision-making to publishing and updating data inventories.
In general, all of this guidance is focused on making data more actionable and sharable while ensuring that only authorized users can access it. Some actions, like launching a chief data officer council, will probably seem straightforward to most agencies, whereas others, like improving data and model resources for artificial intelligence (AI) research and development, might seem more ambitious.
Fundamentally, the Federal Data Strategy is calling on agencies to modernize their data infrastructures, but many agencies will find that challenging. Many will have to contend with the sheer growth of data volumes and the dispersal of that data across a staggering range of locations. In the private sector, this has underscored the need for advanced data management and data integration techniques, which have proven to be a differentiator for leading-edge companies. In the public sector, many agencies rely on more traditional techniques for data integration, such as extract, transform, and load (ETL) processes.
ETL processes are highly effective for moving large amounts of data from one repository to another, but because they run in scheduled batches, they cannot deliver data in real time. They are also extremely resource-intensive, in that their scripts must be revised, re-tested, and redeployed every time there is a change to a source system. In addition, they are unable to accommodate many modern data sources, such as streaming sources and Internet of Things (IoT) sources, which are critical for AI development, nor can they support unstructured sources such as social-media feeds.
Perhaps the most challenging aspect of ETL processes and similar techniques, for federal agencies, is that duplicating data means creating and managing new data repositories. This complicates compliance with the General Data Protection Regulation (GDPR), which places heavy restrictions on how personal data may be stored.
The Power of Logical Data Fabric
In recent years, analysts from Gartner and Forrester, and thought leaders from The Data Warehousing Institute and other organizations, have recommended a new strategy for data integration, one that does not rely on the physical replication of data. Rather than physically collecting data into a new repository, they advocate connecting to the data logically and remotely, leaving the source data in its existing location. They describe this approach as a “logical data fabric,” which flexibly spreads across an organization’s different data sources, providing seamless access to consumers of the data.
One of the most effective ways to implement logical data fabric is through a modern data integration and data management technology called data virtualization. Data virtualization is a natural fit for logical data fabric, because rather than physically replicating data into a new repository, data virtualization provides real-time, virtualized views of the data, leaving the source data exactly where it is. This means that agencies do not have to pay the costs of moving and housing the data, nor do they have to unnecessarily complicate their GDPR compliance efforts, and yet they still gain all of the benefits of data integration.
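The contrast between a replicated copy and a virtualized view can be illustrated with a minimal sketch. It is not a real data virtualization product: the two in-memory SQLite databases, the table names, and the `paid_per_recipient` function are all hypothetical stand-ins chosen for illustration. The sketch assumes only that a "virtual view" means a query resolved against the sources at request time, with nothing stored in between.

```python
import sqlite3

# Two independent "source systems" (hypothetical stand-ins), each its own
# in-memory SQLite database. The data stays where it is.
grants_db = sqlite3.connect(":memory:")
grants_db.execute("CREATE TABLE grants (grant_id INTEGER, recipient TEXT)")
grants_db.execute("INSERT INTO grants VALUES (1, 'State University')")

payments_db = sqlite3.connect(":memory:")
payments_db.execute("CREATE TABLE payments (grant_id INTEGER, amount REAL)")
payments_db.execute("INSERT INTO payments VALUES (1, 5000.0)")


def paid_per_recipient():
    """Virtual view: combine both sources at request time, storing nothing."""
    totals = {}
    for grant_id, amount in payments_db.execute(
        "SELECT grant_id, amount FROM payments"
    ):
        totals[grant_id] = totals.get(grant_id, 0.0) + amount
    return {
        recipient: totals.get(grant_id, 0.0)
        for grant_id, recipient in grants_db.execute(
            "SELECT grant_id, recipient FROM grants"
        )
    }


# ETL-style batch: materialize the combined result once into a separate copy.
warehouse_copy = paid_per_recipient()

# A source system changes after the batch has run.
payments_db.execute("INSERT INTO payments VALUES (1, 2500.0)")

stale_total = warehouse_copy["State University"]       # value from the copy
live_total = paid_per_recipient()["State University"]  # value from the view

print("replicated copy:", stale_total)
print("virtual view:   ", live_total)
```

In this toy setup, the replicated copy keeps reporting the total as of the last batch, while the view reflects the new payment immediately, and no second repository of the data ever exists. A production data virtualization layer would add query optimization, security, and caching on top of this basic idea.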
Because data virtualization accommodates existing infrastructure in its existing state, it is relatively easy to implement compared with other solutions. And because it provides data in real time, from a variety of systems that are normally very time-consuming to integrate, such as transactional processing systems and cloud-based storage systems, it can support a wide variety of use cases, including many of the recommendations in the Federal Data Strategy.
To call out just a few, data virtualization can facilitate:
- Real-time, secure data-sharing
- The delivery of diverse streaming data sources to AI models
- Data governance across diverse sources
- Compliance with GDPR and other regulations
Putting the Strategy into Action
In time, all federal agencies will find ways to realize the goals of the Federal Data Strategy. Using logical data fabric, however, they will find the process surprisingly fast, simple, and affordable.