45. Information Logistics has two basic modes of operation. Traditionally, NSOs have gathered all the data they needed for statistical production, either capturing it via questionnaires and other collection instruments or acquiring it from other organizations (admin data), and have then stored it physically on the NSO’s premises. In other words, data used to be moved to where the processing capabilities were. More recently, with the advent of Big Data, cloud platforms and IoT, NSOs have started to consider more distributed data and processing approaches in which data stays in place (where it was produced, or in the owner’s environment) and the processing is moved to where the data is. The reasons for this change are not only technical, e.g. minimizing data movement, but also contractual, political and regulatory, e.g. privacy concerns, legal implications, and national and transnational regulations. Gartner has named these two modes of operation: the centralized data and processing mode is called “collect”, while the decentralized data and processing mode is called “connect”. [1]
46. CSDA capabilities are flexible enough to support both information logistics modes of operation. The main capabilities that need to be aware of where the data resides, and of whether the scenario is connect or collect, are Publication within Information Sharing and Exchange and Persistence within Information Logistics. In particular, Channel Management will have to create channels in the mode specified by the SLAs defined in Relationship Management; these channels will then be configured and operated by Channel Configuration & Operation.
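As an illustration only, the dependency between the SLA and the channel mode could be sketched as follows. This is a minimal sketch, not part of CSDA: the names Mode, SLA, Channel and create_channel, and the two location labels, are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """The two information logistics modes described in the text."""
    COLLECT = "collect"   # data is moved to the NSO
    CONNECT = "connect"   # data stays with its owner


@dataclass(frozen=True)
class SLA:
    """Hypothetical agreement record, as Relationship Management might hold it."""
    provider: str
    mode: Mode


@dataclass(frozen=True)
class Channel:
    """Hypothetical channel descriptor produced by Channel Management."""
    provider: str
    mode: Mode
    data_location: str


def create_channel(sla: SLA) -> Channel:
    # The channel inherits its mode from the SLA; the mode in turn
    # determines where the data is expected to reside (assumed labels).
    if sla.mode is Mode.COLLECT:
        location = "nso_premises"
    else:
        location = "provider_environment"
    return Channel(provider=sla.provider, mode=sla.mode, data_location=location)
```

For example, create_channel(SLA("tax_office", Mode.CONNECT)) yields a channel whose data_location is "provider_environment", which is the information that capabilities such as Publication and Persistence would need in order to behave correctly.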
47. In terms of implementation, the simplification of data logistics in a connect mode scenario will likely increase the complexity of data processing, since the processing will have to be shipped to where the data resides and its results integrated back for downstream consumption. The trade-offs of each approach, and the degree of decentralization required, need to be evaluated for each use case. For example, Data Transformation and Data Integration might be optimally implemented in a centralized way to serve traditional analytics and statistical production based on data collected by surveys via questionnaires (collect mode). In other situations, these capabilities can be implemented in a decentralized manner (connect mode), for instance when the volume of admin data, or the number of its sources (e.g. IoT), could create information logistics issues the NSO may want to avoid.
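The trade-off described above can be made concrete with a small sketch that computes the same mean in both modes. It is illustrative only and assumes a simple additive statistic; the Source class, its export and run methods, and the sample figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Source:
    """Hypothetical data holder, e.g. an admin-data owner or an IoT gateway."""
    name: str
    records: List[float]

    def export(self) -> List[float]:
        # Collect mode: the raw records leave the owner's environment.
        return list(self.records)

    def run(self, job: Callable[[List[float]], Dict[str, float]]) -> Dict[str, float]:
        # Connect mode: the job runs where the data lives; only its result leaves.
        return job(self.records)


def partial_sum(records: List[float]) -> Dict[str, float]:
    # The piece of processing that is shipped to each source.
    return {"total": sum(records), "count": float(len(records))}


def mean_collect(sources: List[Source]) -> float:
    # Simple processing, heavy logistics: pool every record centrally first.
    pooled = [x for s in sources for x in s.export()]
    return sum(pooled) / len(pooled)


def mean_connect(sources: List[Source]) -> float:
    # Light logistics, extra processing step: partial results must be
    # integrated back into a single figure for downstream consumption.
    partials = [s.run(partial_sum) for s in sources]
    total = sum(p["total"] for p in partials)
    count = sum(p["count"] for p in partials)
    return total / count


sources = [Source("registry_a", [10.0, 20.0, 30.0]), Source("sensor_b", [40.0])]
```

Both mean_collect(sources) and mean_connect(sources) return 25.0 here, but in connect mode only two small summaries cross the organizational boundary instead of four raw records. Statistics that do not decompose into partial results (a median, for instance) would need a more elaborate integration step, which is where the added processing complexity arises.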
48. Decentralization also affects metadata management, because metadata describes where the data resides, how the channels operate, how the data relates to the rest of the ecosystem, who accesses it and how business processes use it. In addition, the implementation of Metadata & Schema Linkage becomes more complex when data and its metadata live in entirely different environments.
49. Modern business requirements demand a more proactive and coordinated data integration and data provisioning strategy founded on a portfolio-based data integration approach encompassing:
[1] Gartner, “Modern Data Management Requires a Balance Between Collecting Data and Connecting to Data”, https://www.gartner.com/doc/3818366/modern-data-management-requires-balance