1. Statistical organisations have to deal with many different external data sources, ranging from (traditionally) primary data collection, via secondary data collection, to (more recently) Big Data. Each of these data sources has its own set of characteristics in terms of relationship, technical details and semantic content. At the same time, demand is changing: besides creating output as "end products", statistical organisations also create output together with other institutes.
2. Statistical organisations need to find, acquire and integrate data from both traditional and new types of data sources at an ever-increasing pace and under ever stricter budget constraints, while taking care of security and data ownership. They would all benefit from having a reference architecture and guidance for the modernisation of their processes and systems.
3. Let us start by defining data architecture:
- “A data architecture is [an architecture that is] composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organizations.” (Wikipedia[1])
- “A description of the structure and interaction of the enterprise's major types and sources of data, logical data assets, physical data assets, and data management resources.” (TOGAF 9, Part I[2])
 
4. Although the Common Statistical Data Architecture (CSDA) is (loosely) based on TOGAF, it should be stressed that “data” means something different to statistical organisations from what most industries understand by it. To statistical organisations, “data” is the raw material, the parts and components and the finished products, rather than the information needed to support and execute the organisation’s primary processes (although statistical organisations, of course, also have data that plays that role). Although the definition still applies, “data architecture” as meant in this document also has a (slightly) different scope.
A. CSDA, a special kind of Data Architecture
5. CSDA is not a normal Data Architecture, at least not according to the definition of TOGAF. According to TOGAF, a data architecture is an integral part of the Information Systems architecture and “describes the structure of an organization’s logical and physical data assets and data management resources”. CSDA is focused on capabilities related to data and metadata, which can be seen as “data management resources”, rather than on the structure and organization of data assets. Capabilities are strategic elements and are the starting points for the incremental development of the business architecture, the information systems architecture (including the data architecture) and the technology architecture.
6. In fact, CSDA is a “data centric” view of an NSI’s architecture, putting emphasis on the value of data and metadata and on the need to treat data as an asset. Both the CSDA architecture and the companion Maturity Model focus on the way a statistical organisation could or should treat its data and metadata.
7. Because CSDA is “just” a specific view of a general architecture, it has (or should have) all the components of a general architecture. According to TOGAF, this should include the strategic level (the capabilities and roadmap planning) as well as the business, information systems (including data) and technology architecture. The current version only defines the strategic elements, the capabilities. An effort has been made to show how (elements of) GSBPM can be used to create the business architecture, whereas CSPA (and specifically certain, still to be developed, CSPA services) should be the basis for the Information Systems architecture. Because the situations of NSIs are currently still very diverse, it is difficult to say much of relevance about the general technology architecture, but certain guidelines can be formulated, specifically with respect to security measures.
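The layering described in paragraph 7 can be summarised as a small lookup table. The Python sketch below is illustrative only: the names `ArchitectureLayer`, `CSDA_LAYERS` and `layers_with_status`, as well as the status labels, are hypothetical and are not part of CSDA or TOGAF. They merely paraphrase which layers the current version covers and which standard the text names as the basis for each.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ArchitectureLayer:
    """One architecture layer as paragraph 7 describes it (hypothetical structure)."""
    name: str
    basis: Optional[str]  # standard the text names as the basis, if any
    status: str           # informal label chosen for this sketch only


# Contents paraphrase paragraph 7; the status labels are assumptions, not CSDA terms.
CSDA_LAYERS = [
    ArchitectureLayer("strategic (capabilities)", None, "defined"),
    ArchitectureLayer("business", "GSBPM", "approach shown"),
    ArchitectureLayer("information systems (incl. data)", "CSPA", "planned"),
    ArchitectureLayer("technology", None, "guidelines only"),
]


def layers_with_status(status: str) -> List[str]:
    """Return the names of all layers that carry the given status label."""
    return [layer.name for layer in CSDA_LAYERS if layer.status == status]


if __name__ == "__main__":
    for layer in CSDA_LAYERS:
        basis = layer.basis or "no external standard named"
        print(f"{layer.name}: {layer.status} ({basis})")
```

Calling `layers_with_status("defined")` returns only the strategic layer, which mirrors the statement that the current version defines the capabilities and nothing else.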
8. CSDA must be seen in the context of a whole suite of standards, developed and maintained by the international statistical community, led by HLG-MOS. Among these are GSBPM, GAMSO, GSIM, and CSPA. Where applicable, the CSDA also links to other international standards such as TOGAF, DDI, SDMX, DAMA DMBOK, etc.
9. Specifically, for this version of CSDA, reference is made to the following versions of these other standards:
- GAMSO version 1.1
- GSBPM version 5.0
- GSIM version 1.1
- CSPA version 1.5
- TOGAF version 9
 
10. Another useful reference is the European Interoperability Reference Architecture (EIRA[3]). EIRA focuses on building blocks and distinguishes architecture and solution building blocks. In EIRA terms, an architecture building block "represents a (potentially re-usable) component of legal, organisational, semantic or technical capability that can be combined with other architecture building blocks".
[1] https://en.wikipedia.org/wiki/Data_architecture
[2] http://pubs.opengroup.org/architecture/togaf9-doc/arch/chap03.html
[3] https://joinup.ec.europa.eu/solution/european-interoperability-reference-architecture-eira