CSDA 2.0 - XII. Examples

172. In this chapter, we present a number of examples of how the CSDA capabilities can be used to design or analyse solutions such as business processes. The first three examples show which CSDA capabilities are most likely to play a role in the GSBPM phases Collect, Process and Disseminate. The other two examples show which CSDA capabilities are involved in two of the CSDA use cases that are described elsewhere.

A. Data Collection

173. Let’s take a look at the GSBPM phase “Data Collection”. This phase brings data in from the external world. In CSDA's philosophy, that data then enters “the pool” and thereby becomes available, subject to authorization, to users both internal and external to the organization. In line with GSIM, all of the work done in this phase takes place in the Exchange Channels, that is, in Information Logistics and Information Sharing (Publication). The data-related activities of the GSBPM phases Design and Build are collapsed into “Design & Description”. Publication is the capability that takes care of designing and setting up the collection process, as well as preparing “the pool” and any output channels for future access to the information collected. Preparing “the pool” may or may not include setting up persistence mechanisms, but in any case it includes setting up the mechanisms to protect the data in accordance with the policies laid down by the Cross-Cutting capabilities (Governance, Security).

Figure 14: Example GSBPM Collect

174. Although we may want to collect information from sources that contain information in all kinds of forms, the pool of data and metadata contains only digital information. We often even want to collect "intangible" information, such as the facts, ideas and opinions in the heads of people. It is the task of the Exchange Channels (as explained in chapter VIII) to collect and digitize such "intangible" information.

175. When collecting information of this nature, internal persistence is required in order to decouple the internal processing from the collection.

176. This way, non-digital sources can be treated the same as digital ones. All sources are connected to the “pool” through channels responsible for digitizing any non-digital data.
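The idea that every source reaches the pool through a channel that digitizes where necessary can be sketched as follows. This is a minimal illustration only: the `Pool` and `Channel` classes, the channel names and the digitizing functions are hypothetical and are not part of CSDA.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Pool:
    """Hypothetical stand-in for "the pool": it holds digital records only."""
    records: List[Dict[str, str]] = field(default_factory=list)

    def add(self, record: Dict[str, str]) -> None:
        self.records.append(record)


@dataclass
class Channel:
    """Hypothetical exchange channel: turns a raw source item into a digital record."""
    name: str
    digitize: Callable[[object], Dict[str, str]]

    def collect(self, raw_item: object, pool: Pool) -> None:
        record = self.digitize(raw_item)
        record["channel"] = self.name  # remember which channel delivered the record
        pool.add(record)


# A digital source only needs a pass-through copy; a spoken answer must be
# transcribed and normalized before it can enter the pool.
web_form = Channel("web_form", lambda item: dict(item))
interview = Channel("interview", lambda spoken: {"answer": str(spoken).strip().lower()})

pool = Pool()
web_form.collect({"answer": "yes"}, pool)
interview.collect("  Yes ", pool)
```

Once the records are in the pool, downstream processing no longer needs to know whether the original source was digital or not.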

B. Processing

177. The actual data processing of GSBPM’s “Process” phase takes data from the pool (through Information Sharing and Information Logistics), and the output is placed back into the pool, again through Sharing and Logistics. Both the final result and any intermediate results are designed (using Design & Description), and the actual processing happens in Transformation and/or Integration. Any data that is “transient” and not considered of enduring value is not shared but kept locally in the process.

Figure 15: Example GSBPM Process

178. The process uses input data from the “pool”, and may produce data that is considered suitable to be released into the “pool”. This is a formal act of “publishing”, even if the data is NOT a statistical end-product.

179. Accessing data from the “pool” involves both Sharing Support and the lower-level capabilities from Information Logistics (channels).

180. The process may have internal persistence. Data stored there is NOT considered part of the “pool”.
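The distinction between the pool and a process's internal persistence can be sketched as below. This is a simplified illustration under stated assumptions: the dictionary used as the pool, the dataset names `raw_turnover` and `turnover_total`, and the function `run_process` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical pool content: one input dataset that was published earlier.
pool: Dict[str, List[float]] = {"raw_turnover": [10.0, 12.5, -1.0, 7.5]}


def run_process(shared_pool: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Read input from the pool, keep transient data local, publish the result."""
    local_store: Dict[str, List[float]] = {}  # internal persistence, NOT part of the pool

    raw = shared_pool["raw_turnover"]
    # Transient intermediate result: kept locally, never shared.
    local_store["valid"] = [value for value in raw if value >= 0]

    # The result is considered of enduring value, so it is published to the pool.
    shared_pool["turnover_total"] = [sum(local_store["valid"])]
    return local_store


local = run_process(pool)
```

Only `turnover_total` becomes visible to other users of the pool; the filtered list stays inside the process.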

C. Dissemination

181. GSBPM’s Dissemination is the opposite of Data Collection: the final product is made available to “the world”, “the general public” or any subset thereof, again by publishing to “the pool”. External consumers use Sharing and Logistics to search, find and extract the data.

Figure 16: Example GSBPM Disseminate

182. Publishing a statistical end-product is (conceptually) the same as publishing any other Information Set. The information to be published may come from the “pool” or from some internal process.

183. Publishing in the strictest sense only involves the Information Publication capability. In a broader sense, it may involve other capabilities such as Disclosure Control.

184. Information Publication includes: defining the composition of the Information Set, the channels available for access, the date & time of availability, the audience, etc.
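The elements listed above could be captured in a publication definition along the following lines. This is a sketch only: the `Publication` class, its field names, the example values and the `is_accessible` check are assumptions made for illustration, not a CSDA specification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Publication:
    """Hypothetical description of one published Information Set."""
    information_set: str
    components: Tuple[str, ...]  # composition of the Information Set
    channels: FrozenSet[str]     # channels available for access
    available_from: datetime     # date and time of availability
    audience: FrozenSet[str]     # groups allowed to access it


def is_accessible(pub: Publication, channel: str, group: str, at: datetime) -> bool:
    """Return True only if channel, audience and release time all match."""
    return (
        channel in pub.channels
        and group in pub.audience
        and at >= pub.available_from
    )


release = Publication(
    information_set="quarterly_turnover",
    components=("data", "metadata"),
    channels=frozenset({"api", "website"}),
    available_from=datetime(2024, 1, 15, 9, 0),
    audience=frozenset({"general_public"}),
)
```

For example, `is_accessible(release, "api", "general_public", datetime(2024, 1, 15, 8, 59))` is false because the release time has not yet been reached.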

185. This, in a nutshell, is how the Core Capabilities can be used in designing the GSBPM (sub-)processes. What we did not show is the role of the Cross-cutting capabilities. Their role is to define and enforce the policies that govern the way that data is handled, protected, assured, etc. It is the responsibility of the Core Capabilities to act in accordance with those policies.

D. Use case: Data Collection (StatCan)

186. The process depicted in the figure below (the dark boxes) is the data collection and initial treatment of a complex set of datasets, which provides 9-character CUSIP numbers, standardized descriptions and additional data attributes for over 6 million corporate, municipal and government securities offered in the United States and Canada. Although the top level structure of this data is rather simple, the details are very complex and are changing over time. That’s why, in this process, a lot of data modelling takes place. Data is published in “almost” raw format as well as in a “more refined” form.

187. As you can see from the mapping to the CSDA capabilities (the lighter boxes), the relationship is most often many-to-many (n-m): a process step implements multiple capabilities, and the same capability is used in multiple steps.
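A many-to-many mapping of this kind can be recorded per process step and then inverted to see where each capability is used. The step names and the assignments below are hypothetical and are not taken from the StatCan figure; only the capability names come from this chapter.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical process steps, each implementing one or more capabilities.
step_to_capabilities: Dict[str, Set[str]] = {
    "receive_files": {"Information Logistics"},
    "model_data": {"Design & Description", "Transformation"},
    "publish_raw": {"Information Sharing", "Information Logistics"},
    "publish_refined": {"Information Sharing", "Transformation"},
}

# Invert the mapping: for each capability, collect the steps that use it.
capability_to_steps: Dict[str, Set[str]] = defaultdict(set)
for step, capabilities in step_to_capabilities.items():
    for capability in capabilities:
        capability_to_steps[capability].add(step)
```

Reading the mapping in both directions makes it easy to spot capabilities that several steps depend on and that are therefore candidates for shared implementation.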

Figure 17: Example StatCan UseCase

E. Use case: Privacy Preserved Data Sharing (Stats Netherlands)

188. The second use case is from Statistics Netherlands and is called “privacy preserved” sharing of data. Data from two sources (one of which is CBS) are brought together and integrated without disclosing the identity of the persons described by the data to the other party. A lot of complex encryption, digital signing and secure communication takes place, but conceptually, we can map the CSDA capabilities involved.
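To give a feel for the general idea of linking records without exchanging identities, the sketch below replaces identifiers with keyed hashes before the join. This is a deliberately simplified stand-in and not the protocol used in the Statistics Netherlands use case, which involves considerably more machinery. The key, the identifiers and the record contents are made up for illustration.

```python
import hashlib
import hmac
from typing import Dict


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256) of it."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: both parties hold the same secret key, agreed on out of band.
shared_key = b"demo-key-not-for-production"

# Hypothetical records held by the two parties, keyed by a person identifier.
source_a: Dict[str, Dict[str, object]] = {"P001": {"income": 41000}, "P002": {"income": 38500}}
source_b: Dict[str, Dict[str, object]] = {"P002": {"region": "north"}, "P003": {"region": "south"}}

# Each party pseudonymizes its own identifiers before anything is exchanged.
masked_a = {pseudonymize(pid, shared_key): rec for pid, rec in source_a.items()}
masked_b = {pseudonymize(pid, shared_key): rec for pid, rec in source_b.items()}

# Integration happens on pseudonyms only; the original identifiers never appear.
linked = [{**masked_a[p], **masked_b[p]} for p in masked_a.keys() & masked_b.keys()]
```

Because identical identifiers yield identical pseudonyms under the same key, matching persons can be joined, while a party without the key cannot recompute pseudonyms from identifiers it already knows.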

Figure 18: Example Stats Netherlands UseCase
