Driving tests produce an enormous amount of measured data for different development disciplines. Some companies have therefore begun to collect these data in a so-called “Data Lake”.
At the heart of such a “Data Lake” is usually the open source platform Hadoop. It provides a variety of frameworks for processing and analyzing the incoming data volumes flexibly and in many different ways.
The difference from classical systems is that the data is processed in parallel, that is, distributed over many nodes of a computer cluster. Instead of transporting the data to servers that then execute the program code, as is usual, the program code is distributed to the servers in the cluster that hold the associated data and is executed there. The (partial) results are then merged. This minimizes time-intensive data transfers over the network, which both speeds up data processing (performance) and allows the system to scale well.
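The “move the code to the data” principle can be illustrated with a minimal Python sketch. It is only a simulation under stated assumptions: the node names, the sample values and the functions `partial_stats`, `merge` and `run_job` are hypothetical, and a thread pool stands in for a real cluster. On Hadoop, a framework such as Spark would schedule the work on the nodes that hold the data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each "node" holds its own partition of speed samples (km/h).
PARTITIONS = {
    "node-1": [48.0, 52.5, 61.0],
    "node-2": [75.5, 80.0],
    "node-3": [30.0, 33.5, 41.0, 45.5],
}


def partial_stats(samples):
    """Runs "on the node": reduce the local partition to a small partial result."""
    return (len(samples), sum(samples), max(samples))


def merge(partials):
    """Merge the partial results of all nodes into the final result."""
    count = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    peak = max(p[2] for p in partials)
    return {"count": count, "mean": total / count, "max": peak}


def run_job(partitions):
    # The thread pool only simulates parallel execution on several nodes.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_stats, partitions.values()))
    return merge(partials)


result = run_job(PARTITIONS)
print(result)
```

Only the three-value partial results leave each simulated node; the raw samples stay where they are stored, which is the reason network transfers remain small.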
Two prerequisites must be fulfilled to give a broad group of people access to the “data in the lake” and to generate knowledge that goes far beyond traditional, domain-specific analyses: standardized access to the data, and flexibly extensible solutions for describing its content. Together they ensure that measurement data in the “Data Lake” can be found again over a long period of time and can be clearly interpreted and compared for different purposes.
ASAM ODS provides long-proven approaches for these requirements: the ODS Data Model (divided into Base Model and Application Model) and the ODS API. For more information, see the ASAM Wiki.
To take advantage of the special capabilities of Big Data, it makes sense to subdivide the persistence of the ODS Data Model as follows:
- Contextual meta data
Describe the context in which measurements were generated. These include the data categories “Environment” (e.g. AoEnvironment), “Administration” (e.g. AoTest), “Descriptive Data” (e.g. AoUnitUnderTest), “Security” (e.g. AoUser) and “Other” (e.g. AoAny).
- Technical meta data
Describe the content and structure of individual measurement files. These include the data categories “Dimensions & Units” (e.g. AoUnit) and “Measurements” (e.g. AoMeasurement).
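The subdivision can be written down as a simple lookup, sketched here in Python. The mapping contains only the example base elements named above and is not a complete list of the ODS Base Model; the names `CATEGORY_KIND`, `BASE_ELEMENT_CATEGORY` and `meta_data_kind` are hypothetical and not part of any ODS API.

```python
CONTEXTUAL = "contextual meta data"
TECHNICAL = "technical meta data"

# Data category -> kind of meta data, as described in the list above.
CATEGORY_KIND = {
    "Environment": CONTEXTUAL,
    "Administration": CONTEXTUAL,
    "Descriptive Data": CONTEXTUAL,
    "Security": CONTEXTUAL,
    "Other": CONTEXTUAL,
    "Dimensions & Units": TECHNICAL,
    "Measurements": TECHNICAL,
}

# Example base elements from the text -> data category (deliberately incomplete).
BASE_ELEMENT_CATEGORY = {
    "AoEnvironment": "Environment",
    "AoTest": "Administration",
    "AoUnitUnderTest": "Descriptive Data",
    "AoUser": "Security",
    "AoAny": "Other",
    "AoUnit": "Dimensions & Units",
    "AoMeasurement": "Measurements",
}


def meta_data_kind(base_element):
    """Return whether a base element counts as contextual or technical meta data."""
    category = BASE_ELEMENT_CATEGORY[base_element]
    return CATEGORY_KIND[category]


print(meta_data_kind("AoTest"))
print(meta_data_kind("AoMeasurement"))
```

A lookup of this kind could be used to decide, per base element, which persistence layer is responsible for it.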
This subdivision makes it possible to follow different approaches (modes) for storing meta and mass data within a Big Data project. The following figure gives an overview:
The so-called “Mixed Mode” is widely used in classical systems (see e.g. How FEV handles large amounts of test data). With the “Extended Mixed Mode”, ODS clients have standardized access to mass data in a Hadoop Distributed File System (HDFS). In this case the ODS server manages the contextual and technical meta data, which are stored in a relational database. The individual measurement files are accessed via scalable Big Data services, e.g. based on Apache Spark or Apache Drill. Among other things, these services act as a kind of “external file format server” for file formats such as Parquet. However, for a very large number of measurements, the ODS server can quickly become a bottleneck.
The “Advanced Mixed Mode” also enables standardized access to mass data in an HDFS via an ODS server. In contrast to the previous approach, however, the Big Data services handle not only the access to the mass data but also the management of the technical meta data. This improves performance and scalability for the access to mass data.
The “Real Big Data Mode” goes one step further. On the one hand, it allows standardized, high-performance access to meta and mass data in an HDFS by scalable Big Data services via an ODS server. On the other hand, it allows test data to be interpreted and processed in a Big Data environment completely independently of an ODS server. From the point of view of Peak Solution, this is the basis for truly new, Big Data driven analytical methods and solutions!
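The three Big Data modes differ mainly in which component manages which kind of data. The following Python sketch summarizes this as a small data structure. It is one reading of the descriptions above and not part of the ASAM specification: the class `StorageMode`, its field names and the helper `managed_by_big_data` are hypothetical, and the assumption that the contextual meta data stays with the ODS server in the “Advanced Mixed Mode” is an inference from the text.

```python
from dataclasses import dataclass

ODS_SERVER = "ODS server (relational database)"
BIG_DATA = "Big Data services (HDFS)"


@dataclass(frozen=True)
class StorageMode:
    name: str
    contextual_meta: str   # who manages the contextual meta data
    technical_meta: str    # who manages the technical meta data
    mass_data: str         # who provides access to the mass data
    usable_without_ods_server: bool


MODES = [
    StorageMode("Extended Mixed Mode", ODS_SERVER, ODS_SERVER, BIG_DATA, False),
    # Assumption: the contextual meta data remains with the ODS server here.
    StorageMode("Advanced Mixed Mode", ODS_SERVER, BIG_DATA, BIG_DATA, False),
    StorageMode("Real Big Data Mode", BIG_DATA, BIG_DATA, BIG_DATA, True),
]


def managed_by_big_data(mode):
    """Return the kinds of data that a mode hands over to the Big Data services."""
    owners = {
        "contextual meta data": mode.contextual_meta,
        "technical meta data": mode.technical_meta,
        "mass data": mode.mass_data,
    }
    return [kind for kind, owner in owners.items() if owner == BIG_DATA]


for mode in MODES:
    print(mode.name, "->", ", ".join(managed_by_big_data(mode)))
```

Read from top to bottom, the list shows how responsibility moves step by step from the ODS server to the Big Data services.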
An overview of how ASAM ODS can be gradually extended to Big Data is shown in the following figure:
According to the figure, this results in the following to-dos for ASAM e.V.:
- Specification of a Big Data friendly storage format for mass data
- Specification of a Big Data friendly storage format for technical meta data. Suggestion: store technical meta data and mass data together in one file
- Specification of a Big Data friendly storage format for contextual meta data
- Specification of Big Data services (methods and business objects) for accessing (reading and writing) mass and meta data
- Specification of an API to connect the Big Data services via an ODS server
It remains to be seen whether and how quickly the current “ASAM BigODS Working Group” is willing and able to implement the listed to-dos. To make rapid progress, Peak Solution suggests dividing the work evenly between different service providers. This also avoids dependencies on individual suppliers and ensures that a broad range of requirements is considered.
Peak Solution will report on the progress of the ASAM BigODS project in this blog in due course.