Car manufacturers and suppliers face the challenge of handling a growing amount of data from various sources – such as test stands, test automation systems and field test equipment – in an efficient manner. At the same time, they want to link development and test data with information on vehicle use. The goal is clear: given the increasing complexity of vehicles, engineers must gain a holistic, cross-domain understanding of the interaction and behavior of individual car components and performance parameters in the context of different environmental situations. In doing so, they want to make use of existing experience and knowledge within the company or even the industry. The companies therefore want to collect, refine and systematically provide the entire body of development knowledge to different user groups.
The first step in this direction is importing the massive amount of measurement data from day-to-day testing (e.g. time series and car bus data) into a powerful and scalable platform – a data lake. Companies do this either by batch processing or by streaming. The latter is a prerequisite for timely predictions about test sequences and, if necessary, for initiating process changes (see Monitoring and Reporting below).
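The difference between the two import modes can be sketched in a few lines of Python. This is an illustrative sketch, not a real ingestion framework: the record format, the function names (`batch_import`, `stream_import`, `normalize`) and the normalization step are all assumptions chosen for the example.

```python
from typing import Iterable, Iterator

def normalize(record: dict) -> dict:
    # Hypothetical normalization step: lower-case channel names.
    return {k.lower(): v for k, v in record.items()}

def batch_import(records: list[dict]) -> list[dict]:
    """Batch mode: the complete set of records is imported in one pass."""
    return [normalize(r) for r in records]

def stream_import(records: Iterable[dict]) -> Iterator[dict]:
    """Streaming mode: each record is normalized and yielded as it
    arrives, so downstream monitoring can react immediately."""
    for r in records:
        yield normalize(r)
```

The batch variant only produces results once the whole input is processed, while the streaming variant hands each record onward immediately – which is what makes timely monitoring possible.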
The imported measurement data must be stored in file formats suitable for high-performance processing and analysis. The intended use cases primarily drive the choice of format. If, for example, query performance is most important, Parquet is a good choice – but these files take longer to write. Avro is a good fit if the schema is going to change over time – which is common for context data.
It is important to store the measurement data in conjunction with contextual information. Context data is information about the unit under test, the test environment, the test setup, etc. It is vital for interpreting the measurement data: without context, measurement data are just numbers. Usually, the test context is set up during the test-commissioning phase (see: How to document tests in a standardized way).
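One minimal way to keep measurement and context data together is to model the link explicitly. The class and field names below are assumptions for illustration; a real system would follow a data model such as ASAM ODS rather than these ad-hoc structures.

```python
from dataclasses import dataclass

@dataclass
class TestContext:
    """Descriptive metadata captured during test commissioning."""
    unit_under_test: str
    test_environment: str
    test_setup: str

@dataclass
class Measurement:
    """Raw time-series values stored together with their context,
    so the numbers remain interpretable later on."""
    channel: str
    values: list[float]
    context: TestContext
```

With this structure, every measurement carries a reference to the context it was recorded in, e.g. `Measurement("engine_speed", [810.0, 812.5], TestContext("ECU-A", "test stand 3", "cold start"))` (all values hypothetical).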
During data import – in a preprocessing step – measurement and context data may be checked for completeness and consistency. This ensures that the import does not produce corrupt data or invalid relationships in the system.
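Such a preprocessing check might look like the following sketch. The individual rules (non-empty channel name, at least one value, a valid context reference) are illustrative assumptions; real import pipelines apply far more checks.

```python
def validate_import(measurement: dict, context: dict) -> list[str]:
    """Return a list of problems found; an empty list means the
    measurement/context pair may be imported."""
    problems = []
    # Completeness: every measurement needs a name and at least one value.
    if not measurement.get("channel"):
        problems.append("measurement has no channel name")
    if not measurement.get("values"):
        problems.append("measurement contains no values")
    # Consistency: the measurement must reference an existing test context.
    if measurement.get("context_id") != context.get("id"):
        problems.append("measurement references an unknown test context")
    return problems
```

Rejecting or quarantining records with a non-empty problem list keeps corrupt data and dangling relationships out of the system.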
If it is deemed necessary to optimize the storage for later search and analysis tasks, this can also be done during import. It might include decoding of time series and bus data, video indexing based on predefined event detection, and the like.
Once the data is available in the system, it may be linked to data from other systems (e.g. weather, traffic, etc.). It may also undergo certain time series calculations to create additional signals or to compute statistics and key performance indicators (KPIs) for individual components.
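As a minimal sketch, per-channel statistics of this kind can be computed with the standard library. Which KPIs are actually relevant is component-specific and defined by the test department; the set below is just an example.

```python
from statistics import mean, pstdev

def channel_kpis(values: list[float]) -> dict:
    """Basic descriptive statistics for one signal - an illustrative
    stand-in for component-specific KPI definitions."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "stddev": pstdev(values),
    }
```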
Monitoring and Reporting
Based on KPIs, those responsible can get an up-to-date overview of the running (road) tests. This gives them the opportunity to initiate corrective action in a timely manner if necessary. For example, the number of specific error messages from a control device can be monitored in the context of the test (traffic, weather, location, speed, temperature, etc.) or over a certain period or mileage. The results are provided either interactively via dashboards, e.g. as a histogram, or at regular intervals via automation.
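The error-message example can be sketched as a simple aggregation. Grouping here is by a single hypothetical context attribute (`weather`) and a hypothetical error code format; a dashboard would render the resulting counts, e.g. as a histogram.

```python
from collections import Counter

def error_histogram(log: list[dict], error_code: str) -> Counter:
    """Count occurrences of one error code per context attribute
    (here: weather) - the raw data behind a dashboard histogram."""
    return Counter(
        entry["weather"] for entry in log if entry["code"] == error_code
    )
```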
After the test data has been stored, test engineers and data scientists want to find, select and compile it based on context data as well as mass data. For example, they want to use extended queries to search for events in which the values of a specific channel exceed a certain limit over a defined period. Here, channels from a large number of past measurements have to be compared with one another. The search algorithms must be easy to apply, even without programming knowledge. For reporting purposes, the search results should be persisted.
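The core of such a limit-exceedance query – find runs in which a channel stays above a limit for at least a given duration – can be sketched over a single sampled signal. Durations are expressed in sample counts here for simplicity; a real system would work with timestamps and run this search across many measurements at once.

```python
def find_exceedances(values: list[float], limit: float,
                     min_samples: int) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of runs in which the channel
    stays above `limit` for at least `min_samples` samples."""
    events, start = [], None
    for i, v in enumerate(values):
        if v > limit:
            if start is None:
                start = i          # run begins
        else:
            if start is not None and i - start >= min_samples:
                events.append((start, i - 1))  # run long enough: keep it
            start = None
    if start is not None and len(values) - start >= min_samples:
        events.append((start, len(values) - 1))  # run reaches end of signal
    return events
```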
By means of suitable data analysis methods and tools, engineers would like to discover patterns and correlations in the data. To determine whether the correlations found are actually present and statistically significant, they express the underlying assumptions and hypotheses mathematically – e.g. as if-then conclusions, complex logic formulas, correlations between variables, decision trees, etc.
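For the "correlations between variables" case, the quantity usually computed is the Pearson correlation coefficient between two channels. The sketch below shows only the coefficient itself; judging statistical significance would additionally require a hypothesis test (e.g. a t-test on the coefficient), which is omitted here.

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally sampled
    channels: +1 = perfect positive, -1 = perfect negative relation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```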
Once test engineers have discovered certain correlations via search and analysis, they want to utilize machine learning or other algorithms to build analytical models of cause-effect relationships. This helps product engineering to identify solution alternatives and insights, and includes reports on the applicability and accuracy of the models. By reusing the models and applying them to new context and mass data, engineers can draw on existing engineering knowledge in the long term.
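As a deliberately reduced illustration of such model building, the sketch below fits a decision stump – a one-node decision tree – that picks the threshold on a single feature best separating failing from passing test runs. Real model building would use a proper ML library and many features; everything here (the feature, the labels, the fitting strategy) is an assumption for the example.

```python
def fit_stump(samples: list[tuple[float, bool]]) -> float:
    """Given (feature_value, failed) pairs, return the threshold t that
    minimizes classification errors for the rule 'predict failure if
    value > t' - the simplest possible analytical model."""
    best_t, best_err = None, float("inf")
    for t, _ in samples:  # candidate thresholds: the observed values
        err = sum((x > t) != label for x, label in samples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t
```

Reapplying the fitted threshold to new measurement data is then a single comparison, which is what makes such models easy to reuse.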
Access right management
In order to ensure the confidentiality and security of the test data, access to it must be limited. This may also include rules for data validity and for proper deletion or anonymization.
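A minimal sketch of the two aspects mentioned – access limitation and anonymization – might look as follows. The role model, the record fields and the year-based retention rule are all hypothetical; production systems enforce far finer-grained policies.

```python
def may_access(user_roles: set, required_role: str) -> bool:
    """Minimal role-based access check; real systems add project-,
    vehicle- and channel-level rules on top."""
    return required_role in user_roles

def apply_retention(records: list[dict], cutoff_year: int) -> list[dict]:
    """Hypothetical retention rule: anonymize the driver field of
    records older than a cut-off year."""
    return [
        {**r, "driver": "anonymous"} if r["year"] < cutoff_year else r
        for r in records
    ]
```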
The above-mentioned requirements show that development and test engineers at automobile manufacturers and suppliers associate the term Big Data with various requirements: it involves the storage of large amounts of data, the organization of data management, the analytical processing of data and much more. Some companies in the industry have already begun to deal intensively with the topic. They have established for themselves which new methods and technologies are available and how they can best benefit from them. A small but growing group of pioneers has emerged, who are gradually exploiting Big Data's potential for vehicle development in prototypical implementations. Peak Solution provides them with the necessary professional and technical expertise.
The content of this article is based on personal discussions with test managers as well as on a requirements analysis by the ASAM ODS Big Data working group in 2016.