Wednesday, March 26, 2014

F1 and Big Data

This combination seems like an obvious love affair, right? But before we answer that, let’s have a deeper look at the participants.

Posting this on an F1-related site surely means you know what F1 is like. Big Data, however, is a lovely term that has been sweeping the Internet quite fiercely these days. Certainly, as time advances, we are likely to hear it more and more, simply because the digital data arrays we accumulate keep getting larger. I will try to simplify the explanation of “Big Data” - it’s a term describing amounts of data so massive and complex that traditional tools are unable to process them. A real-world example, just to get a practical grasp of what this really means: each engine of a jet on a flight from London to New York generates 10 TB of data every 30 minutes. Although most of this operational data is lost after the flight is completed, companies are looking at various ways to collect, extract and analyze the meaningful bits.
This situation requires special handling. Another example of Big Data is Google’s collection of website information, which cannot simply be inserted into a normal database. Fortunately, there are already standard parameters that define the most important characteristics of the domain. They are called the three Vs - Velocity, Variety and Volume.

I’m not really keen to delve deeper into the IT side of the problem, but I will use the three Vs throughout the text as references and for consistency.
So, how do F1 and Big Data get along? That relationship has been helped greatly by the standard data acquisition package we have today, provided by McLaren Electronics.
Let’s look at some numbers, with a quick back-of-the-envelope check after the list (the figures are likely to be updated once the 2014 season unfolds technically):

  • An F1 car has around 300 sensors streaming data back to the garages. (Variety)
  • Around 750 billion pieces of data are sent in total from all cars during a race weekend. (Volume)
  • The raw unstructured data collected for a single car over a race weekend is around 15 GB.
  • The peak data transfer (throughput) during the race is about 2 MB/s. (Velocity)
  • On average, a Formula 1 car produces 35 megabytes of data per lap. (Source: Ferrari)
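Here is the promised sanity check on those figures, in Python. The race distance is my own assumption; the per-lap and per-weekend numbers come from the list above.

```python
# Back-of-the-envelope check of the published figures.
# The race length is my assumption; the per-lap and per-weekend
# numbers come from the list above.

MB_PER_LAP = 35          # Ferrari's figure
RACE_LAPS = 60           # assumed typical race distance
WEEKEND_TOTAL_GB = 15    # raw data per car per weekend

race_gb = MB_PER_LAP * RACE_LAPS / 1024
print(f"race telemetry per car: ~{race_gb:.1f} GB")   # ~2.1 GB

# The race itself is only a fraction of the weekend total - practice,
# qualifying and higher-rate logging make up the rest.
print(f"share of the weekend total: {race_gb / WEEKEND_TOTAL_GB:.0%}")  # ~14%
```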

That is certainly quite a lot of data to look at. Add to that equation the data flowing from the strategy group and you’ll get the big picture. Teams definitely need a refined and quick way to cut through that data jungle and make quick decisions. The emphasis is really on “quick”, in line with the fast-sweeping nature of Formula 1. The ability to mine, analyze and present meaningful data out of the big array has proven to be very important for the teams. How do they cope with the data pressure?

One of the oldest examples I’ve got on the list is McLaren. Everything from Woking screams: “Hi-Tech!”. A while back, the team started a partnership with SAP, the technology giant. The move also involved one of the most prominent technologies in that domain, called HANA. While this is purely a commercial product, its core strength lies in memory. HANA is a mixture of acquired products that eventually grew into an extremely fast analytical engine, built over a column-oriented, in-memory database. This type of technology allows very quick mining through large datasets, which is exactly what an F1 team needs. SAP HANA enables McLaren’s existing systems to process data 14,000 times faster than before.
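To build some intuition for why a column-oriented, in-memory layout makes analytical scans so fast, here is a minimal Python sketch. It only illustrates the general technique - it is not HANA’s actual implementation, and the field names are invented.

```python
# Minimal sketch of row-oriented vs column-oriented storage.
# Illustrates the general idea only, not HANA's actual design.

# Row store: each record is kept together, like a classic transactional database.
row_store = [
    {"lap": 1, "speed_kph": 291.4, "tire_temp_c": 98.0},
    {"lap": 2, "speed_kph": 293.1, "tire_temp_c": 101.5},
    {"lap": 3, "speed_kph": 289.7, "tire_temp_c": 103.2},
]

# Column store: each field lives in its own contiguous array.
column_store = {
    "lap":         [1, 2, 3],
    "speed_kph":   [291.4, 293.1, 289.7],
    "tire_temp_c": [98.0, 101.5, 103.2],
}

# Analytical query: average speed over the stint.
# The row layout forces us to walk every record and pick out one field...
avg_row = sum(r["speed_kph"] for r in row_store) / len(row_store)

# ...while the column layout scans one dense array, which is cache-friendly
# and easy to compress - the reason analytical engines favor it.
speeds = column_store["speed_kph"]
avg_col = sum(speeds) / len(speeds)

print(avg_row, avg_col)  # identical results, very different access patterns
```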
The efforts of McLaren and SAP resulted in something that looked like this:


As you can see, this is a prototype dashboard displaying all vital parameters for both drivers. It would be an ideal tool for engineers - essential and data-rich, yet clean and readable at the same time.

We have the drivers on both sides, separated by the track layout and the current position of both cars. The data for each car is easily distinguishable by color - blue for Button and yellow for Hamilton, in this showcase. The entire screen is actually a live application, and the data changes automatically as the cars progress. As you can see, all vital parameters are available, along with the current tire compound, the life of the set and the pressure of all four tires.
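As a rough idea of the kind of record such a live view might refresh on every update, here is a hypothetical Python sketch. The structure and field names are my own illustration, not McLaren’s actual schema.

```python
from dataclasses import dataclass

# Hypothetical per-car snapshot a live dashboard could refresh on every
# telemetry update. Field names are illustrative, not McLaren's schema.
@dataclass
class CarSnapshot:
    driver: str
    color: str                 # e.g. "blue" for Button, "yellow" for Hamilton
    position: int              # current race position
    lap: int
    tire_compound: str         # e.g. "medium"
    tire_age_laps: int         # life of the current set
    tire_pressures_psi: tuple  # (front-left, front-right, rear-left, rear-right)

button = CarSnapshot(
    driver="Button", color="blue", position=4, lap=23,
    tire_compound="medium", tire_age_laps=11,
    tire_pressures_psi=(21.0, 21.2, 19.8, 19.9),
)
print(button)
```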

What is very intriguing is the Predictive Timeline at the bottom of the screen. This is the module which adjusts the race strategy in real time based on many predefined and historical factors.
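We don’t know what SAP’s module actually computes, but here is a minimal sketch of the underlying idea, assuming a toy one-stop strategy with linear tire degradation. All the coefficients are invented for illustration.

```python
# Toy strategy predictor: NOT SAP's model, just the general idea.
# Assume lap time grows linearly as the tires age, and a pit stop
# resets degradation at a fixed time cost. All numbers are invented.

BASE_LAP = 95.0        # seconds, lap time on fresh tires
DEGRADATION = 0.12     # seconds lost per lap of tire age
PIT_LOSS = 22.0        # seconds lost to a pit stop
RACE_LAPS = 56

def total_race_time(pit_lap: int) -> float:
    """Total time for a one-stop race, pitting at `pit_lap`."""
    total = PIT_LOSS
    tire_age = 0
    for lap in range(1, RACE_LAPS + 1):
        total += BASE_LAP + DEGRADATION * tire_age
        tire_age = 0 if lap == pit_lap else tire_age + 1
    return total

# Scan every candidate pit lap and pick the fastest strategy.
# A real system would re-run this every lap with live data.
best = min(range(1, RACE_LAPS), key=total_race_time)
print(f"pit on lap {best}: {total_race_time(best):.1f} s")
```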
The partnership between SAP and McLaren appears to be progressing, so we are likely to update this content in the future, especially when it comes to the new engines.

If you want to get into the details of the pit-wall concept, here is the link - http://stillbrandworks.com/sap/pit-wall/

The next stop is Enstone, where a while ago Lotus announced a similar partnership with iRise. While the details remain secret at this time, an image leaked some time ago which can definitely tell us something.


While we don’t know whether this is a prototype, Lotus are apparently looking at a fast and convenient way to visualize information.
The screen has three tabs and one main body, with static information in the upper right corner. On the Setup tab there are the parameters used to set up the car, obviously. Again, we don’t know how many of these fields are dynamic, but it would be a waste of space and time for them not to be, right? So, again, this is an example of how data could be harvested and used to display the most vital characteristics of a car, or those pertinent to the race.

More on Lotus and one of their technology partners - EMC. According to them, the above-mentioned characteristics look like this:

  • Variety - Over 150 sensors logging data 
  • Volume - 50 GB of data per race
  • Velocity - 15 MB of data per lap

The next use of Big Data models is historical race information. All those little pieces of data we see are stored and subsequently used when making decisions - based on a temperature value, a tire compound or a front-wing angle of attack. Or a combination of all of them.
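As a hedged illustration of what that might look like in practice (my own toy example, not any team’s real tooling), historical stints could be filtered by conditions to estimate the likely tire degradation for the current race:

```python
# Toy example of mining historical stints - not any team's real tooling.
# Given past stints, estimate tire degradation for similar conditions.

history = [
    # (track_temp_c, compound, wing_angle_deg, degradation_s_per_lap)
    (34, "medium", 11.0, 0.11),
    (36, "medium", 11.5, 0.13),
    (28, "soft",   12.0, 0.19),
    (35, "soft",   11.0, 0.22),
]

def estimate_degradation(temp_c, compound, temp_window=3):
    """Average degradation from past stints on the same compound,
    within `temp_window` degrees of the current track temperature."""
    matches = [d for t, c, _, d in history
               if c == compound and abs(t - temp_c) <= temp_window]
    return sum(matches) / len(matches) if matches else None

# Sunday afternoon: 35 C track, medium tires - what should we expect?
print(estimate_degradation(35, "medium"))  # -> 0.12
```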

That is the short story of how F1 teams are handling the Big Data issue and actually making it work for them.