It wasn’t an overbooking problem. United Airlines was trying to move four flight crew members to the next airport. They forced...

It wasn’t an overbooking problem. United Airlines was trying to move four flight crew members to the next airport. They forced passengers to get off the plane with the consequences we saw on the video from last Sunday, but don’t take our word for it. Let’s talk data.

An elaborate AI, like this, is not necessarily to answer to maintaining or improving the airline business, however, a real-time analytics platform and a continuous process of data integration would be useful.

Several conclusions could surface even before allowing passengers to board the plane. The amount of people boarding at any time and the amount of required crew for any flight and its location would allow one to run simulations and identify potential bottlenecks.

In lieu of the forthcoming critique, I’d urge you to turn to the realtime stock chart below and generate your own opinion.

Disruption management

helps solve irregular operation problems due to, for example, bad weather conditions or computer failures. It’s a multi-variable problem. Weather conditions, as you could imagine, play a big role, but it is not the only factor. Big data and data science will have a tremendous impact in the following years. The industry requires better information to make better decisions.

A few implementations of DS to consider:

  • Machine Learning (ML) algorithms handle multi-variable problems well.
  • An orchestration of ML models deployed as services can be part of a data ecosystem.
  • Continuous data integration processes is a element. Real time data is a must in this architecture and can deal with potential problems.
  • The use of simulations can help better provisioning.
  • A replicated database, present in the main regions with instances near the airports or connected through a good bandwidth.
  • An ecosystem of data services towards a better customer service and efficiency.


Microservices Architecture

It is ok to have different systems for different purposes, but what you should be doing is working on orchestrating data flows.

  • The booking system can be a central component of the data architecture.
  • A good API system to communicate services with other systems and components.
  • Transactional data from the ERP or CRM systems can feed the ML models.

It’s okay that the booking database is big, there are several big data solutions to manage big data volumes and low latency requirements. A shared-nothing distributed database can keep the data centers updated. If there is a new reservation, the instances of the database will received a notification. Even with eventual consistency you will know when you are close to having a full flight.


Analyzing customer’s data.

You can identify which past passengers accept changes in their flights and how often, then consider overlapping items like day of the week and time of the day, their frequent destinations. Then, start to unpack customer lifetime value, their behaviors and preferences. Targeted promotions can benefit from all this information.

A machine learning algorithm can predict cancellations, delays, or if a passenger will not show up. Real-time analysis can tell if a flight has a risk of overbooking based on historic conditions. Using inputs from other systems like the crew scheduling system is an opportunity to make the data more robust. These techniques requires the integration of several data sources… an ongoing flow of inputs from the status of other flights, airports, customers, and ML results. Based on important variables to the company, prescriptive analytics uses the scores determined through ML models and other data to recommend best course of action. A Company’s strategy guides the prescriptive analytics deployments. Naturally, this is a quite a challenge, but the current offer of big data technologies makes it possible. Using a combination of batch and streaming analytics platforms. For example, implementing a lambda or kappa architecture.



Of course, a better data architecture is not the silver bullet to all the problems. These data driven technologies would have helped to avoid the incident. But in the absence of volunteers to change flights and other preventive measures, a simple random selection without replacement among the passengers including non-essential flight crew would have done the trick. Before they get to that level of implementation, passengers shouldn’t be forced to abandon the plane due to poor company planning.


Diego Arenas

Diego Arenas, ODSC

I've worked in BI, DWH, and Data Mining. MSc in Data Science. Experience in multiple BI and Data Science tools always thinking how to solve information needs and add value to organisations from the data available. Experience with Business Objects, Pentaho, Informatica Power Center, SSAS, SSIS, SSRS, MS SQL Server from 2000 to 2017, and other DBMS, Tableau, Hadoop, Python, R, SQL. Predicting modelling. My interest are in Information Systems, Data Modeling, Predictive and Descriptive Analysis, Machine Learning, Data Visualization, Open Data. Specialties: Data modeling, data warehousing, data mining, performance management, business intelligence.