The importance of BI-ing audited

Building a Business Intelligence (BI) solution can be compared to constructing a building. We need to build the solution on a solid foundation, one that allows us to develop different projects with the security of resting on a structure that lets us grow.

But if everyone accepts this statement in construction, why do some organizations opt for a different strategy when it comes to building a BI solution?

Building on a weak foundation

Some time ago, I worked on a project for a client who had made a serious mistake. I called this project “The Leaning Tower of Pisa”.

The client contacted me because their organization had several problems with its BI solution. The most important one was functional: they could not get correct results. So the client wanted an expert consulting service that would allow them to identify and solve the existing errors. They had already tried asking the consultancy company that had built the BI solution to fix them, but it was unable to solve these problems.

Given the client's urgency, my objective focused solely and exclusively on resolving the issues detected, so I set aside my usual top-down approach. However, I quickly identified that the problems causing these incidents had their origin in design flaws of the existing BI solution.

I decided to write a report detailing the implications of these system design errors, along with a pros and cons analysis of keeping or fixing them.

I also stressed the fact that they had built a BI system without any validation of its quality. They had not made any kind of inquiry among previous clients of that consultancy company, nor had they opted to have their service provider's design validated by an independent expert. They had accepted the project blindly. And it was a project with transversal visibility in the organization, whose impact could affect decision making in all areas and at all levels.

A risky decision

They were building a giant with feet of clay. At that time, I gave them the example of the Leaning Tower of Pisa: they were at the point where the builders discovered that the tower was not straight, and it was time to decide what to do. If they went ahead, they ran the risk that the BI solution would become impossible to maintain in the future, forcing a large investment to rebuild it. If they decided to fix the foundation issues, they would incur an extra cost that had not been budgeted, but in the long run the cost would be much lower.

My client wanted a quick solution. That is why they chose to go ahead with the resolution of the issues initially detected, without fixing the design flaws of their BI solution. So I focused on solving those issues, but not before warning them again of the risk this could entail in the future if they ever wanted to expand the existing BI solution.

A wrong decision collapses under its own weight

A couple of months after I finished the service for which I had been hired, I received a call from the same client. They were desperate. They had a new BI project under development, but there was no way to make it work. Neither their usual service provider nor their internal BI team was able to move the project forward. Whenever they tried to implement a new functionality, they caused other existing functionalities to stop working. They were at a dead end. The Leaning Tower of Pisa could not grow any more.

This time, my client opted for a different approach. They asked me to evaluate the possibility of finishing the project, but also to estimate the cost involved in rebuilding the whole BI solution in order to have a scalable system that could easily grow in the future.

When presenting my proposal, I stressed the importance of evaluating long-term costs. That had been their big mistake when they initially contacted me. In a BI project, it is very important to take into account future projects when calculating the Return on Investment (ROI). This time they did it. And they came to the right conclusion in their situation. They decided to rebuild their BI solution.

Such was the confidence the client had gained in me that, instead of hiring the services of a consultancy firm larger than StraBIA, they decided to entrust me with the project of rebuilding their BI solution. That was the beginning of a satisfying professional relationship.

Conclusion

A BI solution is an information system with wide visibility in an organization, since it is used for decision making at all levels and in all business areas. Being such an important system, it is a must to ensure that it meets the required level of quality. This is all the more important because it is a type of project with very few experts compared to, for instance, application development projects.

Every time I am in charge of maintaining or expanding a project for a new client with an existing BI solution, I spend some time analyzing it and identifying areas for improvement. I know that by doing this, some of my clients have avoided unpleasant situations later on. That encourages me to keep doing it, even if StraBIA does not get the resulting fix and improvement project.

Fortunately, this is not always the case; sometimes, as in the story I just told you, we are assigned the project. And in those cases, an audit of the new BI system would give the new solution a very good mark.

I would like to finish with a couple of questions:

  • Do you think that your current BI solution is ready to grow while maintaining the levels of performance, effectiveness and maintainability required by your business?
  • Do you think that your current BI solution would pass an audit?

If you have answered “No” to either of these questions, I hope this article makes you think about the risks you may face in the future.

What is Big Data?


Big Data is fashionable. It is interesting to see how the vast majority of people have heard the term Big Data at some time, even if they do not belong to the business or technology world.

But it is also very interesting to hear the great variety of definitions that emerge when people are asked about Big Data. In this article I am going to try to answer a question that many of you have: what is Big Data?


Big Data definition

Big Data is a set of techniques and technologies that make data analysis possible where traditional tools fall short.

These techniques and technologies allow us to store, transform, analyze and visualize data efficiently. Thanks to this, we can meet the current analysis needs of organizations, which are far more demanding than they were just a few years ago.

That is, Big Data is meant for scenarios where a traditional BI solution (the classic approach to data analysis) is not suitable to meet the required analysis objectives.

Definition of a Big Data scenario

Big Data should be used in situations where data analysis is not possible efficiently using a traditional Business Intelligence (BI) solution. These situations have historically been associated with what is known as the 3 V’s: volume, velocity and variety. Some people include other V’s in this list such as veracity, volatility, validity, visibility and variability, but the most common definition is still that of the 3 V’s.

Volume

A massive data volume simply means a very large amount of data, so large that it can no longer be handled efficiently by traditional data repositories. These are, in the vast majority of cases, relational databases which, despite having evolved in recent years to be more efficient and to run on more powerful hardware than before, are still a bottleneck for the efficient storage and querying of large volumes of data.

Using this type of storage system to analyze large volumes of data can push it beyond the limits for which it was designed, degrading performance both when storing and when accessing data. These limits vary depending on the hardware and software, so it is almost impossible to draw a line that marks where massive data begins. A few years ago this limit was on the order of gigabytes, while today, with recent innovations in hardware and software, it is around a few terabytes.
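To give a feel for how Big Data platforms sidestep this bottleneck, here is a minimal sketch using PySpark, one of the best-known distributed processing engines. The dataset path and column names (store_id, amount) are hypothetical assumptions; the point is that the engine spreads both data access and computation across the nodes of a cluster instead of funnelling everything through a single database server.

```python
# Minimal PySpark sketch: a distributed aggregation over a large dataset.
# The file path and column names (store_id, amount) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# The SparkSession is the entry point; the engine splits the work
# across the nodes of the cluster (or the cores of a single machine).
spark = SparkSession.builder.appName("volume-sketch").getOrCreate()

# Spark reads a multi-terabyte dataset partition by partition,
# so no single node needs to hold all the data in memory.
sales = spark.read.parquet("/data/sales")  # hypothetical path

# The aggregation runs in parallel on each partition and the partial
# results are then combined, instead of overloading one server.
totals = sales.groupBy("store_id").agg(F.sum("amount").alias("total_amount"))

totals.show()
spark.stop()
```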

Velocity

When someone analyzes data, they do it with the aim of finding an answer to a question within a timeframe in which that answer still brings value. If the answer arrives late, it loses all of its value and the opportunity is gone.

For example, analyzing vehicle and mobile device locations can provide information on traffic flow. In this scenario, the question we want to answer could be: “At what speed are vehicles moving?”. If the vehicle and mobile device data can be obtained and analyzed in a very short timeframe, they are very useful, since we can visualize the data on a map to offer up-to-date information on the traffic density of each road (urban or interurban). However, if the answer arrives one hour late, it is of no use to drivers.

Therefore, it is clear that velocity is a key factor when making decisions.

This velocity in obtaining an answer from the data can be broken down into two components: the data loading speed (obtaining, processing and storing the data) and the analysis speed (extracting knowledge through data analysis techniques such as statistics or artificial intelligence).

If either of these components is slow, there is a risk of exceeding the upper bound on response time, which results in no value for the user.

A traditional BI system, due to its design and architecture, has a delay in bringing data into the repository that usually ranges from a few minutes (in specific cases such as Lambda architectures) to 24 hours (in a scenario of daily data loads), although it can be even higher. Going back to the traffic scenario, a traditional BI system clearly could not satisfy the requirement of keeping the information updated in near real time.
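To make the traffic example more concrete, here is a minimal sketch in plain Python of the kind of computation a near-real-time pipeline performs: keeping an up-to-date average speed per road from a stream of incoming readings. The event fields, the road name and the five-minute window are assumptions for illustration; a production system would use a streaming platform rather than in-process data structures.

```python
# Minimal sketch of near-real-time analysis for the traffic example:
# keep a rolling average speed per road from a stream of readings.
# Event fields and the 5-minute window are illustrative assumptions.
from collections import defaultdict, deque
from time import time

WINDOW_SECONDS = 300  # only readings from the last 5 minutes count

# road -> deque of (timestamp, speed_kmh) readings, oldest first
readings = defaultdict(deque)

def ingest(road: str, speed_kmh: float, ts: float) -> None:
    """Store a new reading and evict readings older than the window."""
    q = readings[road]
    q.append((ts, speed_kmh))
    while q and q[0][0] < ts - WINDOW_SECONDS:
        q.popleft()

def average_speed(road: str):
    """Answer 'at what speed are vehicles moving?' for one road."""
    q = readings[road]
    if not q:
        return None
    return sum(speed for _, speed in q) / len(q)

# Simulated stream: in a real system these events would arrive
# continuously from vehicles and mobile devices.
now = time()
ingest("A-2", 40.0, now - 400)  # evicted once newer readings arrive
ingest("A-2", 95.0, now - 200)
ingest("A-2", 80.0, now - 60)
print(average_speed("A-2"))  # 87.5, from the two recent readings
```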

Variety

Traditionally, three data types have been used to store data: numbers, character strings and dates. Historically, when there was a need to analyze data beyond these types, specialized applications were used, which fell outside of what are considered BI tools.

For example, for years there have been applications and libraries for analyzing images that can answer questions such as “Does the color green appear in the image?” (which could be very useful for tracking the growth of a fungus in a laboratory culture over time). But those applications and libraries were not integrated into traditional BI tools.
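As a toy illustration of that kind of question, here is a minimal sketch using Pillow and NumPy that checks whether clearly green pixels appear in an image. The thresholds and the file name are arbitrary assumptions; real laboratory image analysis would be considerably more sophisticated.

```python
# Toy sketch answering "does the color green appear in the image?".
# The RGB thresholds and file name are arbitrary assumptions.
import numpy as np
from PIL import Image

def contains_green(path: str, min_pixels: int = 100) -> bool:
    """Return True if the image has at least min_pixels clearly green pixels."""
    # int16 avoids uint8 overflow in the comparisons below
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.int16)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # "Clearly green": the green channel dominates red and blue
    green_mask = (g > 100) & (g > r + 40) & (g > b + 40)
    return int(green_mask.sum()) >= min_pixels

print(contains_green("culture_sample.png"))  # hypothetical file name
```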

Therefore, in the past, the analysis of data types beyond the traditional ones was not considered feasible within a BI solution.

Today, with the growth of data available in organizations and on the Internet, there is an increasing need to find answers in non-basic data types, including audio, photographs, videos, geolocations, etc. When this is a requirement, the use of Big Data is a must.

Differences between a traditional BI and Big Data

Without going into technicalities, the following table tries to summarize the most important differences between traditional BI and Big Data:

Factor                   | Traditional BI                          | Big Data
Volume                   | A few terabytes                         | Terabytes and above
Velocity                 | Periodic data loads (typically daily)   | Higher frequency of data loads → real time
Variety                  | Basic data types                        | Virtually any data type
Computation              | Centralized in a single computer        | Distributed
Hardware                 | High specifications                     | Any (commodity hardware)
Scalability              | Difficult                               | Simple
Data quality (veracity)  | Very important                          | Relative importance (a certain degree of uncertainty is assumed)

Conclusion

Big Data allows us to take data analysis beyond traditional BI capabilities. It is a response to the needs of users, just as BI was in the past with respect to older technologies. This does not mean that BI should be set aside as a valid alternative for analyzing data. On the contrary, it should always be an option to explore.

However, when users' needs involve massive data (volume), answers obtained in a very short timeframe (velocity) or complex data types (variety), we must set traditional BI aside because of its limitations and opt for a solution based on Big Data.