Big Data: False Positives
Big data is not just about the amount of data; it is more than that. Big data is a massive volume of both structured and unstructured data, and it is also about the availability of that data. Data volumes are growing by 50% or more each year. To make sense of these large volumes, we use data mining techniques and special tools for big data analysis. Big data is rapidly becoming the next frontier for innovation and decision-making in industry, politics, public health and various other sectors.
Now, can we trust Big Data solutions and their results? Are we chasing the noise instead of the signal? Big Data has its limits. Not just in public health but in many other sectors, systems collect large volumes of data, and the more they collect, the more chances there are for false positives. If we ignore this fact, the models we build for decision-making or prediction can end up overfitting or underfitting. False positives also challenge the credibility of a system. Better predictions and decision-making depend on taking them into account.
More data gives more information, but at the same time it raises the chances of false positives, especially when you look for correlations in the data. More data gives you more witnesses, but that doesn’t mean you are closer to the truth. There is always a chance of false positives, and this is where human intuition becomes useful.
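To make the "more witnesses" point concrete, here is a minimal Python sketch rather than a rigorous analysis. It correlates a target made of pure noise with a growing number of features that are also pure noise, and counts how many look "significant". The sample size of 30, the cut-off of 0.361 (roughly the two-sided 5% threshold for that sample size) and the function names are my own assumptions for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_features, n_samples=30, threshold=0.361, seed=0):
    """Count noise features whose |r| with a noise target exceeds the threshold.

    Every such hit is a false positive by construction, because the
    features and the target are generated independently.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1
    return hits


for m in (10, 100, 1000):
    print(m, "noise features ->", count_spurious(m), "apparent correlations")
```

None of these features has any real relationship with the target, yet the number of apparent correlations tends to grow roughly in proportion to the number of features examined.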
So I think that, to an extent, we can solve the false positives issue by repeating tests or analyses. Comparing against data from other sources using record linkage is another option. Integrating information from multiple data sources and ensuring interoperability between systems are measures that need to be taken to turn big data into good data. Addressing false positives will help convert big data into solutions.
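Both remedies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the repetitions are treated as independent, each with the same false-positive rate `alpha`, and the linkage key (normalised name plus date of birth) and the sample records are made up for the example.

```python
def expected_false_positives(n_null_tests, alpha=0.05, repetitions=1):
    """Expected false alarms when a finding must pass every repetition.

    Assumes the repetitions are independent, so a false finding
    survives all of them with probability alpha ** repetitions.
    """
    return n_null_tests * alpha ** repetitions


def link_key(record):
    """Hypothetical linkage key: normalised name plus date of birth."""
    return (record["name"].strip().lower(), record["dob"])


def confirmed_by_second_source(flagged, second_source):
    """Keep only flagged records that also appear in the second source."""
    known = {link_key(r) for r in second_source}
    return [r for r in flagged if link_key(r) in known]


# Made-up records, for illustration only.
flagged = [
    {"name": "Ann Lee ", "dob": "1980-02-01"},
    {"name": "Bo Chen", "dob": "1975-07-19"},
]
registry = [{"name": "ann lee", "dob": "1980-02-01"}]

print(expected_false_positives(1000, repetitions=1))
print(expected_false_positives(1000, repetitions=2))
print(confirmed_by_second_source(flagged, registry))
```

With 1000 tests where nothing is really going on, one pass at a 5% threshold yields about 50 false alarms on average, while requiring an independent repeat brings that down to about 2.5. Real repetitions are rarely fully independent, and real record linkage usually needs fuzzier matching than an exact key, so treat both numbers and the matching rule as a best case.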