The goal of machine learning is to build a predictive system based on a statistical approach. Doing so gives you correlations, and one thing that is very important to remember is that correlation is not causation. The holy grail of any science is finding the causes of things, and from data alone you will never get causation. In other words, if you are purely data-driven, the misconception is that with enough data you will be able to solve any problem.
I next asked if Khai would give a practical example (one from the customers he could talk about) and tell us what that customer brings to ThinkingNode, what ThinkingNode does for that client, and how that customer has been able to apply it in their domain.
One company they work with is the J. Craig Venter Institute, and one of its projects is to understand in detail why some genes are essential for life, using the institute's own domain knowledge. The number of possible combinations is enormous, but by uploading that knowledge they were able to combine all of these different concepts and explain why some genes are essential for life.
This is something very important: reasoning computing has existed for a long time, but the existing technology cannot scale. One of the reasons it cannot scale is that these systems try to reproduce how the expert solves the problem, and when you do that you get a combinatorial explosion.
ThinkingNode instead focuses on how the expert organizes his or her knowledge, not on how they solve a problem. They let the system do the solving: they take the knowledge piece by piece (either from experts or from specialized databases) and let the system combine it, because as humans we can only hold about five to nine concepts in mind at a time. The potential of this approach is enormous because there is so much available knowledge that we have yet to tap into.
Khai tells us how in this scenario his team extrapolates which genes are necessary for life, and how this differs from standard data mining. First, when you observe that A is correlated with B, you don't know whether A causes B, B causes A, they co-occur by coincidence, or they both depend on another factor (a confounder). So if you want to intervene, you don't know where to intervene, because you have no notion of causality.
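The confounder problem can be made concrete with a small simulation. This is my own toy example, not anything from the interview: a hidden factor Z drives both A and B, so A and B are strongly correlated even though A never causes B, and forcing A to a value (an intervention) leaves B untouched.

```python
import random

random.seed(42)

# Toy structural model (illustrative only):
# a hidden confounder Z causes both A and B; A does NOT cause B.
n = 50_000
z = [random.gauss(0, 1) for _ in range(n)]
a = [zi + random.gauss(0, 0.5) for zi in z]   # Z -> A
b = [zi + random.gauss(0, 0.5) for zi in z]   # Z -> B

def corr(x, y):
    """Pearson correlation, computed from scratch."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    return cov / (vx * vy) ** 0.5

print(f"observed corr(A, B) = {corr(a, b):.2f}")  # strong (theoretically 0.8)

# Intervention do(A = 2): in this model B = Z + noise, so forcing A
# to any value cannot move B -- the correlation was never causal.
b_after_do = [zi + random.gauss(0, 0.5) for zi in z]  # B unchanged by do(A)
print(f"mean(B) before do(A=2): {sum(b) / n:+.3f}")
print(f"mean(B) after  do(A=2): {sum(b_after_do) / n:+.3f}")
```

A purely data-driven model fitted on (A, B) pairs would predict B well from A, yet any policy that manipulates A based on that model would fail, which is exactly the point Khai is making.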
Something we do all the time without actually realizing it: by combining existing knowledge, you can infer new knowledge.
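This kind of knowledge combination can be sketched as forward chaining over facts and rules. The facts, relation names, and rule below are all my own hypothetical illustration (not ThinkingNode's actual engine or the JCVI data): a rule repeatedly combines known facts to derive new ones until nothing new appears.

```python
# Knowledge base of (subject, relation, object) triples -- hypothetical facts.
facts = {
    ("gene_X", "encodes", "enzyme_E"),
    ("enzyme_E", "catalyzes", "reaction_R"),
    ("reaction_R", "required_for", "cell_growth"),
}

def apply_rules(facts):
    """One toy rule: if X encodes E, E catalyzes R, and R is required
    for G, then infer that X is essential for G."""
    derived = set()
    for (x, r1, e) in facts:
        for (e2, r2, rx) in facts:
            for (rx2, r3, g) in facts:
                if (r1, r2, r3) == ("encodes", "catalyzes", "required_for") \
                        and e == e2 and rx == rx2:
                    derived.add((x, "essential_for", g))
    return derived

# Forward chaining: keep applying rules until a fixed point is reached.
while True:
    inferred = apply_rules(facts) - facts
    if not inferred:
        break
    facts |= inferred

print(("gene_X", "essential_for", "cell_growth") in facts)  # True
```

No single stored fact says the gene is essential; the conclusion only exists after the pieces are combined, which is the leverage a reasoning system has over a human who can juggle only a handful of concepts at once.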
He gives another example, from car insurance: decades ago, if you had a red car you might have paid higher insurance rates, because there was a strong correlation between the color red and the number of accidents. But a red car is not the cause of having more accidents; it is that more sports cars are red, and so on. In biology, you would end up focusing on the wrong factor if you only trusted correlations.
That’s not to say he is dismissive of machine learning; he says it is amazing and he will continue to be a part of it, but it is not enough. In certain domains, such as life science and other areas where a lot of knowledge exists, you need reasoning computing to build a causal model to work with.
We go on to talk about the basic tools he recommends for reasoning computing and machine learning, why this type of work is so important to him, and whether AI is mostly harmless.