Is Data Science really a science, or is this just a cute name intended to give some scientific respectability to a mishmash of tools, techniques and attitudes?
As John Searle points out, we should be wary of activities and disciplines that include “science” in their description:
“Science has become something of an honorific term, and all sorts of disciplines that are quite unlike physics and chemistry are eager to call themselves ‘sciences’. A good rule of thumb to keep in mind is that anything that calls itself ‘science’ probably isn’t …” John R. Searle (1986). Minds, Brains and Science (1984 Reith Lectures) Harvard University Press
Richard Feynman makes an incisive observation in his discussion of methods of science:
The key to science …
Data Science is – was – described as lying at the intersection of statistics, computer science and domain knowledge.
This is somewhat cartoonish, but it can be orienting: it might help you decide whether someone you run into ought to be described as a data scientist, irrespective of what they call themselves.
The fact is that most people who call themselves data scientists work in industry doing data-sciencey things, such as data analysis, building models of and for data, and making predictions from data.
An example that is easy to understand, although complex in practice, is the work of people who build, maintain and refine recommendation systems: the sort of thing that Amazon, Walmart, and other online stores use to suggest what else people might buy if, for example, they come looking for a camera. This type of statistical modeling can make a LOT of money for the companies involved, so business is VERY interested in hiring people who know a lot about recommender systems and can put their knowledge into practice.
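To make the camera example a little more concrete, here is a toy sketch in Python of one very simple idea behind such systems: count which items turn up in the same basket and suggest the most frequent companions. The baskets, item names and function names are all invented for illustration, and this is not a description of how Amazon, Walmart or any real store builds its recommender.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, invented purely for illustration.
baskets = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"camera", "camera bag"},
    {"tripod", "camera bag"},
    {"memory card", "card reader"},
]

def co_purchase_counts(baskets):
    """Count how often each ordered pair of items shares a basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(item, baskets, top_n=1):
    """Return up to top_n items most often bought together with `item`."""
    counts = co_purchase_counts(baskets)
    together = Counter({b: n for (a, b), n in counts.items() if a == item})
    return [other for other, _ in together.most_common(top_n)]

print(recommend("camera", baskets))
```

Real recommender systems have to cope with millions of shoppers, sparse data and constantly changing catalogs, which is where the "complex in practice" part comes in.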
Well, this is an industry thing, and one might argue this is an application of data science in industry. In some sense, however, one would be wrong to argue this – or, at least, not entirely right. That is because data science as we currently know it almost certainly would not exist, at least not under the name “data science”, if it were not for its industrial application.
What’s now known as data science already existed in astronomy: analysis, model building and attempts at prediction were under way there, on big data arriving at high velocity, long before anyone thought to call astronomers “data scientists”.
It sort of goes without saying that, whatever else data scientists do, they analyze data. And people who describe themselves as data scientists can get very tetchy about this, insisting that they are not “mere” data analysts, although those who have an MS in Data Science or a PhD in Statistics ought to bear in mind that John Tukey was proud to be known as a data analyst.
So the prime activity of a data scientist is data analysis, together with a lot of other fancy stuff that goes along with it, such as machine learning (including deep learning), model building, and prediction.
The raw material, so to speak, for a data scientist is data. A data scientist, in other words, is trying to do something meaningful and useful – hopefully scientific, to justify the title – with data.
The data a data scientist analyzes can come from anywhere, in any form. It need not be, and usually is not, the result of a deliberate experiment; it may instead be a focused, or even ad hoc, collection of observational data.
Data science experiments
Thinking about Feynman’s description of the generation of scientific laws, it’s not at all clear that’s what data scientists do (at least not what the vast majority, who work in industry, do).
So let’s start at the other end and ask: do data scientists carry out experiments?
Well, yes they do.
Bear in mind that it’s not a critical aspect of science, or of being a scientist, that one should regularly do experiments. For example, astronomers regularly collect data but rarely, if ever, conduct astronomical experiments. And Richard Feynman, a top-notch physicist, was not known for carrying out experiments.
That said, let’s see how data scientists conduct experiments.