The amount of data generated every day has grown overwhelmingly in recent years, which led the International Data Corporation to estimate that worldwide data would reach 40 zettabytes by 2020 [1]. In May 2017, The Economist published an article stating that data is now the world’s most valuable resource [2]. Most of the time, the usable information is hidden in the raw data, and the resulting demand for people capable of working creatively with data has led to the consolidation of a field called data science. Data science has been called the sexiest job of the 21st century [3], and in recent years the number of job openings in this area has exceeded the supply of qualified candidates. As a consequence, companies are hiring data science workers regardless of their academic and professional backgrounds. The impact of this heterogeneity on their data science workflows is still unknown and understudied, which makes the development of methodologies and tools more challenging and error-prone.
I am an MSc student in Informatics Engineering at the University of Minho, Portugal.
My research is focused on data science workers, namely on understanding their academic and professional backgrounds, their skills, and the technologies they use.