Data Buzz – Volume #3: Why would anyone want to be a data scientist?

Consider the following paragraphs a kind of self-reflective humor. Given our 8 years at 7Puentes and some lessons learned, it’s time to talk about our jobs and ourselves as data scientists. That’s why we’d like to give you a summary of all the buzz around becoming one (if you’re not already overwhelmed by it) and maybe share some personal notes (mostly complaints and wishes) on the subject. It is strange, though, that only a few years ago we weren’t calling ourselves data scientists, which brings to mind this question: are we suffering from some kind of identity disorder? How can we become something that we were never trained to be?

The curse of being a data scientist is very well described here, so there is no point in repeating it on the blog. I will just hit you with the bullets:

You already are a data scientist

If you’re a statistician, engineer, data analyst, economist, computer science PhD or M.S., or even a Microsoft Excel guru, you’re only one step away from becoming a data scientist. Just follow one of the many web tutorials and add the title to your LinkedIn profile! The real issue is being a good one, but since there are not many available, nobody really cares whether you’re good or bad. Maybe in a few years, when there are plenty of data scientists to pick from on the market, you’ll have to worry about being a good one.

You can’t cheat and fake results

We expect data to be a measure of fact, so that analyzing it gives us strong support for making decisions or for building systems that make decisions or predictions. That is not always the case: noise is always around, and even if we do our best to stay objective, the business value extracted from the data is always driven by people and is therefore always personal and subjective.

You’re a data laundromat

No one estimates that 80% of the time will be spent on data munging, but that’s likely to happen. Sadly, most data is raw, diverse and noisy, so be ready to clean your armor and sword before going into battle.

Data and dogs do not talk

Although you can brainstorm with your fellows about whether you should use SVM or ADA to train a classifier, you’ll spend the majority of the time alone with the data. You’ll be waiting for some processing to finish, but cloud computing and Spark work against you playing video games at work (they won’t keep you waiting too long), so prepare yourself to repeat the data experiment again and again. That’s why staying focused is so important, even in a repetitive scenario.

You have to be patient when no one else is

Maybe, if you have a big budget and you’re in the R&D department of some company or at a university lab, you’ll get a pass on some of the pressure to meet the goals (maybe, again). But in any other scenario, building a good model for a new business (a startup) or a novel data product is hard, so stay patient as you try to deliver an accurate one. Remember: “Data is fast, but science is slow.”

You have to study a lot of math

Someone said that you have to choose between being a type A and a type B data scientist (what????). Just to make it clear: if you want to be a type A data scientist and your math skills are poor, then you’d better consider the B side of this tape.

You’re not really a rockstar

There are only a few data scientists who can be considered rockstars. They are the ones pushing the boundaries of science by managing strong, well-funded teams across the globe. Gregory Piatetsky compiled a list, if you’re curious to follow them. It is OK to think you’re a rockstar, though (I did when I started playing the guitar), but it is not OK to place yourself above an engineer just because you know some math or can really understand a neural network.

All you need is love

In the end, being a data scientist is about being passionate about data and problem solving. You’ll be shifting between problems for a long time, and since that can be stressful, I consider it healthy to take some time off and clear your mind. Doing sports (soccer for me) is a nice way to relax after some heavy data lifting.