Thank you to BukaTalks for inviting me
First, let me introduce myself. My name is Fajri Koto
I was a Data Scientist
And I would like to share with you my career journey as a Data Scientist
Currently, I am a grantee of Australia Awards for Informatics Technology Program 2018
OK. In 2017, LinkedIn published a report on the emerging jobs of 2017
As you can see, jobs such as Machine Learning Engineer, Data Scientist, and Big Data Developer are growing more than expected
For Data Scientist, its rate of growth is 6.5x
Can you imagine the demand for Data Scientists from companies?
If we take a look at the report published by Forbes
As reported, since 2012 demand for Data Scientists has grown over 650%
And there are around 35,000 people with Data Scientist skills in the US (now)
More surprisingly…
McKinsey claims that in 2018 there will be 140,000-190,000 job vacancies for Data Scientists
An amazing number
Then what about Indonesia?
Based on my personal experience
I see that from around 2013 to 2014
The Data Scientist profession had just started to grow in Indonesia
The problem is that the supply of Data Scientists is still low
And the demand for the job is already quite high
Talking about data science, let's define what it is...
So, I'm taking this from "Doing Data Science" by Cathy O'Neil and Rachel Schutt
Overall, data science is about how to process data
Any raw data, taken from anywhere
As seen here, it can come from nature in the form of audio, image, text or even bank transaction data
Which is then processed, resulting in a clean data set
That can be used in models, algorithms, and exploratory data analysis
With the objective of getting insight or value to improve the company's business
The interesting thing here is the uncertainty factor
There's a lot that can be done in the process of data pre-processing, data set cleaning and machine learning
This is where the data scientist's role is needed: to produce the model, or the value expected (from the data)
When did I learn about this, actually?
It takes me back to the time of my bachelor's degree
At the end of 2012, I graduated in Computer Science from the University of Indonesia
I had no idea about machine learning and all that
Actually, I did my final project on an MBD (model-based design) system
Tinkering with a microcontroller, an AVR. To be more specific, if I'm not mistaken, the AVR ATxmega256A3BU
Then the next year, I wondered whether there was anything more challenging that would be good for my career in the future
Then I decided to take my Master's degree at the same university
Focusing on the Information Retrieval Lab
Along with foundation courses: machine learning, advanced machine learning, data mining and data retrieval
Around that year, I got to know the research field
Earlier, we talked about the uncertainty in the data science process
So, research can't be detached from being a Data Scientist
While doing my Master's, I had an opportunity to do research at the Nara Institute of Science and Technology (NAIST)
To be exact, I joined the Augmented Human Communication Lab
Working on research about speech technology
It didn't cross my mind that it had anything to do with data science
But in fact, it does; they are strongly connected
If we look back at the data science process
Then the raw data (of the speech technology research) is voice, audio
And then, back in Indonesia, I started work on my thesis
I decided to work on Twitter Sentiment Analysis
Which is now maybe a common thing to do at every company
If you're not familiar with it, Sentiment Analysis is a classification task in text processing
To classify whether a text has positive or negative content
As simple as that
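To make the task concrete, here is a minimal sketch of positive/negative classification in Python. It simply counts hits against two tiny word lists. The word lists and the `classify_sentiment` function are hypothetical, invented for illustration, and the extra "neutral" label for ties is an assumption; this is not the method used in the thesis described in this talk.

```python
import re

# Hypothetical toy lexicons, invented for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "happy", "nice"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "sad", "awful"}


def classify_sentiment(text: str) -> str:
    """Label a text by counting lexicon hits; ties are labelled 'neutral'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentiment("I love this phone, great camera"))
print(classify_sentiment("What a terrible, awful day"))
```

A real system would learn from labelled tweets rather than rely on a fixed word list, but the input and output have the same shape: text in, label out.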
But then research must have something new in its results
So, I was looking for things that could be done
To boost the performance of Twitter Sentiment Analysis over the established state of the art
Then I started researching patterns in the sentence's content
And did a comparative study of features published by other researchers (on the subject)
Also, I tried to publish an emotion dictionary that can be used for classifying sentiments
Another thing I did during my Master's was a mini research project in my Advanced Machine Learning class
At the time, it also didn't cross my mind that it had anything to do with data science
It was more about how to handle imbalanced data when doing supervised classification
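As an illustration of the problem, here is a minimal sketch of one common way to handle class imbalance: random oversampling, which duplicates minority-class examples until every class is as large as the biggest one. The talk does not say which technique was studied, so the choice of oversampling, the `random_oversample` function, and the toy data are all assumptions.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: 6 "ham" examples versus 2 "spam" examples.
X = ["h1", "h2", "h3", "h4", "h5", "h6", "s1", "s2"]
y = ["ham"] * 6 + ["spam"] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling of this kind should be applied to the training split only, so that duplicated examples do not leak into the evaluation data.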
Overall, these are my research outcomes after visiting NAIST, mostly related to Speech Technology
So, I finished my Master in 2014
By coincidence, my CV matched a Data Scientist career
Which was a booming thing in Indonesia at that time
And that is when it all started
Let's talk about the skills to develop if you want to be a Data Scientist
First, programming. You have to (have that skill). You have no choice, no shortcut
The foundation (skills) are mathematics and statistics
They will give you great support in doing research
Because when we open (read) a paper, most of it will be about formulas and statistics
Then there's Machine Learning. Now, it's your best friend.
Research skill is the most important, most essential skill of being a Data Scientist
So, let's talk about research skill
When you are a Data Scientist, the company will ask you to solve problem A, B,
And they want us to work fast and don't want to wait for us to do research
So, the function of research skill is
The ability to look for a better way (in problem solving)
Let's say we're dealing with a spam detection system case
So, we have a comment section
With a machine learning model, we can filter
Whether a submitted comment is an advertisement, has sexual or other inappropriate content
If we aren't used to doing research
We will usually stick with basic methods
Which, in this case of text processing, is the n-gram model
Nowadays, with Python programming
A few lines of code will already be able to achieve our objective
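A basic text classifier of the kind described here can indeed be written in a short Python script. The sketch below counts word n-grams (single words and word pairs) and feeds them to a Naive Bayes classifier with add-one smoothing, using only the standard library. The `NgramNaiveBayes` class, the `ngrams` helper, the labels, and the training comments are hypothetical and invented for illustration; the talk does not specify which classifier was paired with the n-gram features.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NgramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(ngrams(text))
        self.vocab = {g for counts in self.gram_counts.values() for g in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for gram in ngrams(text):
                if gram in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy comments, invented for illustration.
train_texts = [
    "buy cheap watches now",
    "cheap pills buy now",
    "click here for free money",
    "great article thanks for sharing",
    "I enjoyed reading this post",
    "thanks this was very helpful",
]
train_labels = ["spam", "spam", "spam", "ok", "ok", "ok"]
model = NgramNaiveBayes().fit(train_texts, train_labels)
print(model.predict("buy cheap pills"))
```

This is the kind of baseline that is quick to build; the point of the research habit is to measure it and then look for something better.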
The problem is, we have to give more to the company, right?
So, as a scientist, go back to the journals
Read, implement, and evaluate
If you think you've got a good model, then deploy it in production
OK, another case study, about Spam Video Detection
Let's say, here's the company requirement
As Data Scientists, we are asked to make a system where
When a user uploads a video, it will go to the video uploading system
Still with the same objective, filtering spam videos
Can you determine what set of skills we need?
You have to code, understand databases, know how to build an API
Access servers, etc.
And you have to do a lot of experiments to prepare your machine learning model
So how can I develop these skills?
So, I tried to categorize it beforehand
First category: if you are a student coming from computer science, informatics or another IT-related major
Please take a machine learning course, and do a research thesis
Because it will help you if you do choose a career as a Data Scientist
Second category: if you are a non-IT student,
First thing, make sure you're able to code. It's the most essential.
Second, do a research thesis about machine learning, it can be text processing, image processing, etc.
If you are not a student at all, meaning you already have a career other than Data Scientist
If you already have an IT-related background
There are several options for self-learning
Online courses like Coursera, Udacity
I believe you are already familiar with all that
But the most important thing is seriousness and dedication in your learning
Most likely we do it halfway and end up not finishing the course
Or pursue a Master's (degree) in Computer Science or Data Science
Or take informal courses such as ICRA, DSI Boot-camp, or others that are now widely available
For non-students who also come from a non-IT background, it's quite the same
First of all, make sure you're able to code and want to understand algorithms
Next thing is, how to improve my skill set?
Learn more by trying new things
Different databases, different languages
(learning) Data visualization tools
Now we have Kaggle for practicing
And what's also important is to improve your negotiation skill
It's important to convince people of the output of our insight (as data scientists).
So, now you're ready to join a company
The thing to remember is, be sure that their projects match your passion
Data science intersects with artificial intelligence,
(Also intersects with) machine learning. And machine learning can be applied to many things and contexts
It can be audio, such as Speech Processing
Like, if you've ever used Google's speech recognition
There is machine learning for text, with many variations
Summarization, translation, classification task
Then video, such as Video Spam Detection, if you're into image processing
And last, others such as transaction data for fraud detection, etc.
So why do companies need data scientists?
The first reason is to boost their business
And Data Scientists can help them with that
(data scientists) can do visualization, to help other teams make their decisions
(data scientists also) can do a daily email report
(data scientist also) can use AI to support the product
Data recommendation or an automatic dialogue system, just like the one kata.ai has.
OK, some tips and a conclusion
If you want to be a Data Scientist, starting from now, love research
Attend relevant conferences, if you want
It helps you stay up to date with research developments
Or, it's even better if you can submit a paper to conferences
And it's a good idea to pursue a higher degree
Last, I want to say: the more you read, the more you realize how little you know
So, stay hungry, stay foolish. Thank you everyone.