- One use case is to use BigQuery for preprocessing of the training data. TensorFlow, or Machine Learning Engine (ML Engine) itself, is not suited to the duty of preprocessing or collecting all the data from enterprise systems. So you can use BigQuery as a data bank: you gather all the data coming from the enterprise application servers and databases, import it all into BigQuery, do some preprocessing there, and then export the resulting training data to TensorFlow or ML Engine.
So let's take a look at an actual demonstration of that. Here I'd like to use Cloud Datalab to do the preparation, or preprocessing, with BigQuery, and use TensorFlow to train a neural network model for machine learning analytics.
This is something I have prepared called Classifying Manhattan. Here we import a bunch of location data from BigQuery and train a neural network model to classify whether each location is in Manhattan or not.
So...
We clear all the cells first.
And now let's take a look at what kind of training data we'll be using by executing this query. I'll be using another public dataset called NYPD Collisions, which records all the car accidents that happened in New York City. We see a timestamp, a borough, and a latitude and longitude.
But you cannot use this raw data as it is; you have to do some preprocessing first, because you can see that some rows don't have any borough data, and some rows don't have a latitude and longitude. We have to remove this useless garbage data.
So again, you can use BigQuery to do this kind of preprocessing. And I also want to add a new column: an "is Manhattan" flag that represents whether the latitude and longitude is inside Manhattan or not.
So this is the SQL I'll be using for preprocessing; let me execute it.
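The effect of that preprocessing can be sketched in plain Python. This is a minimal illustration, not the demo's SQL: the field names (`borough`, `latitude`, `longitude`) and the flag name `is_mhtn` are assumptions about the dataset's schema.

```python
# Minimal sketch of the preprocessing step, in plain Python rather than SQL.
# Field names (borough, latitude, longitude) and the is_mhtn flag name are
# assumptions, not the actual NYPD Collisions schema.
rows = [
    {"borough": "MANHATTAN", "latitude": 40.78, "longitude": -73.97},
    {"borough": None,        "latitude": 40.70, "longitude": -73.99},  # no borough
    {"borough": "BROOKLYN",  "latitude": None,  "longitude": None},    # no coordinates
    {"borough": "BROOKLYN",  "latitude": 40.65, "longitude": -73.95},
]

def preprocess(rows):
    """Drop rows missing borough or coordinates, and add an is_mhtn label."""
    clean = []
    for r in rows:
        if r["borough"] is None or r["latitude"] is None or r["longitude"] is None:
            continue  # garbage row: skip it
        clean.append({
            "is_mhtn": 1 if r["borough"] == "MANHATTAN" else 0,
            "latitude": r["latitude"],
            "longitude": r["longitude"],
        })
    return clean

print(preprocess(rows))  # only the two complete rows survive
```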
You can also execute another cell to extract all the data from BigQuery and import it into Python. So now you have all the data imported into your Python code.
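In Datalab, that extraction step looks roughly like the following. This is only a sketch: the exact API surface depends on the Datalab version, the query string is elided, and it needs a GCP project and credentials, so it won't run outside the cloud environment.

```python
# Sketch only (assumed google.datalab.bigquery API; needs GCP credentials).
import google.datalab.bigquery as bq

sql = "SELECT ..."  # the preprocessing query from the previous cell
df = bq.Query(sql).execute().result().to_dataframe()  # results as a pandas DataFrame
```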
As you can see, by using Cloud Datalab you can seamlessly integrate BigQuery queries with the Python code you write, and you can keep using your scikit-learn or NumPy code on the data coming from BigQuery.
Sorry about my voice.
So now we have the training dataset. One column is the "is Manhattan" flag, which can be used as the label for training (whether each location is in Manhattan or not), plus pairs of latitude and longitude: 10,000 rows.
So let's do the feature scaling. Before starting training, in machine learning you usually have to do some cleansing of the data, and one important thing to do there is feature scaling.
If you do the scaling and plot the data, the training dataset looks like this. Now everything is centered on zero, but you can still see the shape of Manhattan here; you can even see Central Park inside the dataset.
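The scaling step can be sketched as z-score standardization in plain Python; this is an illustration of the technique, not the demo's actual code (which may use NumPy or another library).

```python
import math

def standardize(values):
    """Scale a feature to zero mean and unit variance (z-score standardization)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    return [(v - mean) / std for v in values]

# A few sample latitudes (illustrative values, not the real dataset).
lats = [40.78, 40.70, 40.65, 40.73]
scaled = standardize(lats)
print(scaled)  # now centered on zero
```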
So this will be the training dataset for the neural network. We'll split the data into training data and test data.
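The train/test split can be sketched like this; the helper name and the 80/20 ratio are assumptions for illustration, not taken from the demo.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle and split rows into training and test sets (hypothetical helper)."""
    rows = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```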
Then we use TensorFlow to define the neural network. Here I'm using TensorFlow, specifically the so-called high-level API of TensorFlow. This is one of the features we recently announced with TensorFlow 1.0, where you can write just a few lines of Python code to define a neural network.
Before that, we had been using the so-called low-level API, where you have to define the whole computation graph, a low-level computation graph, to define your own neural networks. But now you can write just a few lines of Python code. So you can see we are defining a deep neural network with four hidden layers, where each hidden layer has 20 nodes.
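The transcript doesn't show the actual TensorFlow call, so as an illustration of the network's shape (two inputs, four hidden layers of 20 nodes, two output classes), here is a minimal pure-Python forward pass with random weights. This is a sketch of the architecture only, not the TensorFlow high-level API code from the demo.

```python
import random

random.seed(0)

def layer(n_in, n_out):
    """Random weight matrix and zero bias vector for one fully connected layer."""
    return ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

def relu(x):
    return x if x > 0.0 else 0.0

def forward(x, layers):
    """Run one input vector through each layer with a ReLU activation."""
    for weights, biases in layers:
        x = [relu(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

# Two inputs (scaled latitude and longitude), four hidden layers of 20 nodes,
# then a final layer with 2 outputs (Manhattan / not Manhattan).
sizes = [2, 20, 20, 20, 20, 2]
layers = [layer(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]
print(forward([0.3, -0.7], layers))  # two raw scores, one per class
```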
Now we have the neural network. Let's check its current accuracy, how well it can classify Manhattan, before doing the actual training.
So now you'll see the map as classified by the neural network. This is how the neural network thinks about where Manhattan is before training. Because this is before training, the neural network isn't smart enough yet to classify Manhattan; it thinks this is Manhattan. So this is why we have to train the neural network model. The accuracy is 75%.
So you can write a for loop that repeatedly calls the fit method to train the model on the training dataset.
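That pattern of training in steps and watching the accuracy improve can be sketched with a simple loop. To keep it self-contained, this uses a perceptron-style update on toy 2-D data instead of the actual deep network and fit method, so everything here (the data, the update rule, the helper names) is illustrative.

```python
import random

random.seed(1)
# Toy 2-D data: the label is 1 when x + y > 0 (a stand-in for the Manhattan flag).
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
labels = [1 if x + y > 0 else 0 for x, y in data]

w = [0.0, 0.0]
b = 0.0

def predict(x, y):
    return 1 if w[0] * x + w[1] * y + b > 0 else 0

def accuracy():
    return sum(predict(x, y) == t for (x, y), t in zip(data, labels)) / len(data)

for step in range(10):              # like calling fit() repeatedly
    for (x, y), t in zip(data, labels):
        err = t - predict(x, y)     # perceptron-style update on each mistake
        w[0] += 0.1 * err * x
        w[1] += 0.1 * err * y
        b += 0.1 * err
    print("step", step, "accuracy", accuracy())
```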
Now it should gradually show better and better accuracy. You can see the neural network model adjusting how it classifies each data point to get a better result. The network is getting smarter and smarter, as you saw in the Playground demonstration. Yes, it's getting much, much better.
So now the training is finished. We got 99.8% accuracy, and now the neural network is able to split Manhattan and Brooklyn with a very sophisticated curve between them.