Hello, my name's Ian and I'm a solution architect with Matillion. Today I'm going
to be showing you how to use the Twitter query component for Matillion ETL for
BigQuery. Let's get started.
Matillion's Twitter query orchestration component enables you to extract data from your Twitter account and load it into Google BigQuery. Because it's an orchestration component, the first thing you'll need to do is create an orchestration job. I'm going to call mine Twitter Extract, and I'll switch into it to start editing.
First of all I'll need to find the Twitter query component, which is here under "Social Networks". I'll drag that into place, connect it up with a data flow connector, and then I can start to configure the properties of the Twitter query.
The way this component works at runtime is that Matillion connects to Twitter using OAuth credentials. You can check this by going to the Manage OAuth menu and making sure you have an entry in there for Twitter that is in an authorized state. If you haven't set this up yet, now is the time to go and do that before continuing.
Now I'm going to work down the properties from top to bottom, starting with the authentication, where I'll choose that "Ian's Twitter" entry we were looking at a moment ago. The data source list includes about 20 of the main elements in the Twitter model, including tweets of course, direct messages, followers, and lists. For this example I'm going to choose tweets, and by the way this is a good test that your OAuth is working correctly. The data selection is essentially the columns that you want to bring back, and I'm just going to choose the first few. You should normally put a data source filter on these kinds of queries, but I'm going to take advantage of the default seven-day filter, so I won't specify one explicitly.
Continuing down the list, I've got to choose a target table name. This table will get created in Google BigQuery, and I'm going to call it "stage tweets" because it's going to be a staging table. I'll also need to choose a cloud storage area, and I have a variable set up for that purpose called "staging bucket", so I'll choose that.
All the properties are configured now, so the things to look out for are: first, a green validation tick in the task console; second, a row of green OK messages in the status of all the properties; and lastly, the border around the Twitter query component itself should be green. I could run this job through the scheduler, but since I'm still in testing mode I'll just run it interactively with a right-click on the background. The run time will vary depending on what type of query you've issued and what where clause you've put on, if any, but this one has now finished, so I'm ready to use that data.
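By the way, if you want to double-check the staged data outside of Matillion, a quick query with the BigQuery Python client works well. This is just a minimal sketch; the project and dataset names (my_project, twitter_data) and the table name stage_tweets are placeholders for whatever your own environment uses.

    from google.cloud import bigquery

    # Count the rows that the Twitter query component just staged.
    client = bigquery.Client()
    query = """
        SELECT COUNT(*) AS row_count
        FROM `my_project.twitter_data.stage_tweets`
    """
    for row in client.query(query).result():
        print(f"Staged rows: {row.row_count}")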
This extract-and-load task is going to be the first part of my daily schedule, so I'll add it into my daily ELT job by selecting it out of there and connecting it up with a flow line. Remember that the BigQuery table in the Twitter query component is a staging table, which means it will get dropped and recreated every time the job runs, so you'll need to do something with the staging data after loading it. I'll show you a transformation job I've built which does exactly that.
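Matillion manages that drop-and-recreate for you, but just to make the behaviour concrete: reproducing the staging step by hand with the BigQuery client would amount to a truncate-and-reload along these lines. The bucket, file, project, dataset and table names are all placeholders, and the component stages its data in its own way; this is only a sketch of the pattern.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Each run replaces the staging table's contents, mirroring the way the
    # staging table gets dropped and recreated on every execution.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.load_table_from_uri(
        "gs://my-staging-bucket/tweets.json",    # placeholder staging file
        "my_project.twitter_data.stage_tweets",  # placeholder staging table
        job_config=job_config,
    ).result()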
I'll drag the daily transforms job out of there, connect it up, and double-click into it to edit it. This job works by starting with a table input from the "stage tweets" table that we've just loaded, and it has an output to a master tweets table with the write disposition set to append mode. Now, I only want to add tweets to that table which don't already exist in there, so to achieve that there's a left join onto the target table, the join is by ID, and after that there's a filter where the master ID is null or blank. If I do a sample on that component I can see one row in there, which is the latest tweet that has come out of this account, and if I run this job in my environment it will push that record into the target table. The left join achieves a "where not in" or "where not exists" style anti-join.
Now that the job has run through once and the new data has been inserted, I'm going to do a test by going back to that filter and re-running the data sample. This time I don't get any data back, because the data has already been moved across and the join is doing its work. So it doesn't matter if I accidentally run this job more than once, I won't end up with duplicate data in my permanent table. When we check the daily ELT job we now have an end-to-end data pipeline: the extract and load from Twitter into BigQuery, and the daily transforms which safely move that data on from staging into its permanent location.
I hope you found that video helpful in getting set up with the Twitter query component. You can visit us at any time at Matillion.com, or launch the product from the Google Cloud Launcher.