The Importance of Big Data in Economic Analysis (2)

Nathania Marshelia Sri Rahayu
Dec 5, 2021

By Ardi Imawan, S.Kom., M.Sc, DOT Indonesia

Introduction to Web Scraping Data

An API is a way of retrieving data through a URL. The Google Maps Places API is an example. The data can be delivered in a variety of formats, depending on the provider. For example, BPS data can be downloaded as Excel files.

Another option is websites that offer their data legally, such as social media services or Google. These require neither scraping nor manual downloading. In addition to ads, Google currently sells a variety of products, including services and data.

We can use Google Maps for free, but we have to pay for the API.
API stands for application programming interface; it lets a program request data from a website through a URL.

Some services, such as the Google Translate API and the Google Maps API, require payment. In return, you do not need to know how to translate text yourself; you simply call the Google Translate API.

Apify: a marketplace-like website that offers web-scraped data.

We can search in the same way as we would on Google Maps, but through the API, which sends a URL to Google based on our keywords.
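As a rough sketch of how keywords end up inside a request URL, the snippet below builds one with Python's standard library. The endpoint `api.example.com` and the parameter names `query` and `key` are placeholders chosen for illustration, not Google's actual API; a real provider documents its own URL and parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/places/search"


def build_search_url(keyword: str, api_key: str) -> str:
    """Encode the search keyword and the API key into a request URL."""
    query = urlencode({"query": keyword, "key": api_key})
    return f"{BASE_URL}?{query}"


url = build_search_url("coffee shop in Jakarta", "YOUR_API_KEY")
print(url)
```

Sending this URL with an HTTP client would typically return structured data, often JSON, instead of a web page.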

We need to understand the structure of the data we have. By looking at the data, we can tell what needs to be cleaned up or corrected, such as empty rows or columns, wrongly formatted review dates, and so on.
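One possible way to get this overview with pandas is sketched below. The table is a made-up sample standing in for scraped review data, and the column names are assumptions.

```python
import pandas as pd

# Made-up sample standing in for scraped review data.
df = pd.DataFrame({
    "name": ["Shop A", "Shop B", None],
    "review": ["Great", None, None],
    "review_date": ["2021-01-05", "05/02/2021", None],
})

# Data type of each column (dates stored as text show up here).
print(df.dtypes)

# Number of empty cells in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Rows in which every cell is empty.
empty_rows = df.isna().all(axis=1)
print(empty_rows.sum())
```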

The trick is to open the CSV file with the pandas library, which is widely used by data scientists for research, set the delimiter to a semicolon or comma, and then run the script. If no error appears, the file has loaded without problems. The script’s goal is to eliminate unnecessary text. We don’t need to memorize the script; all we need to do is understand it.
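A minimal sketch of the loading step, assuming a semicolon-delimited file. An in-memory string stands in for the file here so the example runs on its own; with a real file, its path would be passed to `read_csv` instead.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the contents are made up.
raw = io.StringIO("name;rating;review\nShop A;5;Great\nShop B;4;Okay\n")

# sep tells pandas which delimiter separates the columns.
df = pd.read_csv(raw, sep=";")

print(df.shape)
print(df.head())
```

If the wrong delimiter is given, pandas usually still loads the file but puts everything into a single column, so checking the shape is a quick sanity test.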

If there are many empty rows or columns, we remove them, because they are junk or noise data. Check which data is empty before deleting anything. The subset determines which columns are checked for empty values.
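The removal step could look roughly like this with pandas `dropna`. The sample table and its column names are invented for illustration.

```python
import pandas as pd

# Made-up sample: one fully empty column and one fully empty row.
df = pd.DataFrame({
    "name": ["Shop A", "Shop B", None, "Shop D"],
    "review": ["Great", None, None, "Slow"],
    "unused": [None, None, None, None],
})

# Check what is empty before deleting anything.
print(df.isna().sum())

# Drop columns, then rows, in which every value is empty.
cleaned = df.dropna(axis="columns", how="all")
cleaned = cleaned.dropna(axis="index", how="all")

# subset limits the check to the listed columns:
# here a row is dropped only if its review is empty.
cleaned_reviews = cleaned.dropna(subset=["review"])
print(cleaned_reviews)
```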

If we put only the review column in the subset, any row with an empty review will be dropped. Once the data is clean, the remaining values, such as the review date, are still stored as text and must be converted into the format that we require. This is why we cast, which means changing the format of the same column.
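Casting might be done along these lines, assuming the dates arrive as text in year-month-day form and the ratings as text digits. The data and column names are made up.

```python
import pandas as pd

# Made-up sample in which every column was read as text.
df = pd.DataFrame({
    "review": ["Great", "Okay", "Slow"],
    "review_date": ["2021-01-05", "2021-02-17", "not a date"],
    "rating": ["5", "4", "3"],
})

# Cast text to real dates; values that do not match the format become NaT.
df["review_date"] = pd.to_datetime(
    df["review_date"], format="%Y-%m-%d", errors="coerce"
)

# Cast text digits to integers.
df["rating"] = df["rating"].astype(int)

print(df.dtypes)
```

Using `errors="coerce"` is a design choice: badly formatted dates turn into missing values that can be inspected later instead of stopping the script.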

For example, suppose we want to know how the price of the iPhone moves from year to year. The easiest way is to collect data on all iPhones in the marketplace and then process it. Don’t forget to limit the search results so that the data remains specific (for example, by choosing only sellers with a 4-star rating, because those stores are trusted).
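The iPhone example can be sketched as a filter followed by a yearly average. All listings, prices, and ratings below are invented numbers, and the threshold of 4 stars is the one mentioned above.

```python
import pandas as pd

# Invented marketplace listings, for illustration only.
listings = pd.DataFrame({
    "product": ["iPhone 11", "iPhone 11", "iPhone 12", "iPhone 12", "iPhone 12"],
    "year": [2020, 2020, 2021, 2021, 2021],
    "price": [700, 900, 1000, 1200, 300],
    "seller_rating": [4.5, 4.0, 4.8, 4.2, 2.0],
})

# Keep only listings from sellers rated 4 stars or higher.
trusted = listings[listings["seller_rating"] >= 4]

# Average price per year among the trusted listings.
price_by_year = trusted.groupby("year")["price"].mean()
print(price_by_year)
```

In this sample the suspiciously cheap listing from the low-rated seller is excluded, so it does not pull the 2021 average down.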

For visualization, we’ll need an output file. Google Data Studio is an example of a free, web-based tool. Starting from a blank report, we choose the data source, such as a Google Sheets file, a CSV file, and so on.
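One way to produce such an output file is to write the cleaned table to CSV with pandas. The file name `cleaned_reviews.csv` and the sample data are assumptions; a temporary folder is used so the sketch does not overwrite anything.

```python
import os
import tempfile

import pandas as pd

# Made-up cleaned data with a real date column.
df = pd.DataFrame({
    "review_date": pd.to_datetime(["2021-01-05", "2021-02-17"]),
    "rating": [5, 4],
})

# Hypothetical output location inside a fresh temporary folder.
out_path = os.path.join(tempfile.mkdtemp(), "cleaned_reviews.csv")

# index=False leaves out the row numbers; date_format keeps dates uniform.
df.to_csv(out_path, index=False, date_format="%Y-%m-%d")

with open(out_path) as f:
    csv_text = f.read()
print(csv_text)
```

The resulting file can then be uploaded as a data source in the visualization tool.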

When we create a report, a table appears. Other chart types, such as time series, are also available, so we can display the data along the date dimension we want.

Why does it take at least an hour for data from a MySQL database to update? When the source is a database, a lot of data may be pulled, so Google temporarily stores it in Google Data Studio. This uses a lot of resources, so there is a limit.
