29 December 2016

How to Scrape Data from a Website Using jQuery and Yahoo YQL




jQuery is one of the most popular JavaScript libraries. It is best known for making it quick to add animation effects to a web page, but it is not limited to animation: you can also use it to build web applications and, together with a framework such as Cordova, Android applications.

Web scraping, although a common part of web applications, has a poor reputation in web development. It is the process of using bots to extract particular content, or an entire page, from an external website, and many search engines and websites treat it as a malicious attack.

If you scrape the same web page repeatedly, the site may ban your IP address, either temporarily or permanently.

In this case, we can use Yahoo YQL (Yahoo Query Language) as a proxy to make the cross-domain request, which makes it easy to scrape HTML web pages.

Note: For an example of web scraping in practice, see the Tumblr video downloader.


I assume that you have at least a basic knowledge of JavaScript, jQuery (Ajax), and HTML.


jQuery Cross-Domain Screen Scraping


First, we need a target web page to scrape HTML content from. As an example, I am using 'example.com' here, and I will scrape its heading (the <h1> tag).

1. Navigate to the Yahoo YQL Console.
2. Enter the following query into the text area, select JSON, and click 'Test':

select * from html where url="http://example.com/"

Note: The above query scrapes the whole page of example.com.


3. Copy the REST query (the third highlighted area in the console screenshot) and paste it into Notepad for later use.
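As a rough sketch of what the console does for you in step 3, the REST query looks like the YQL statement URL-encoded into the q parameter of the public endpoint. The helper name buildYqlUrl is hypothetical (it is not part of jQuery or YQL), and this sketch leaves out the diagnostics and callback parameters that the console appends.

```javascript
// Base endpoint, taken from the REST query used in this tutorial.
const YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: turns a YQL statement into a REST query URL.
function buildYqlUrl(statement) {
  // encodeURIComponent escapes spaces, quotes, '=', ':' and '/' so the
  // whole statement can travel safely inside the q= parameter.
  return YQL_ENDPOINT + "?q=" + encodeURIComponent(statement) + "&format=json";
}

const url = buildYqlUrl('select * from html where url="http://example.com/"');
console.log(url);
```

Building the URL this way, rather than editing the encoded string by hand, makes it easier to swap in a different target page.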


Parse the JSON Using JavaScript

The most important part is now done. Yahoo YQL returns the whole scraped page in JSON format, so we need to parse it with JavaScript and inject the result into a div tag.

4. Open Notepad, paste the code below into it, and save the file as HTML.

<html>
<head>
<script src='https://ajax.googleapis.com/ajax/libs/jquery/1.11.2/jquery.min.js' type='text/javascript'></script>
</head>
<body>
<div id='parse-data'></div>

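<script>
// REST query copied from the YQL Console in step 3
var yql = "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fexample.com%2F%22&format=json&diagnostics=true&callback=";

// Request the JSON and show the <h1> text inside the div
$.getJSON(yql, function (data) {
    // Guard against a missing results object before reading the heading
    if (data.query && data.query.results) {
        // .text() inserts the scraped value as plain text, not as markup
        $('#parse-data').text(data.query.results.body.div.h1);
    }
});
</script>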
</body>
</html>

Note:
1. The first highlighted code above (the value of the yql variable) is the 'REST query' URL that we saved in step 3.
2. The second highlighted code (data.query.results.body.div.h1) extracts the heading tag from the JSON data. To learn more about JSON, see the W3Schools tutorial.
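The property chain in note 2 throws an error if any level of the response is missing. A minimal sketch of a safer lookup follows, assuming the response has the shape the tutorial's code expects. The helper getPath and the sample object are hypothetical and hand-written for illustration; they are not real YQL output.

```javascript
// Hypothetical helper: walks a dotted path and returns undefined
// instead of throwing when any step along the way is missing.
function getPath(obj, path) {
  return path.split(".").reduce(
    (node, key) => (node != null ? node[key] : undefined),
    obj
  );
}

// Hand-written sample mimicking the shape used in the tutorial's code.
const sample = {
  query: { results: { body: { div: { h1: "Example Domain" } } } }
};

const heading = getPath(sample, "query.results.body.div.h1");
console.log(heading);

// A response with no results yields undefined rather than an exception.
const missing = getPath({ query: { results: null } }, "query.results.body.div.h1");
console.log(missing);
```

Checking for undefined before writing to the page keeps the script from failing silently when the target page's markup changes.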

Now open the saved HTML file in the Chrome browser, and you will see the heading of example.com.

That's it.

Also read: Android, web scraping of eBay website using Tasker app







About Admin
Vishal is a Web Designer and a blogger.

