Meta Search Engine | SoloLearn: Learn to code for FREE!

+56

Meta Search Engine

How can we make a meta search engine? What resources are required? What I actually want to know is how trivago works: it compares hotel deals from other websites and shows you the best one.

9/5/2018 2:38:29 PM

Rstar

35 Answers

+54

Rstar, you can use HTTP requests and an API to build search engines. See my wiki search engine ✌️ https://code.sololearn.com/WScRbgzmC0cE/?ref=app

+21

Rstar, I believe they use HTTP requests, scanning websites with specialized scripts.

+19

Interesting question 💖🤗🙆‍♂️

+12

Actually, they don't crawl the websites; the hotels do business with Trivago and other sites like it. If you find a good deal on one of those sites, call the hotel and they might give you a better deal. I do that all the time. Also, crawlers can be blocked.

+10

Here is a basic explanation of how a search engine can be created for a specific purpose [Highly Simplified]:
1. Get a list of all sites that match your interest, e.g. a list of hotel sites.
2. Crawl the sites, index the pages and record the content length of each page.
3. Strip the extra code from each page and extract only the text and links.
4. Search through the text for what you want and index specific links accordingly (e.g. deals, offers, etc.).
5. Analyse and sort them (maybe by offers, hotel popularity, etc.).
6. Cache them for future reference.
7. Remember you recorded the content length? Now crawl the sites again every 6 hours (or at any other fixed interval).
8. If the content length differs, record what has changed and cache the pages again.
;) That's it.
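Steps 7 and 8 can be sketched in Python roughly as follows. This is only an illustration under assumptions: `PageCache`, `recrawl` and the `fake_site` dictionary are made-up names, and the dictionary stands in for real HTTP fetches so the example runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class PageCache:
    """Remembers the last seen body and content length for each URL."""
    lengths: dict = field(default_factory=dict)
    bodies: dict = field(default_factory=dict)

    def update(self, url: str, body: str) -> bool:
        """Store the page; return True if it is new or its length changed."""
        new_length = len(body.encode("utf-8"))
        changed = self.lengths.get(url) != new_length
        if changed:
            self.lengths[url] = new_length
            self.bodies[url] = body
        return changed


def recrawl(cache, fetch, urls):
    """Fetch every URL again and return the ones whose length changed."""
    return [url for url in urls if cache.update(url, fetch(url))]


# Hypothetical stand-in for an HTTP fetch (URL -> page body).
fake_site = {"https://hotel.example/deals": "<p>Room: 80 EUR</p>"}

cache = PageCache()
first_pass = recrawl(cache, fake_site.get, list(fake_site))   # page is new
fake_site["https://hotel.example/deals"] = "<p>Room: 100 EUR</p>"
second_pass = recrawl(cache, fake_site.get, list(fake_site))  # length changed
```

Comparing lengths is cheap but misses edits that keep the length the same (80 EUR becoming 90 EUR, for instance); comparing a hash of the body would catch those too.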

+8

As Spider38E said, they have agreements with hotels and don't actually crawl the web. Search engines like Google do, though. They have code to read, analyze and classify the data on websites and web pages, so when you enter a search term the engine returns websites or web pages that contain data related to the term. But to make a good search engine you need AI to classify the sites properly: every search engine you know has to separate fraudulent websites from genuine ones. Google, being the most used, collects more user data and improves its accuracy. More accuracy, more usage, more data, in a loop. cURL is a well-known tool for fetching URLs. I'd suggest Python with cURL, so you'll be able to implement AI as well as apply data science to analyze the data. The same can be done easily with R, but Python would be more useful.
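To illustrate the "enter a term, get back matching pages" part, here is a small inverted-index sketch in Python. It assumes the pages have already been crawled and reduced to text; the `pages` data and all function names are hypothetical, and there is no ranking or AI classification here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted by URL."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical crawled pages (URL -> extracted text).
pages = {
    "https://a.example": "Cheap hotel deals in Berlin",
    "https://b.example": "Luxury hotel in Paris",
    "https://c.example": "Berlin travel guide",
}
index = build_index(pages)
berlin_hotels = search(index, "hotel Berlin")
```

The index is built once after crawling, so each query only needs a few set lookups instead of a scan over every page.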

+7

Search engines were introduced to try and combat the problem of the rapidly growing Web. They provide us with an easy way to find what we want. Now there are many different search engines. There are the general-purpose engines that we are all used to, such as Infoseek, AltaVista, etc., but there are also many more topic-specific search engines that most of us are unaware of. Meta-search engines aim to help us overcome the problem of wading through all these various search engines by searching many of them simultaneously and displaying the results in a uniform format.

Advantages
The obvious advantages of meta-search engines are:
- They are more efficient than searching the separate engines manually because they can do it in parallel.
- They only have one user interface and one syntax to remember.
- The meta-search engine maintainers will hopefully make sure that any new resources are utilised by their service, so we do not have to find these resources ourselves.

How do they work?
The meta-search engines do not actually have access to the individual databases, but query the other search engines as we would. The flow of how this is achieved can be reduced to three sections:
- Dispatch mechanism - This determines which search engines the user's query should be sent to. If the user is searching for information on architecture, then a search engine dedicated to music recordings does not need to be queried.
- Interface agents - These convert the user's query into the different formats required by each engine's syntax and then query the selected engines.
- Display mechanism - This has the task of manipulating the results into a uniform format for displaying to the user. This can include ranking the results and deleting duplicates.
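The three sections could look roughly like this in Python. Everything here is an assumption made for illustration: the three "engines" are fake functions that ignore the query and return fixed (URL, score) pairs, and the query formats are invented. A real meta-search engine would call the engines over HTTP instead.

```python
from concurrent.futures import ThreadPoolExecutor


# Fake engines standing in for remote search services.
def general_search(query):
    return [("https://a.example/gothic", 0.6), ("https://b.example/arches", 0.4)]


def architecture_search(query):
    return [("https://a.example/gothic", 0.9)]


def music_search(query):
    return [("https://m.example/records", 0.8)]


# "topics" of None marks a general-purpose engine; "format" is its query syntax.
ENGINES = [
    {"topics": None, "format": str.lower, "search": general_search},
    {"topics": {"architecture"}, "format": lambda q: "+".join(q.split()),
     "search": architecture_search},
    {"topics": {"music"}, "format": str.upper, "search": music_search},
]


def dispatch(topic):
    """Dispatch mechanism: general engines plus those covering the topic."""
    return [e for e in ENGINES if e["topics"] is None or topic in e["topics"]]


def query_engine(engine, query):
    """Interface agent: convert the query to the engine's syntax and run it."""
    return engine["search"](engine["format"](query))


def display(result_lists):
    """Display mechanism: merge results, drop duplicates, rank by best score."""
    best = {}
    for results in result_lists:
        for url, score in results:
            best[url] = max(score, best.get(url, 0.0))
    return sorted(best, key=lambda url: best[url], reverse=True)


def meta_search(query, topic):
    engines = dispatch(topic)
    with ThreadPoolExecutor() as pool:  # query the selected engines in parallel
        result_lists = list(pool.map(lambda e: query_engine(e, query), engines))
    return display(result_lists)


results = meta_search("Gothic arches", "architecture")
```

The music engine is never queried for an architecture search, and the page returned by two engines appears only once in `results`.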

+6

You need to get special access to their APIs.

+5

They use APIs: they make HTTP requests to each provider's server and the server sends back a response. They do this with lots of hotel APIs, fetching the data simultaneously, then compare the responses from all the APIs and show the result to the user.
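A rough Python sketch of "fetch simultaneously, then compare", using `asyncio`. The three provider coroutines are hypothetical stand-ins with made-up prices; in practice each one would send an HTTP request to a real hotel API that you have access to.

```python
import asyncio


# Hypothetical providers; asyncio.sleep stands in for network latency.
async def provider_a(hotel):
    await asyncio.sleep(0.01)
    return {"provider": "A", "hotel": hotel, "price": 120.0}


async def provider_b(hotel):
    await asyncio.sleep(0.01)
    return {"provider": "B", "hotel": hotel, "price": 95.0}


async def provider_c(hotel):
    await asyncio.sleep(0.01)
    return {"provider": "C", "hotel": hotel, "price": 110.0}


async def best_offer(hotel, providers):
    """Query all providers concurrently and return the cheapest offer."""
    offers = await asyncio.gather(*(provider(hotel) for provider in providers))
    return min(offers, key=lambda offer: offer["price"])


best = asyncio.run(best_offer("Hotel Example", [provider_a, provider_b, provider_c]))
```

Because the requests run concurrently, the total wait is about as long as the slowest provider rather than the sum of all of them.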

+5

Use the PHP function get_meta_tags to get all the meta information of a website. Sample here: https://code.sololearn.com/wbLpZOg0y7DC/?ref=app You could make an array of all the hotel URLs and iterate over each URL to get all its meta information. Please note that Code Playground is unable to crawl "https" links because SoloLearn has disabled the https wrapper on its server; use a local or web server instead.
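For comparison, the same idea can be sketched in Python with the standard library. This only loosely mirrors PHP's get_meta_tags: the Python function of the same name below is written for this example, and the `pages` dictionary is hypothetical data standing in for downloaded hotel pages.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def get_meta_tags(html):
    """Return a dict of meta names (lowercased) to their content values."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


# Hypothetical pages (URL -> HTML already downloaded).
pages = {
    "https://hotel-one.example": (
        '<html><head><meta name="description" content="Sea view rooms">'
        '<meta name="Keywords" content="hotel, beach"></head></html>'
    ),
    "https://hotel-two.example": "<html><head><title>No meta</title></head></html>",
}

# Iterate over every URL and collect its meta information.
all_meta = {url: get_meta_tags(html) for url, html in pages.items()}
```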

+5

They probably have a travel agency certificate or membership with something like IATA, which runs an accreditation program for travel agents. Being part of a club like that makes things easier, for example earning commissions on every hotel or flight booking. I would look into it before coding a web scraper.

+4

I agree with Rstar that you need to use HTTP requests, then scan the page while cutting out most of the code, which leaves you with a bunch of text. Then just search through the text for the info you want.
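Here is a minimal Python sketch of that "cut out the code, search the text" step, using only the standard library. The class and function names and the sample page are made up for the example, and the page is assumed to have been downloaded already.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_chunks(html):
    """Return the pieces of visible text found in the page, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def find_mentions(html, keyword):
    """Return the text pieces that mention the keyword (case-insensitive)."""
    return [c for c in extract_chunks(html) if keyword.lower() in c.lower()]


# Hypothetical downloaded page.
page = (
    "<html><head><style>p {color: red}</style>"
    "<script>var deal = 1;</script></head>"
    "<body><h1>Hotel Example</h1><p>Special deal: 20% off</p></body></html>"
)
mentions = find_mentions(page, "deal")
```

The word "deal" inside the script is ignored because script and style contents are dropped before searching.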

+4

hard to tell

+4

Here is a trivago breakdown related to the question: https://codereview.stackexchange.com/questions/77324/trivago-hotels-price-checker

+3

You could also easily crawl all the meta tag data from any website URL using Node.js with the metatag-crawler package. Run node with the following code:

    var scrape = require("metatag-crawler");

    scrape("https://www.sololearn.com", function (err, data) {
        if (err) {
            // On failure, report the error and skip the output below.
            console.log(err);
            return;
        }
        console.log("Title: " + data.meta.title);
        console.log("Description: " + data.meta.description);
        console.log("Canonical: " + data.meta.canonical);
    });

+3

Hey everyone, I updated my Wikipedia search engine. You can check it here and tell me if you want any improvements: https://code.sololearn.com/WScRbgzmC0cE/?ref=app

+1

I agree with Rstar. 👍

+1

APIs and HTTP requests are the solution...

+1

Can anyone tell me which programming languages I should learn, and in what order?