The Beginning Of The Web Pages And Google Hacking

by Rafael Souza (CO-FOUNDER OF GREY HAT, NOW PARTNER OF THE GREEN HATS)

ABSTRACT

The evolution of technology reached a point that required communication protocols. In the early 1990s the HTTP protocol and the HTML language spread, and web pages became a major means of communication between users, governments, institutions and professionals.

The HyperText Transfer Protocol (HTTP) is an application protocol responsible for handling requests and responses between clients and servers on the World Wide Web. It was created to distribute information over the Internet, with communication between computers carried out as transactions between clients and servers according to a set of rules.

INTRODUCTION

Below I explain the HTTP protocol in more detail. It is based on requests and responses between clients and servers. The client (the browser or device making the request, also known as the "user agent") asks for a certain "resource" by sending a message containing some "headers" to a URI (Uniform Resource Identifier) or URL (Uniform Resource Locator). The server receives this message and sends back a response, which consists of a status and headers and may also carry the requested resource.
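To make the request/response exchange concrete, here is a minimal Python sketch of the text an HTTP/1.1 user agent sends and of how a response can be split into status line, headers and body. The host name, path, User-Agent value and the helper names `build_request` and `parse_response` are hypothetical; a real client would use a library such as `http.client` and handle chunked bodies and other details this sketch ignores.

```python
def build_request(method, host, path, headers=None):
    """Return the raw bytes a user agent would send for one request."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line (CRLF CRLF) marks the end of the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Split a raw response into (status_code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return int(code), reason, headers, body


request = build_request("GET", "www.example.com", "/index.html",
                        {"User-Agent": "ExampleBrowser/1.0"})
print(request.decode("ascii"))

# A made-up response, as a server might send it back.
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 13\r\n"
          b"\r\n"
          b"<p>Hello</p>\n")
code, reason, headers, body = parse_response(sample)
print(code, reason, headers["content-type"], body)
```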

HISTORY

The first version of HTTP, called HTTP/0.9, was a relatively simple protocol for transferring ASCII text over the Internet through a single request method, GET, which is still one of the most used methods today. HTTP/1.0 was developed between 1992 and 1996 to add features such as transferring content other than plain text: the protocol began to carry MIME-like messages (Multipurpose Internet Mail Extensions) and gained new request methods, POST and HEAD, among other features.

Version 1.1 of the protocol (HTTP/1.1) was developed by a working group of the Internet Engineering Task Force (IETF) that included Tim Berners-Lee, the creator of the Web. Its main goals were faster delivery of web pages and less network traffic. It also added persistent connections, better cache control, new request methods and support for proxy servers. HTTP can also be used to communicate with other protocols such as FTP, Gopher, SMTP and NNTP, providing access to resources from other applications. Finally, HTTP/1.1 allows multiple domain names to share the same Internet address (IP), thanks to the Host header that every HTTP/1.1 request must carry; this helps web servers that host a large number of sites.
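Sharing one IP address among many sites works because an HTTP/1.1 client names the site it wants in the Host header. A minimal sketch of how a server might use that header to choose which site to serve follows; the `SITES` table, the document-root paths and the function names are hypothetical.

```python
# Hypothetical sites hosted on the same IP address.
SITES = {
    "www.example.com": "/var/www/example",
    "blog.example.org": "/var/www/blog",
}


def host_header(raw_request):
    """Return the Host header value (port removed, lowercased), or None."""
    for line in raw_request.split("\r\n")[1:]:  # skip the request line
        if not line:
            break  # an empty line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional ":port" (IPv6 literal hosts are not handled).
            return value.strip().split(":")[0].lower()
    return None


def choose_site(raw_request):
    """Return the document root for the requested host, or None if unknown."""
    return SITES.get(host_header(raw_request))


print(choose_site("GET / HTTP/1.1\r\nHost: blog.example.org:8080\r\n\r\n"))
```

An HTTP/1.0 request without a Host header yields `None` here, which is exactly why older clients could not reach name-based virtual hosts reliably.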

HTTP METHODS

When you make a request, you must specify which method will be used. HTTP methods, also known as verbs, identify what action should be performed on a given resource. HTTP/1.1 defines 8 methods, but only 5 are commonly used.

GET

Requests a representation of a given resource. It is defined as a safe method and should not be used to trigger an action (removing a user, for example).

POST

The information sent in the body of the request is used to create a new resource. POST is also used for processing that is not directly related to a resource.

DELETE

Removes a resource. A successful deletion commonly returns status 204 (No Content); if there is no resource at the specified URI, the usual response is 404.

PUT

Updates the resource specified in the URI. If the resource does not exist, PUT can create it. The main difference between POST and PUT is that POST can deal not only with resources but can also perform information processing.

HEAD

Returns information about a resource. In practice, it works like the GET method, but without returning the resource in the response body. It is also considered a safe method.

The other available methods are OPTIONS, TRACE and CONNECT. In principle, servers must implement the GET and HEAD methods and, whenever possible, the OPTIONS method.
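The method semantics above can be illustrated with a toy in-memory server. This is a simplified Python sketch, not a complete implementation of the specification: the `ResourceStore` class, the way POST names new resources, and the exact status code chosen for each case are assumptions made for illustration.

```python
class ResourceStore:
    """Toy server: URIs map to string representations kept in memory."""

    ALLOWED = ("GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS")

    def __init__(self):
        self.resources = {}  # URI -> representation
        self.next_id = 1     # used to name resources created by POST

    def handle(self, method, uri, body=""):
        """Return (status, headers, body) for one request."""
        allow = {"Allow": ", ".join(self.ALLOWED)}
        if method not in self.ALLOWED:
            return 405, allow, None          # 405 lists the accepted methods
        if method == "OPTIONS":
            return 200, allow, None
        if method in ("GET", "HEAD"):        # safe methods: no state change
            if uri not in self.resources:
                return 404, {}, None
            rep = self.resources[uri]
            headers = {"Content-Length": str(len(rep.encode("utf-8")))}
            # HEAD sends the same headers as GET but no body.
            return 200, headers, (rep if method == "GET" else None)
        if method == "POST":                 # the server chooses the new URI
            new_uri = f"{uri.rstrip('/')}/{self.next_id}"
            self.next_id += 1
            self.resources[new_uri] = body
            return 201, {"Location": new_uri}, None
        if method == "PUT":                  # the client names the URI
            created = uri not in self.resources
            self.resources[uri] = body
            return (201 if created else 204), {}, None
        # Remaining case: DELETE
        if uri not in self.resources:
            return 404, {}, None
        del self.resources[uri]
        return 204, {}, None


store = ResourceStore()
print(store.handle("POST", "/users", "alice"))  # creates /users/1
print(store.handle("HEAD", "/users/1"))         # headers only
```

GET and HEAD never modify `resources`, which is what makes them safe; PUT stores the body under the URI the client chose, while POST lets the server pick the new URI.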

STATUS

Every request receives a response code known as the status. The status tells you whether an operation was successful (200), whether the resource has been moved and now exists elsewhere (301), or whether it does not exist (404).

There are many status codes, divided into several categories. The specification describes each of them in detail. Below are some of the most frequent codes.

200 OK

The request was successful.

301 Moved Permanently

The resource has been moved permanently to another URI.

302 Found

The resource has been temporarily moved to another URI.

304 Not Modified

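The resource has not changed since the version the client already holds (typically in reply to a conditional request), so the cached copy can be used.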

401 Unauthorized

The specified URI requires client authentication. The client may repeat the request with suitable credentials.

403 Forbidden

The server understood the request, but is refusing to fulfill it. The client should not repeat the request.

404 Not Found

The server found nothing matching the requested URI.

405 Method Not Allowed

The method specified in the request is not allowed for the resource identified by the URI. The response must include an Allow header containing a list of accepted methods.

410 Gone

The requested resource is no longer available at the server and no forwarding address is known; the condition is expected to be permanent.

500 Internal Server Error

The server was not able to complete the request due to an unexpected error.

502 Bad Gateway

The server, while acting as a gateway or proxy, received an invalid response from the upstream server it contacted to fulfill the request.

503 Service Unavailable

The server cannot process the request because it is temporarily unavailable (for example, overloaded or down for maintenance).
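Status codes fall into five classes by their first digit: 1xx informational, 2xx success, 3xx redirection, 4xx client error and 5xx server error. Python's standard `http.HTTPStatus` enumeration already knows the reason phrases, so a small sketch that labels each code listed above could look like this; the `describe` helper and the class labels are our own.

```python
from http import HTTPStatus

# The first digit of a status code gives its class.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code):
    """Return e.g. '404 Not Found (Client Error)'."""
    phrase = HTTPStatus(code).phrase  # standard reason phrase
    return f"{code} {phrase} ({CLASSES[code // 100]})"


for code in (200, 301, 302, 304, 401, 403, 404, 405, 410, 500, 502, 503):
    print(describe(code))
```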

GOOGLE HACKING

Readers, let me introduce a very interesting technique called Google Hacking. It is a key investigation tool, whether we are doing a pentest or protecting our organization or ourselves as individuals.

Google Hacking is the practice of using the search engine's capabilities to attack a company's information or to protect it better. Information available on a company's web servers is likely to be in Google's databases.

In other words, a misconfigured server may expose a great deal of business information on Google. It is not difficult to reach websites' database files through Google.

One example is the Google "cache", where Google stores older versions of the pages its robots have indexed.

This feature gives access to pages that have already been taken offline, since they still exist in Google's database.

Imagine that at some point in the history of an organization's site, sensitive information was available. After a while, the webmaster was alerted and removed the information from the site. However, if the page had already been indexed by Google, it may still be accessible through the Google cache, even though it has been altered or removed.

Here is a simple example of what we can find on Google that may come back to haunt the person who put the information online: type cpf + curriculum in the search box. This will most likely return many results with links where we can find full names, addresses, phone numbers, CPF numbers (the Brazilian individual taxpayer number), identity card numbers and more from people who publish their data on the internet. Knowing how this data can be used maliciously, we can be more careful about publishing any information on the internet.

GOOGLE SEARCH COMMANDS

intitle, allintitle

Searches for content in the title (the title tag) of the page.

When using the intitle command, pay attention to the syntax of the search string, since only the word that immediately follows intitle is treated as the title search string. The allintitle command breaks this rule, telling Google that all the words that follow must be found in the title of the page, which makes it more restrictive.
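A toy model of the difference between intitle and allintitle, assuming simple whitespace-separated words and a made-up list of pages; Google's real matching is far more complex. The query `intitle:index of` requires only "index" in the title, while `allintitle:index of` requires both words there. The page data and function names are hypothetical.

```python
# Made-up pages; only the title and the body text are modelled.
PAGES = [
    {"title": "Index of /backup", "text": "Parent directory"},
    {"title": "Company index", "text": "A list of our products"},
]


def words(s):
    """Lowercase, whitespace-separated words (a deliberate simplification)."""
    return s.lower().split()


def intitle(first, rest, page):
    """'intitle:FIRST REST...': FIRST must be in the title, REST anywhere."""
    anywhere = words(page["title"]) + words(page["text"])
    return first.lower() in words(page["title"]) and all(
        w.lower() in anywhere for w in rest)


def allintitle(terms, page):
    """'allintitle:TERMS...': every term must be in the title."""
    title = words(page["title"])
    return all(w.lower() in title for w in terms)


# intitle:index of     -> both pages match
print([intitle("index", ["of"], p) for p in PAGES])
# allintitle:index of  -> only the first page matches
print([allintitle(["index", "of"], p) for p in PAGES])
```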

inurl, allinurl

Finds text in a URL.

As with the intitle operator, using the inurl operator may seem a relatively simple task that needs little attention. But bear in mind that a URL is more complicated than a simple title, so the behavior of the inurl operator can also be complex.

Just like intitle, the inurl operator has a companion, allinurl, which works the same way but more restrictively, showing results only when all the strings are found in the URL.
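One reason inurl is trickier than intitle is that a URL is not a list of words: it has a host, a path, a query string and punctuation everywhere. The sketch below assumes, as a simplification, that a URL is broken into words at every non-alphanumeric character; that tokenization rule and the helper names are hypothetical, not Google's documented behavior.

```python
import re
from urllib.parse import urlsplit


def url_tokens(url):
    """Split host, path and query into lowercase words at any punctuation."""
    parts = urlsplit(url)
    raw = " ".join([parts.netloc, parts.path, parts.query]).lower()
    return [t for t in re.split(r"[^a-z0-9]+", raw) if t]


def inurl(term, url):
    """'inurl:TERM' for a single term."""
    return term.lower() in url_tokens(url)


def allinurl(terms, url):
    """'allinurl:TERMS...': every term must appear in the URL."""
    tokens = url_tokens(url)
    return all(t.lower() in tokens for t in terms)


u = "https://www.example.com/admin/login.php?user=guest"
print(url_tokens(u))
```

Under this model "login.php" becomes two words, "login" and "php", so `inurl:php` matches even though "php" is only a file extension.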

Filetype

Searches for a particular type of file.

Google searches more than just web pages. You can search many different file types, including PDF (Adobe Portable Document Format) and Microsoft Office documents. The filetype operator helps you find specific types of files; more precisely, it searches for pages ending in a particular extension.
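Since filetype looks at the extension a page's address ends in, a rough local equivalent is to inspect the path component of a URL. This Python sketch (the `has_filetype` helper is hypothetical) ignores the query string and letter case, and it only looks at the URL, not at the file's actual content.

```python
from posixpath import splitext
from urllib.parse import urlsplit


def has_filetype(url, ext):
    """True if the URL path ends in .ext (query string and fragment ignored)."""
    path = urlsplit(url).path
    return splitext(path)[1].lower() == "." + ext.lower().lstrip(".")


print(has_filetype("https://example.com/reports/2010.PDF?dl=1", "pdf"))
```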

Allintext

Finds a string of text within a page.

The allintext operator is perhaps the simplest to use, as it performs the function most search engines are known for: locating the term in the page text.

Although this operator may seem too broad to be useful, it is handy when you know that the search string can only be found in the page text. Using the allintext operator can also serve as a shortcut for finding the string anywhere except in the title, URL and links.
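Following the description above, allintext can be modelled as "find every term in the page text, ignoring the title, the URL and link text". A minimal sketch using Python's standard `html.parser`, assuming words are separated by whitespace; the `BodyText` class and `allintext` function are hypothetical names.

```python
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Collect page text, skipping <title>, link text, scripts and styles."""

    SKIP = {"title", "a", "script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.chunks.append(data)


def allintext(terms, html):
    """True if every term occurs in the body text of the page."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).lower().split()
    return all(t.lower() in words for t in terms)


page = ("<html><head><title>Secret report</title></head><body>"
        "<p>Quarterly password policy</p>"
        "<a href='/x'>secret link</a></body></html>")
print(allintext(["password"], page), allintext(["secret"], page))
```

"secret" appears only in the title and in link text, so the second check fails.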

Site

Restricts the search to the content of a particular website.

Although technically part of the URL, the address (or domain name) of a server is better searched with the site operator. Site lets you search only the pages hosted on a particular server or domain.
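The site operator matches pages on a given host or anywhere inside a domain. A minimal Python sketch of that rule, assuming the match is made on the host name and its parent domains only (the `on_site` helper is hypothetical):

```python
from urllib.parse import urlsplit


def on_site(url, domain):
    """True if url is hosted on domain or on any of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    # Require a dot boundary so that 'notexample.com' is not 'example.com'.
    return host == domain or host.endswith("." + domain)


print(on_site("https://mail.example.com/inbox", "example.com"))
```

The dot check keeps hosts such as `notexample.com` or `example.com.evil.net` from counting as part of `example.com`, while `site:gov` style queries still match every `.gov` host.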

Link

Searches for pages that link to a given page.

Instead of a search term, the link operator takes a URL or server name as its argument.

Inanchor

This operator can be regarded as a companion to the link operator, since both search links. The inanchor operator, however, searches the text of a link, not the actual URL.

Inanchor accepts a word or expression as its argument, such as inanchor:click or inanchor:oys. This type of search is especially useful when we start looking for correlations between sites.
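Both link and inanchor look at the `<a>` elements of pages: link at the target URL (`href`), inanchor at the visible text of the link. A sketch with Python's `html.parser` that collects both and applies simplified versions of each check; the class and function names are hypothetical, and a real implementation would also resolve relative URLs before comparing them.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a href=...> elements."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def anchors(html):
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.anchors


def links_to(html, target):
    """Rough analogue of link:target for a single page."""
    return any(href == target for href, _ in anchors(html))


def inanchor(html, word):
    """Rough analogue of inanchor:word for a single page."""
    return any(word.lower() in text.lower().split() for _, text in anchors(html))


page = ('<p>See <a href="https://example.com/">click here</a> and '
        '<a href="/about">About us</a>.</p>')
print(anchors(page))
```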

Daterange

Searches for pages indexed within a "range" of dates.

You can use this operator to find pages indexed by Google in a given date range. Every time Google crawls a page, the date in its database changes. If Google finds some obscure website, it may index it only once and never return to it.

If you find that your searches are clogged with these obscure pages, you can remove them from your results (and get more up-to-date ones) through effective use of the daterange operator.
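Classic Google hacking references describe daterange as taking Julian day numbers rather than calendar dates, written as `daterange:START-END`. Python can convert a date to a Julian day number by adding a fixed offset to `date.toordinal()`. The helper names are hypothetical, and Google may no longer honor this operator, so treat the query it produces as a sketch.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (date(1, 1, 1) == 1)
# and the Julian Day Number of the same calendar day.
JDN_OFFSET = 1721425


def julian_day(d):
    """Julian Day Number of a calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange(start, end):
    """Build a daterange: query term covering start..end inclusive."""
    return f"daterange:{julian_day(start)}-{julian_day(end)}"


print(daterange(date(2000, 1, 1), date(2000, 12, 31)))
# prints daterange:2451545-2451910
```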

Cache

Shows the cached version of a given page.

As discussed, Google keeps "snapshots" of indexed pages, which we can access via the cached link on the search results page. If you want to go straight to the cached version of a page, without first running a query to get the cached link on the results page, you can simply use the cache operator in a query.

Info

Shows the summary information Google holds about a page.

The info operator shows the summary information of a site and provides links to other Google searches that may relate to this site. The parameter of this operator must be a valid URL.
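cache, info and link all take a URL or host name instead of free-text search terms. A small Python sketch that builds such queries with a basic sanity check on the argument and turns search terms into a Google results URL with `urllib.parse.urlencode`. The validation rule (a host name containing a dot) and the helper names are assumptions made for illustration.

```python
from urllib.parse import urlencode, urlsplit

# Operators that expect a URL or host name as their argument.
URL_OPERATORS = {"cache", "info", "link"}


def google_query(operator, argument):
    """Build 'operator:argument', checking URL-style arguments first."""
    if operator in URL_OPERATORS:
        # Prefix '//' so a bare host name is parsed as a network location.
        parts = urlsplit(argument if "//" in argument else "//" + argument)
        if not parts.hostname or "." not in parts.hostname:
            raise ValueError(f"{operator}: needs a URL or host name, got {argument!r}")
    return f"{operator}:{argument}"


def search_url(*terms):
    """Google results URL for the given query terms."""
    return "https://www.google.com/search?" + urlencode({"q": " ".join(terms)})


print(google_query("info", "www.example.com"))
print(search_url(google_query("site", "example.com"), "filetype:pdf"))
```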

CONCLUSION

Google has many features that can be used during a penetration test, and it is rightly considered one of the best tools for hackers, because it gives access to a vast amount of publicly indexed information.

Google is the main tool for collecting information about our target. It is the best public system for finding information on anything related to the target: sites, advertisements, partners, social networks, groups, etc.
