Input Validation

Is a user harmless or dangerous? This is one of the basic questions of web application security. Because virtually every programming and scripting language can execute system commands, the risk is very high when user input reaches such functions unchecked. The most effective countermeasure is thorough input validation, which prevents this from happening.

Potential danger

A large share of attacks on servers pass through the firewall and target the web applications offered on the Internet directly. Web applications are used as a gateway to the back-end systems running behind them, whether that is the shell, the file system, a database (SQL, LDAP, ...), a mail server or even an SAP system.

That this is not a theoretical concern, but a serious danger especially in commercial settings, has been shown by demonstrated attacks on critical infrastructure such as SAP systems. In the worst case, we are not talking about a few harmless hackers but about industrial espionage.

While classical hacker attacks target the network infrastructure or the server's operating system directly and are nowadays usually detected and blocked by firewalls, attackers increasingly turn to methods that get past the firewall and reach a company's server infrastructure through official channels. Today, these channels are email and web applications.

Special features of a web application

So what's the actual difference between a web application and a "classic application"?

In a classic application, the user interface (UI) is firmly linked to the application. The data that can flow from the UI to the application is determined by the developer of the UI. Web applications separate the UI from the actual application. No one can guarantee that the user, or an attacker, will send the web application only the data that the developer intended in the UI.

This is why all input that the web server receives from the browser must initially be considered a potential attack and checked meticulously. This applies to the parameters in the URL and to form input sent via POST requests, but also to HTTP headers such as the host name or the type of requesting browser.

The first category is direct attacks on the web server itself. On the one hand, the web server software may contain genuine programming errors, e.g. buffer overflows. In the past, for example in the handling of SSL certificates, such problems occurred repeatedly, and attackers could exploit them to bypass authentication or execute code on the system. On the other hand, many web servers are much more open in their default configuration than they should be.

Besides the three customary request methods GET, HEAD and POST, HTTP also provides methods such as CONNECT for proxy servers, TRACE for debugging, or PUT, COPY, MOVE, DELETE, LINK and UNLINK for web-based file systems. Every method that is implemented and enabled but not needed represents a potential security risk.
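As a sketch of restricting methods at the application layer, assuming a Python WSGI application, the hypothetical middleware below lets only GET, HEAD and POST through and answers every other method with 405 Method Not Allowed. It complements, rather than replaces, disabling unneeded methods in the web server configuration itself.

```python
# Hypothetical allowlist: the three customary methods named in the text.
# HTTP method names are case-sensitive, so no case normalization is done.
ALLOWED_METHODS = frozenset({"GET", "HEAD", "POST"})


def method_allowlist(app):
    """Wrap a WSGI app so that unlisted HTTP methods are rejected early."""

    def wrapped(environ, start_response):
        method = environ.get("REQUEST_METHOD", "")
        if method not in ALLOWED_METHODS:
            # A 405 response must carry an Allow header listing permitted methods
            start_response(
                "405 Method Not Allowed",
                [
                    ("Allow", ", ".join(sorted(ALLOWED_METHODS))),
                    ("Content-Type", "text/plain"),
                ],
            )
            return [b"Method not allowed\n"]
        return app(environ, start_response)

    return wrapped
```

Wrapping an existing application is then a single line, e.g. `application = method_allowlist(application)`.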

Countermeasures

What can you do about this? The web server should offer only the features that the web application actually requires. If the web server configuration doesn't allow this, or if it's difficult to monitor, vWAF can detect invalid requests and filter them out before they reach the web server.

A further problem was the example CGI programs that were shipped with web servers for some time. They were installed on the web server by default, and some of them had security flaws. Web server vendors have since recognized this, and such software is no longer installed by default.

In most cases, the default configuration of a web server isn't optimal. Features such as listing the contents of a directory when there's no index.html file are convenient during development but have no place on a production system.

Buffer overflow problem

The classic attack on network applications is the buffer overflow. The attacker simply sends more input data to the server than it expects, in the hope that the developer hasn't checked the length properly, so that other memory areas of the program are overwritten.

Today, this problem rarely occurs on the web, because modern languages (Java, Python, Ruby, Perl, PHP) manage memory automatically and are not prone to classic buffer overflows. Nevertheless, some web applications are still written in C or C++. These are generally susceptible to such attacks, as demonstrated, for example, in 2003 by the attacks on the SAP transaction server.

The main problem today is therefore no longer the classic buffer overflow, but the connection between the web application and back-end systems such as databases, file systems, email gateways and administration tools.

Forceful browsing

Forceful browsing is the attempt to reach parts of a website that are not meant to be accessible, i.e. are not linked, and thereby obtain information that the website operator does not want to publish, or does not want to publish yet.

In its simplest form, the attacker tries to guess the names of other files on the web server. Candidates include configuration files of web applications (which may contain database passwords in plain text) or older versions of programs, which may reveal the source code of the web application and thus important internal information, e.g. about the database. For example, many PHP-based systems offer a URL /login.php. When this page is requested, the web server recognizes from the .php extension that the script should be executed. If, when a new version is installed, a backup copy is kept under the name login.php.old, the web server no longer recognizes it as an executable script and delivers the file's content, i.e. the source code.

Only the files and scripts that really need to be accessible from outside should be stored on the website, i.e. in the part of the directory tree that the web server publishes directly. Utilities, configuration files and backups belong in a separate directory that the web server can't access directly.
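A small audit script can help enforce this rule. The sketch below, assuming the document root is a local directory, walks it and reports files whose names suggest backups or configuration files, such as the login.php.old from the example above. The suffix list is a hypothetical starting point, not a complete catalogue.

```python
from pathlib import Path

# Hypothetical suffixes that typically indicate backups or configuration files
SUSPICIOUS_SUFFIXES = (".old", ".bak", ".orig", ".swp", "~", ".ini", ".conf")


def find_suspicious_files(docroot: str) -> list[str]:
    """Return paths (relative to docroot) of files that should not be published."""
    hits = []
    for path in Path(docroot).rglob("*"):
        # str.endswith accepts a tuple and matches any of its suffixes
        if path.is_file() and path.name.endswith(SUSPICIOUS_SUFFIXES):
            hits.append(path.relative_to(docroot).as_posix())
    return sorted(hits)
```

Running it regularly, e.g. after each deployment, catches leftovers that a manual cleanup might miss.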

Open doors

A different, less technical example from practice: certain information, such as a company's annual financial report, is due to be published on the website at a specific time but is usually produced a few days earlier. Just because the document isn't yet linked from the company's website doesn't mean that it can't be accessed.

How can an attacker get hold of this information? For example, he can look at the URLs of the corresponding documents from previous years and try to guess this year's names. If the company uses a content management system, documents are usually addressed by IDs. If a current document can be reached at the URL http://my.company.com/document.php/id=31337, the attacker can easily try other values for id to access new documents.

There have been several legally relevant cases in this area in which attackers gained an information advantage on the stock exchange. What can you do about this? Ideally, the web server or the database should contain only documents that are meant to be published. Alternatively, the content management system must support a clear assignment of access rights and time-controlled release of documents.
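A sketch of time-controlled release, assuming a hypothetical in-memory document store keyed by the numeric id from the URL: the release time is checked on the server for every request, so trying other id values does not expose a document before its publication date.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Document:
    doc_id: int
    title: str
    publish_at: datetime  # earliest time (UTC) the document may be delivered


# Hypothetical in-memory store; a real CMS would read from its database.
DOCUMENTS = {
    31336: Document(31336, "Annual report, previous year",
                    datetime(2023, 3, 15, tzinfo=timezone.utc)),
    31337: Document(31337, "Annual report, current year",
                    datetime(2024, 3, 15, tzinfo=timezone.utc)),
}


def fetch_document(doc_id: int, now: Optional[datetime] = None) -> Optional[Document]:
    """Return the document only if it exists and its release time has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    doc = DOCUMENTS.get(doc_id)
    # Unknown and not-yet-released IDs get the same answer, so guessing
    # IDs reveals neither the content nor the existence of a document.
    if doc is None or now < doc.publish_at:
        return None
    return doc
```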

Java applets and AJAX

At the moment, web development is trending toward more interactivity in the browser, with the web server used merely as a "thin" database back end. As a result, application logic is moved from relatively well-protected web servers into the unprotected browser. With a simple 1:1 port, this poses a serious risk to security, because authorization decisions are suddenly made in the browser again and can therefore be controlled by the attacker.
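The safeguard is to keep authorization decisions on the server even when the UI logic moves to the browser. The sketch below assumes a hypothetical server-side session store and a hypothetical "delete user" action; the user's role comes only from the session, never from flags the browser sends.

```python
# Hypothetical server-side session store: session ID -> role.
# The role is set at login and never taken from the request.
SESSIONS = {"s-123": "user", "s-456": "admin"}


def handle_delete_user(session_id: str, request_body: dict) -> int:
    """Return an HTTP status code for a request to delete a user account."""
    role = SESSIONS.get(session_id)
    # Deliberately ignore fields like request_body.get("is_admin"):
    # anything in the request body is under the attacker's control.
    if role != "admin":
        return 403
    # ... perform the deletion here ...
    return 200
```

The browser may still hide the delete button from ordinary users for convenience, but that is a usability feature, not a security check.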

Shell command injection

Shell command injection can occur when user input is used as an argument in a call to the operating system, e.g. to read a file or to send an email. The problem arises because script languages such as Perl and PHP, as well as the Unix command shell, interpret certain character sequences specially. Web application developers aren't always aware of this.

For example, if the function open() is used in Perl to read a file, the argument may be either a file name or, under certain circumstances, a command.

open( ... , "file.txt") opens the file "file.txt", whereas open( ... , "uname -a|") executes the command "uname -a" and treats its output as the content of the "file". If the file name is taken from an argument that the web server receives from the browser, an attacker may be able to execute commands on the system.
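In Python, the analogous trap is building a command string from user input and running it with shell=True. The sketch below shows the safer pattern: the program and its arguments are passed as a list, so no shell ever parses the user's value. As a hypothetical stand-in for a real command, the child process is a Python one-liner that simply echoes its first argument back.

```python
import subprocess
import sys


def run_with_argument(user_value: str) -> str:
    """Run a child process with user_value as one literal argument."""
    # List form with shell=False (the default): characters such as |, ;
    # or $() in user_value reach the program verbatim and are not executed.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

This removes the shell's interpretation of special characters, but the called program may still give the argument its own meaning, so the plausibility check on the input remains necessary.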

The same applies to most other script languages and to the Unix shell. The basic rule of development is to check all input that comes from outside for plausibility before processing it. For file names, this means, for example, that they should consist only of letters, digits and a period. If any other character appears in the file name, this should generally be considered an attack and treated as such.
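A minimal allowlist check for this rule might look as follows. The exact pattern is an assumption: ASCII letters and digits, optionally followed by a single period and an extension. Everything else, including path separators, spaces, pipes and "..", is rejected.

```python
import re

# Assumed policy: a name part, optionally one period plus an extension part
SAFE_FILENAME = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)?")


def is_safe_filename(name: str) -> bool:
    """Return True only if the whole name matches the allowlist pattern."""
    # fullmatch (rather than match or search) ensures no trailing characters slip through
    return SAFE_FILENAME.fullmatch(name) is not None
```

Note that a name like login.php.old also fails this check, which in this context is a welcome side effect.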

On the one hand, the web application operator can validate user requests a second time on the web server, independently of the application developer; vWAF provides support here. In addition, the operator can and should try to limit the potential damage of a successful break-in, in particular by running the web server with minimal privileges on the system.

SQL injection

We have the same problem again here: parts of the user input are used to build a query to an external system, in this case an SQL database. A typical example from the PHP environment:

A URL contains a username whose data the web application should display:

URL: http://my.company.com/showUser?name=petermiller

The code that processes this could look as follows:

if (isset($_GET['name'])) {
    $name = $_GET['name'];
    // The raw URL parameter is copied straight into the SQL string
    $sql = "SELECT * FROM user_t WHERE name = '$name'";
    $res =& $db->query($sql);
    ...
}

So what can happen now? As long as the "name" field in the URL contains an ordinary username, everything works as the developer expects. The SQL query sent to the database has the form:

SELECT * FROM user_t WHERE name = 'petermiller'

However, if an attacker enters the following character sequence instead of the name "petermiller":

petermiller'; UPDATE user_t SET role = 'admin' WHERE name = 'petermiller

the query looks like this:

SELECT * FROM user_t WHERE name = 'petermiller'; UPDATE user_t SET role = 'admin' WHERE name = 'petermiller'

Suddenly a completely different SQL command is executed and, provided the application has the appropriate privileges, modifies the database.
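To make the mechanism concrete, the following Python sketch mimics the PHP string interpolation from the example (the function name is hypothetical). The attacker's input turns into exactly the combined statement shown above, because the quote character in the input ends the string literal early and everything after it is read as SQL.

```python
def build_query_unsafe(name: str) -> str:
    """VULNERABLE: mirrors the PHP line "... WHERE name = '$name'"."""
    return f"SELECT * FROM user_t WHERE name = '{name}'"


attack = "petermiller'; UPDATE user_t SET role = 'admin' WHERE name = 'petermiller"
print(build_query_unsafe("petermiller"))
print(build_query_unsafe(attack))
```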

What exactly is the problem? The user input changes the structure of the query, i.e. it is treated as code. This is precisely what must be prevented. There are essentially three approaches.

Possibility 1:

First, the input should be checked, just as with shell command injection. In our example, we could allow only usernames consisting of letters and digits. Unfortunately, this isn't possible for every database field. Irish names such as "O'Reilly", for example, contain an apostrophe. Password fields should also accept all characters. Checking the input can therefore solve only part of the problem.

Possibility 2:

Second, all data passed to a back-end system (here an SQL database) should be escaped according to that system's rules. This means that every special character the database interprets must be handled specially. Unfortunately, the set of characters that need special treatment differs from database to database, so it's best to leave this to the developers of the database or of the programming language. For a MySQL database with PHP, for example, there's the function mysql_real_escape_string (the old mysql extension has since been removed from PHP; mysqli_real_escape_string is its successor). A better SQL statement would then look like this:

$quoted_name = mysql_real_escape_string($name);
$sql = "SELECT * FROM user_t WHERE name = '$quoted_name'";

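However, this only works around the problem: the query is still assembled as a string, so its safety depends on the escape function being called correctly every single time.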

Possibility 3:

The correct solution is to leave the interpretation of the parameters and the escaping of special characters to the database, and to keep the SQL query strictly separate from its arguments.

$sql = "SELECT * FROM user_t WHERE name = ?";
$res =& $db->query($sql, array($name));

This prevents the attacker from changing the structure of the query.
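The difference can be demonstrated with Python's built-in sqlite3 module, which here stands in for the MySQL/PHP setup of the example. The sketch uses a different attack string (x' OR '1'='1) because sqlite3's execute() refuses to run more than one statement per call, so the stacked UPDATE from above would be rejected anyway. The table contents are made up for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_t (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO user_t VALUES (?, ?)",
        [("petermiller", "user"), ("O'Reilly", "user"), ("alice", "admin")],
    )
    return conn


def find_user_unsafe(conn, name):
    # VULNERABLE: name becomes part of the SQL text itself
    sql = "SELECT name, role FROM user_t WHERE name = '%s'" % name
    return conn.execute(sql).fetchall()


def find_user_safe(conn, name):
    # The ? placeholder keeps name out of the SQL text; it is bound as a value
    return conn.execute(
        "SELECT name, role FROM user_t WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
attack = "x' OR '1'='1"
print(find_user_unsafe(conn, attack))   # every row: the WHERE clause was rewritten
print(find_user_safe(conn, attack))     # no row is named "x' OR '1'='1"
print(find_user_safe(conn, "O'Reilly"))  # the apostrophe needs no special handling
```

With the unsafe version, "O'Reilly" would even produce an SQL syntax error, which illustrates why escaping or input checks alone are fragile.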

Second order attacks

A final interesting method of attack is the so-called "second-order attack". This type of attack doesn't cause immediate damage. The actual attack takes effect only when the data is evaluated at a later stage, e.g. when it is displayed on screen again or during a daily log file analysis. The vulnerability therefore lies not in the web application itself but in another application that processes the data later. The effects of such attacks are harder to assess, for example in security audits, because the entire system architecture, not just the web application in question, must be known. However, attacks at these levels are generally much more threatening to a company's infrastructure as a whole.
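The later consumer can defend itself too. Output encoding is a complementary technique to the input validation discussed here: the sketch below assumes a hypothetical HTML log report that displays stored User-Agent values. Escaping at output time ensures that a value planted weeks earlier is shown as text instead of being executed in the administrator's browser.

```python
import html


def render_log_row(user_agent: str) -> str:
    """Render one stored User-Agent value as an HTML table row."""
    # html.escape turns <, >, & and quotes into entities, so the
    # stored value is displayed as text and never parsed as markup
    return "<tr><td>" + html.escape(user_agent) + "</td></tr>"
```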

You can also protect against these attacks by validating the input parameters. Because even seemingly unimportant input such as the HTTP Referer or User-Agent header must be checked, this task is best carried out by vWAF.
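As an illustration of what such a check on header values might involve, the sketch below applies two simple plausibility rules to a Referer or User-Agent value. Both the length limit and the restriction to printable ASCII are assumptions chosen for the example, not values prescribed by HTTP or by vWAF; they catch control characters and oversized values but not, for instance, markup, which is why output encoding remains necessary.

```python
MAX_HEADER_LENGTH = 512  # hypothetical limit; tune to the application


def is_plausible_header(value: str, max_length: int = MAX_HEADER_LENGTH) -> bool:
    """Basic plausibility check for headers such as Referer or User-Agent."""
    if len(value) > max_length:
        return False
    # Accept printable ASCII only (space through tilde); rejects control
    # characters such as NUL, CR and LF that have no place in these headers
    return all(" " <= ch <= "~" for ch in value)
```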