Secure development with Drupal 7 - part 1

Drupal is one of the most secure CMSs because it gives developers API functions that help keep Drupal applications safe from SQL injection and cross-site scripting attacks. In this tutorial I will discuss the most important of these API functions, along with the good and bad development habits that affect your application's security.

Cross Site Scripting (XSS) Attacks

If you do not filter text before displaying it, a user can insert dangerous HTML into a page and inject client-side script into pages viewed by other users. Such a script runs in the context of your site, so it can bypass access controls such as the same-origin policy.
That is why we should filter any text before it is displayed to users. Drupal provides a set of API functions that help developers prevent XSS attacks at different levels of development. We can group the data that should be filtered into four categories:

URL
Plain Text
Rich Text
HTML CODES

URL

1- check_url($uri)

We can filter URLs with check_url($uri), which strips dangerous protocols (e.g. 'javascript:') from a URI and encodes it for output in an HTML attribute value.

E.g

 
 '#title' => t('Check the messages and <a href="!url">try again</a>.', array('!url' => check_url(drupal_requirements_url($severity)))),

$form_action = check_url(drupal_current_script_url(array('op' => 'selection', 'token' => $token)));
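
To make the two steps behind check_url() concrete (strip dangerous protocols, then encode for an attribute), here is a minimal standalone sketch. The function name sketch_check_url() and its short allow-list are assumptions made for illustration only. Drupal works from a longer, configurable list of protocols, so treat this as an approximation of the idea and not as Drupal's actual code.

```php
<?php
// Hypothetical stand-in for check_url(); runs without Drupal.
function sketch_check_url($uri) {
  // Assumed allow-list, used by this sketch only.
  $allowed = array('http', 'https', 'ftp', 'mailto');

  // Step 1: repeatedly strip any leading protocol that is not allowed.
  do {
    $before = $uri;
    $colon = strpos($uri, ':');
    if ($colon > 0) {
      $protocol = substr($uri, 0, $colon);
      // A slash, question mark or hash before the colon means this is a
      // relative URL and not a protocol, so leave it alone.
      if (preg_match('![/?#]!', $protocol)) {
        break;
      }
      if (!in_array(strtolower($protocol), $allowed, TRUE)) {
        $uri = substr($uri, $colon + 1);
      }
    }
  } while ($before !== $uri);

  // Step 2: encode the result for use inside an HTML attribute value.
  return htmlspecialchars($uri, ENT_QUOTES, 'UTF-8');
}

echo sketch_check_url('javascript:alert(1)'), "\n";          // alert(1)
echo sketch_check_url('http://example.com/?a=1&b=2'), "\n";  // http://example.com/?a=1&amp;b=2
```

In a real module you would simply call check_url() and let Drupal do both steps.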

2- valid_url($url)

valid_url($url, $absolute = FALSE) checks the syntax of a URL and returns TRUE if the format is correct. Pass TRUE as the second argument to require an absolute URL. This function should only be used on actual URLs; it should not be used for Drupal menu paths.

E.g

// Validate the URL, if one was entered.
  if (!empty($form_state['values']['remote']) && !valid_url($form_state['values']['remote'], TRUE)) {
    form_set_error('remote', t('This URL is not valid.'));
  }

Plain Text

Plain text is simple text without any markup. What the user entered is displayed on screen exactly as is and is not interpreted in any way. This is generally the format used for single-line text fields.
The following functions filter plain text so that it is displayed as is:

1- check_plain($text)

check_plain() is the most important Drupal function for filtering plain text. When outputting plain text, you need to pass it through check_plain() before it can be put inside HTML. This converts quotes, ampersands and angle brackets into entities, causing the string to be shown literally on screen in the browser.
Secure Way:

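/**
 * $text: plain text entered by a user.
 */
echo check_plain($text); // Safe code ;)

// In Drupal 7, drupal_set_title() escapes with check_plain() by default
// (second argument CHECK_PLAIN), so do not escape the title a second time.
drupal_set_title($node->title); // Correct

Insecure Way:

/**
 * $text: plain text entered by a user.
 */
echo $text; // XSS !!!

// PASS_THROUGH disables the default escaping.
drupal_set_title($node->title, PASS_THROUGH); // XSS vulnerability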
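
To see what this escaping looks like in practice, here is a small standalone sketch. The name sketch_check_plain() is hypothetical; it assumes, as in Drupal 7 core, that check_plain() boils down to htmlspecialchars() with ENT_QUOTES and UTF-8, so the snippet can run without Drupal.

```php
<?php
// Hypothetical stand-in for check_plain(); runs without Drupal.
function sketch_check_plain($text) {
  return htmlspecialchars((string) $text, ENT_QUOTES, 'UTF-8');
}

$text = '<script>alert("xss")</script>';

// The browser now shows the markup literally and does not run it.
echo sketch_check_plain($text), "\n";
// Prints: &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```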

2- t()

The t() function serves two purposes. First, at run time it translates user-visible text into the appropriate language. Second, the various mechanisms that figure out what text needs to be translated work off t(). Although t() was not created for security purposes, it supports two types of placeholders ('@name' and '%name') whose values are treated as plain text and escaped when they are inserted into the translatable string; '%name' values are additionally wrapped in <em> tags for emphasis. You can disable this escaping with placeholders of the form '!name', so use those only for values that are already safe.

E.g

Secure Way:

$text = t("@name's blog", array('@name' => format_username($account))); // "@" placeholder is secure

$text = t("%name's blog", array('%name' => format_username($account))); // "%" placeholder is secure

Insecure Way:

$text = t("!name's blog", array('!name' => format_username($account))); // "!" placeholder is not secure

3- l($text, $path)

l() creates a link to an internal path or external URL as an HTML anchor tag. The resulting URL is passed through check_plain() before it is inserted into the anchor tag, to ensure well-formed HTML, and the link text is escaped as well unless you set the 'html' option. See url() for more information and notes.
That is why we should always use l() instead of building the <a> tag by hand.

E.g

Secure Way:

/**
* $url : link to another page, supplied dynamically by a user
*/
$link = l('more', $url) ; // secure because $url is passed to check_plain()

Insecure Way:

/**
* $url : link to another page, supplied dynamically by a user
*/
$link = "<a href='$url'>more</a>"; // not secure

4- format_string($string, array $args = array())

Use format_string() to apply the same placeholders as t(), with the same security features but without the translation, if your use case needs that functionality.

$not_translated = format_string('Email %user', array('%user' => $user->name));
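
The placeholder rules shared by t() and format_string() can be illustrated with a short standalone sketch. The function sketch_format_string() is a hypothetical stand-in written for this tutorial; it only mimics the '@', '%' and '!' handling and does no translation.

```php
<?php
// Hypothetical stand-in for the placeholder handling; runs without Drupal.
function sketch_format_string($string, array $args = array()) {
  foreach ($args as $key => $value) {
    switch (substr((string) $key, 0, 1)) {
      case '@':
        // Escaped as plain text.
        $args[$key] = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        break;

      case '!':
        // Inserted as is: no escaping at all.
        break;

      case '%':
      default:
        // Escaped, then wrapped in <em> for emphasis.
        $args[$key] = '<em class="placeholder">' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '</em>';
    }
  }
  return strtr($string, $args);
}

$name = '<script>alert(1)</script>';
echo sketch_format_string("@name's blog", array('@name' => $name)), "\n";
// Prints: &lt;script&gt;alert(1)&lt;/script&gt;'s blog
echo sketch_format_string("!name's blog", array('!name' => $name)), "\n";
// Prints: <script>alert(1)</script>'s blog
```

The second call shows why '!' placeholders should only ever receive values that were sanitized earlier.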

Rich Text

Rich text is text that is marked up in some language (HTML, Textile, and so on) to carry formatting such as bold characters, links and lists. It is not the same thing as the RTF file format. Rich text is stored in its markup-specific form and should be converted to HTML on output using the various filters that are enabled. This is generally the format used for multi-line text fields.

`check_markup($text, $format_id)`

All you need to do is pass the rich text through check_markup() and you get back HTML that is safe to output. Because filters can inject JavaScript or execute PHP code, security is vital here. When a user supplies a text format, you should validate it with filter_access() before accepting or using it. This is normally done in the validation stage of the Form API. For example, you should never render a preview of content in a format the user is not allowed to use.
The $format_id argument is the machine name of a text format defined at admin/config/content/formats. A standard Drupal 7 installation defines three formats:

plain_text
filtered_html
full_html

E.g

if (isset($comment_body['format'])) {
   $comment_text = check_markup($comment_body['value'], $comment_body['format']);
 }

$comment->signature = check_markup($comment->signature, $comment->signature_format, '', TRUE);

//check_markup for textareas where you want to allow users some html options
$output .= '<div class="organization_description">'. check_markup($node->organization_desc, $node->filter) .'</div>'."\n";

HTML CODES

Another way to filter HTML code and prevent XSS attacks is filter_xss($string, $allowed_tags), which does four things:

Removes characters and constructs that can trick browsers.
Makes sure all HTML entities are well-formed.
Makes sure all HTML tags and attributes are well-formed.
Makes sure no HTML tags contain URLs with a disallowed protocol (e.g. javascript:).

E.g

$name = ($node->uid == 0) ? variable_get('anonymous', t('Anonymous')) : $node->name;
$replacements[$original] = $sanitize ? filter_xss($name) : $name;

function aggregator_filter_xss($value) {
  return filter_xss($value, preg_split('/\s+|<|>/', variable_get("aggregator_allowed_html_tags", '<a> <b> <br> <dd> <dl> <dt> <em> <i> <li> <ol> <p> <strong> <u> <ul>'), -1, PREG_SPLIT_NO_EMPTY));
}

see u @ part 2

Maged Eladawy
