422.1 Input validation and sanitisation

Protect software from corrupted data and malicious attacks by checking that all user input is correct, safe, and expected.

Overview

Input validation is one of the most essential strategies in secure coding. It ensures that all data received by a program is safe to use, whether it comes from a user form, a URL, an uploaded file, or another system. Without proper validation, even simple inputs can be used to exploit vulnerabilities and compromise an application.

Sanitisation is a related technique used to clean up unsafe input when it cannot be rejected entirely. Together, validation and sanitisation help prevent common threats such as code injection, cross-site scripting (XSS), and data corruption.

Targets

In this topic, students learn to:

  • Explain the importance of input validation and sanitisation in secure coding

  • Identify techniques that prevent unauthorised or dangerous input

  • Apply secure coding patterns to reduce the risk of injection and other common threats

Syllabus references

Secure software architecture

Impact of safe and secure software development

  • Apply and describe the benefits of collaboration to develop safe and secure software, including:

      – considering various points of view

      – delegating tasks based on expertise

      – quality of the solution

What is input validation?

Input validation is the process of checking that all data received by the program:

  • Is of the correct type (e.g. number, string, date)

  • Is in the correct format (e.g. email, URL, phone number)

  • Falls within an acceptable range (e.g. age between 13–120)

  • Meets expectations defined in the specification
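The type, format, and range checks above can be sketched in Python. This is a minimal illustration, not a complete validator: the function name `validate_registration`, the form keys `age` and `email`, and the deliberately simple email pattern are all assumptions made for the example.

```python
import re

# Assumption: a deliberately simple pattern for illustration only.
# It checks for "something@something.something" with no spaces.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_registration(form: dict) -> list[str]:
    """Return a list of problems; an empty list means the input passed."""
    errors = []

    # Type: form data arrives as text, so age must convert cleanly to an integer.
    try:
        age = int(form.get("age", ""))
    except (TypeError, ValueError):
        errors.append("age must be a whole number")
    else:
        # Range: uses the 13 to 120 example from the list above.
        if not 13 <= age <= 120:
            errors.append("age must be between 13 and 120")

    # Format: the whole string must match the pattern, not just part of it.
    if not EMAIL_PATTERN.fullmatch(str(form.get("email", ""))):
        errors.append("email is not in a valid format")

    return errors
```

Returning a list of problems, rather than stopping at the first one, lets the application report every failed check to the user at once.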

Sources of input include:

  • User forms and UI fields

  • URL query strings and parameters

  • Cookies and session variables

  • API requests and uploaded files

  • Data retrieved from other systems or databases

Key principle: Never trust user input. Even if the interface looks safe (e.g. dropdown menus or hidden fields), attackers can tamper with it. Assume all input might be:

  • Incorrect (e.g. a letter instead of a number)

  • Malicious (e.g. SQL, JavaScript, or shell code)

  • Unexpected (e.g. blank, too long, or invalid format)

Validation vs sanitisation

  • Validation checks if the input is acceptable. If it fails, the input should be rejected.

  • Sanitisation modifies the input to remove or escape dangerous content. This is used when rejection isn’t practical.

Validate first. Only sanitise if you must accept the input.
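One way to picture the order of operations is a free-text profile field that must be accepted but may contain stray control characters. The sketch below rejects input that fails validation, then sanitises what remains by removing content. The function name `clean_bio` and the 200-character limit are hypothetical.

```python
import unicodedata

MAX_BIO_LENGTH = 200  # hypothetical limit chosen for this example


def clean_bio(raw: str) -> str:
    # Validate first: input of the wrong type or length is rejected outright.
    if not isinstance(raw, str) or len(raw) > MAX_BIO_LENGTH:
        raise ValueError("bio must be text of at most 200 characters")

    # Sanitise second: drop control characters (Unicode category "Cc"),
    # keeping newlines, because the field itself must still be accepted.
    cleaned = "".join(
        ch for ch in raw if ch == "\n" or unicodedata.category(ch) != "Cc"
    )
    return cleaned.strip()
```

Removing characters is only one form of sanitisation. Escaping, where dangerous characters are kept but made harmless, is the usual choice when the text will later be shown in a web page.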

Safe input validation strategies

1. Whitelist allowed values

Accept only input that matches an explicit, known-good set of rules.

if age in range(13, 121):  # whole numbers from 13 to 120 inclusive
    register_user()
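The same approach works for values that should come from a fixed set, such as a dropdown menu or a sort option. The sketch below is an illustration only: `ALLOWED_SORT_FIELDS` and `choose_sort_field` are hypothetical names.

```python
# Assumption: the application supports exactly these three sort options.
ALLOWED_SORT_FIELDS = frozenset({"name", "date", "price"})


def choose_sort_field(requested: str) -> str:
    # Anything outside the known-good set is rejected, even if it looks harmless.
    if requested not in ALLOWED_SORT_FIELDS:
        raise ValueError("unsupported sort field")
    return requested
```

Because the check lists what is allowed rather than what is forbidden, a new kind of malicious value is rejected without the code needing to know about it.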

Questions

  1. Define input validation in the context of registering as a new user on a web app and creating a username and password.

Answer

Input validation is the process of checking that user input meets expected criteria before being processed or stored. When registering a new user, the web app should validate the username to ensure it contains only allowed characters (e.g. letters, numbers, underscores) and is the correct length. The password should also be validated to ensure it meets minimum complexity requirements, such as a minimum length and a mix of character types. This helps prevent malformed or malicious input from being processed by the application or stored in the database.
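A minimal sketch of these signup checks might look like the following. The specific rules (3 to 20 characters for the username, a 12-character minimum for the password) and the function names are assumptions for illustration; a real site would take them from its own specification.

```python
import re

# Assumption: usernames are 3 to 20 letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")
MIN_PASSWORD_LENGTH = 12  # hypothetical minimum


def is_valid_username(username: str) -> bool:
    # fullmatch requires the entire string to fit the pattern.
    return USERNAME_PATTERN.fullmatch(username) is not None


def password_problems(password: str) -> list[str]:
    """Return the complexity rules the password fails (empty if none)."""
    problems = []
    if len(password) < MIN_PASSWORD_LENGTH:
        problems.append("too short")
    if not any(c.islower() for c in password):
        problems.append("needs a lowercase letter")
    if not any(c.isupper() for c in password):
        problems.append("needs an uppercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("needs a digit")
    return problems
```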

  2. Explain how input validation can make HTML form tags more secure.

Answer

HTML form tags become more secure when input validation is applied because it prevents users from entering data that could exploit vulnerabilities in the system. For example, a <textarea> used for a comment field should reject scripts or unusual characters that might be used in a cross-site scripting (XSS) attack. Validation on both the client side (e.g. with HTML5 attributes like type="email") and the server side ensures that only expected data types are accepted, reducing the risk of injection, corruption or misuse.

  3. Outline THREE vulnerabilities that can arise from poor input validation.

Answer

Code injection – Malicious input can be used to inject code into a system, such as SQL injection or cross-site scripting (XSS), if not properly checked.

Data corruption – Invalid or unexpected input can break the application logic or corrupt stored data, especially in structured formats like databases or files.

Security bypass – Attackers might manipulate form inputs (e.g. changing hidden values) to gain access to restricted features or escalate privileges if validation is weak or missing.
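For SQL injection in particular, a common defence alongside validation is to pass input to the database as a parameter instead of building the query from strings. The sketch below uses Python's built-in `sqlite3` module; the `users` table and the `find_user` function are assumptions for the example.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    # The ? placeholder keeps the value separate from the SQL text,
    # so the input is treated as data and never as part of the query.
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchone()


# Hypothetical in-memory database used only to demonstrate the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))
```

With this approach, a classic injection string such as `' OR '1'='1` is simply searched for as a literal username and matches nothing.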

  4. Give THREE reasons why user input should never be trusted.

Answer

Users can make mistakes – Input might be incomplete, wrongly formatted, or outside the expected range, even when not malicious.

Users can manipulate data – Attackers can bypass client-side controls using browser developer tools, interceptors, or custom scripts.

Attack vectors are often disguised as input – Hackers may use input fields to embed scripts, malformed data, or hidden payloads intended to exploit vulnerabilities.
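Because client-side controls can be bypassed, the server should recheck every field and ignore any value it can work out for itself. In the sketch below, the form may include a hidden `price` field, but the server never reads it. The catalogue `PRICES_CENTS`, the field names, and the quantity limit are all hypothetical.

```python
# Assumption: a small server-side price list, in cents.
PRICES_CENTS = {"basic": 500, "premium": 1500}


def order_total_cents(form: dict) -> int:
    # The plan must be one the server already knows about.
    plan = str(form.get("plan", ""))
    if plan not in PRICES_CENTS:
        raise ValueError("unknown plan")

    # The quantity is rechecked even if the browser already limited it.
    try:
        quantity = int(form.get("quantity", ""))
    except (TypeError, ValueError):
        raise ValueError("quantity must be a whole number") from None
    if not 1 <= quantity <= 10:
        raise ValueError("quantity must be between 1 and 10")

    # Any "price" field sent by the client is deliberately ignored:
    # the total is calculated from the server's own data.
    return PRICES_CENTS[plan] * quantity
```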

  5. After a user securely signs into a website, they post comments on an article written by a blogger. Explain how sanitisation is used in this context to reduce vulnerabilities and improve security.

Answer

Sanitisation is the process of cleaning input data to remove or neutralise harmful content. In the context of blog comments, sanitisation strips or escapes any embedded scripts, HTML tags, or malicious code that a user might attempt to include in their comment. For example, the <script> tag could be removed or encoded to prevent the browser from executing it. This helps prevent cross-site scripting (XSS) attacks and ensures that only safe, displayable content is stored and shown to other users.
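Encoding a comment before it is displayed can be sketched with `html.escape` from Python's standard library. The function name `render_comment` and the surrounding `<p>` tag are assumptions for the example; many web frameworks apply this kind of escaping automatically in their templates.

```python
import html


def render_comment(comment: str) -> str:
    # html.escape replaces <, >, & and quote characters with HTML entities,
    # so the browser displays the text instead of interpreting it as markup.
    return "<p>" + html.escape(comment) + "</p>"
```

For example, `render_comment('<script>alert(1)</script>')` returns `<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>`, which the browser shows as plain text.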

  6. Outline some security-by-design principles for validating a username and password during the signup process for a website.

Answer

Security-by-design principles for validating usernames and passwords include:

  • Whitelist validation – Accept only known safe patterns for usernames (e.g. alphanumeric and underscore, 3–20 characters).

  • Password strength rules – Require minimum length, and a mix of upper/lowercase letters, numbers and symbols.

  • Avoid feedback that reveals security logic – Use generic error messages like “Invalid username or password” rather than specifying which field failed.

  • Rate limiting and CAPTCHA – Protect the signup form from automated brute-force attacks.

  • Hashing passwords – Never store passwords in plain text; use secure hashing algorithms like bcrypt or Argon2.

  • Server-side validation – Always recheck input on the server, even if it's been validated in the browser.
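The password hashing principle can be sketched as follows. bcrypt and Argon2 need third-party packages, so this example swaps in PBKDF2 from Python's standard library purely so that it runs without extra installs. The function names and the iteration count are assumptions, not recommended settings.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative value only; follow current guidance in practice


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); both are stored instead of the password."""
    salt = secrets.token_bytes(16)  # a fresh random salt for each user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # compare_digest takes the same time whether or not the values match.
    return hmac.compare_digest(digest, expected)
```

The password is validated before hashing and is never stored. At login, the stored salt is used to recompute the digest, and the two digests are compared.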
