How to Validate a URL Using Regular Expressions

Programs routinely have to process structured and unstructured text. Tools like regular expressions and external libraries make tasks such as validation a lot easier.


You can validate URLs with a regular expression in most languages, including Python and JavaScript. The regex in this article isn’t perfect, but you can use it to check URLs for simple use cases.


A Regular Expression to Validate a URL

The regex presented in this article is not perfect. Some valid URLs fail it, including URLs that use IP addresses, non-ASCII characters, or protocols like FTP. It only validates the most common URL forms.

The regex will consider a URL valid if it satisfies the following conditions:

  1. The string should start with either http or https followed by ://.
  2. The combined length of the subdomain and domain must be between 2 and 256 characters. It should only contain alphanumeric characters and/or the special characters that the pattern allows.
  3. The TLD (Top-Level Domain) should only contain lowercase letters and should be between two and six characters long.
  4. The rest of the URL string, such as a path or query string, can contain zero or more alphanumeric and/or special characters.

You can validate a URL in JavaScript using the following regular expression:

^(https?:\/\/)[-a-zA-Z0-9@:%._\+~#?&\/=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&\/=]*)$

Similarly, you can use the following regex to validate a URL in Python:

^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&/=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&/=]*)$

Where:

  • ((http|https)://) makes sure the string starts with either http or https followed by ://.
  • [-a-zA-Z0-9@:%._\+~#?&/=] is the set of alphanumeric and special characters that the URL may contain. The first instance of this set covers the subdomain and domain part, and the second instance covers the path and query string part.
  • {2,256} is a repetition count of 2 to 256 (both inclusive). It means the combined length of the subdomain and domain must be between two and 256 characters.
  • \. represents a literal dot character.
  • [a-z]{2,6} means two to six lowercase letters from a to z. This is the set of characters allowed in the top-level domain part.
  • \b represents a word boundary, i.e. the start of a word or the end of one.
  • * is a repetition operator that allows zero or more characters from the second set, so the path, query string, or subdirectories are optional.
  • ^ and $ indicate the start and end of the string respectively.
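The components listed above can be assembled into a single pattern. The sketch below does this in Python with the re.VERBOSE flag, so each piece sits next to a comment describing it. The name URL_PATTERN is hypothetical, and the assembled pattern is a reconstruction from the list above rather than a copy of the project's code.

```python
import re

# Hypothetical name; the pattern is reconstructed from the components listed above.
URL_PATTERN = re.compile(
    r"""
    ^                                  # start of the string
    ((http|https)://)                  # scheme: http or https, then ://
    [-a-zA-Z0-9@:%._\+~\#?&/=]{2,256}  # subdomain and domain, 2 to 256 characters
    \.                                 # literal dot before the TLD
    [a-z]{2,6}                         # TLD: two to six lowercase letters
    \b                                 # word boundary after the TLD
    ([-a-zA-Z0-9@:%._\+~\#?&/=]*)      # optional path or query string
    $                                  # end of the string
    """,
    re.VERBOSE,
)

print(bool(URL_PATTERN.search("https://www.linkedin.com/")))  # True
print(bool(URL_PATTERN.search("http://apple")))               # False
```

In verbose mode, whitespace and comments outside character classes are ignored, so this should behave the same as the one-line form. The # inside each character class is escaped here only to make it obvious that it is not the start of a comment.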

If you are uncomfortable with the above expression, check out a beginner’s guide to regular expressions first. Regular expressions take some time to get used to. Exploring some examples like validating user account details using regular expressions should help.

The above regex matches the following types of URLs:

  • https://www.something.com/
  • http://www.something.com/
  • https://www.something.edu.co.in
  • http://www.url-with-path.com/path
  • https://www.url-with-querystring.com/?url=has-querystring
  • http://url-without-www-subdomain.com/
  • https://mail.google.com
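You can check this list yourself with a short script. The sketch below runs the URLs above through a pattern assembled from the components described earlier, then tries three valid URLs of the kinds this article says the pattern does not cover. PATTERN, check, and the three rejected URLs are illustrative assumptions, not part of the original project.

```python
import re

# Assumed pattern, assembled from the components described in this article.
PATTERN = re.compile(
    r"^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&/=]{2,256}"
    r"\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&/=]*)$"
)

# URLs from the list above; each one is expected to match.
accepted = [
    "https://www.something.com/",
    "http://www.something.com/",
    "https://www.something.edu.co.in",
    "http://www.url-with-path.com/path",
    "https://www.url-with-querystring.com/?url=has-querystring",
    "http://url-without-www-subdomain.com/",
    "https://mail.google.com",
]

# Valid URLs that the pattern is expected to reject (illustrative examples).
rejected = [
    "http://192.168.0.1/",     # IP address instead of a domain name
    "https://münchen.de/",     # non-ASCII character in the host
    "ftp://ftp.example.com/",  # protocol other than http or https
]

def check(url):
    """Return 'Valid' if the URL matches PATTERN, else 'Not Valid'."""
    return "Valid" if PATTERN.search(url) else "Not Valid"

for url in accepted + rejected:
    print(f"{url}: {check(url)}")
```

The last three lines of output should read Not Valid even though those URLs are legitimate, which is why this pattern only suits simple use cases.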

Using the Regular Expression in a Program

The code used in this project is available in a GitHub repository and is free for you to use under the MIT license.

This is a Python approach to validating a URL:

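import re

def validateURL(url):
    # Raw string, so the backslashes reach the regex engine unchanged
    regex = r"^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&/=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&/=]*)$"
    r = re.compile(regex)

    # re.search() returns a match object on success and None otherwise
    if re.search(r, url):
        print("Valid")
    else:
        print("Not Valid")

url1 = "https://www.linkedin.com/"
validateURL(url1)
url2 = "http://apple"
validateURL(url2)
url3 = "iywegfuykegf"
validateURL(url3)
url4 = "https://w"
validateURL(url4)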

This code uses Python’s re.compile() function to compile the regular expression pattern. It accepts the regex pattern as a string and returns a regex pattern object. The code then passes this pattern object, along with the target string, to the re.search() function to look for a match.

If it finds at least one match, re.search() returns a match object for the first one; otherwise it returns None. Note that if you want to find all the matches of a pattern in the target string, you need to use re.findall() instead.

Running the above code will confirm that the first URL is valid but the rest of them are not.
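Because re.search() returns either a match object or None, the same check can return a boolean instead of printing, which is easier to reuse in other code. The sketch below assumes the pattern assembled from the components described earlier. The names is_valid_url, URL_REGEX, and LINK_REGEX are hypothetical, and LINK_REGEX is a deliberately loose, unanchored pattern included only to show how re.findall() differs from re.search().

```python
import re

# Assumed pattern, assembled from the components described in this article.
# It is compiled once at import time and reused on every call.
URL_REGEX = re.compile(
    r"^((http|https)://)[-a-zA-Z0-9@:%._\+~#?&/=]{2,256}"
    r"\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&/=]*)$"
)

def is_valid_url(url):
    """Return True if the string matches URL_REGEX, else False."""
    # search() gives a match object or None; the ^ and $ anchors tie the
    # match to the start and end of the string.
    return URL_REGEX.search(url) is not None

# Hypothetical loose pattern for pulling URL-like substrings out of text.
LINK_REGEX = re.compile(r"https?://\S+")
text = "Mail is at https://mail.google.com and news at http://www.something.com/ today."

first = LINK_REGEX.search(text)       # match object for the first match only
all_links = LINK_REGEX.findall(text)  # list of every non-overlapping match

print(is_valid_url("https://www.linkedin.com/"))  # True
print(is_valid_url("https://w"))                  # False
print(first.group())                              # https://mail.google.com
print(all_links)
```

Note that the anchored validation pattern can match a given string at most once, so re.findall() is mainly useful with unanchored patterns like the loose one here.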

Similarly, you can validate a URL in JavaScript using the following code:

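function validateURL(url) {
  // The URL pattern written as a JavaScript regex literal
  const regex = /^(https?:\/\/)[-a-zA-Z0-9@:%._\+~#?&\/=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%._\+~#?&\/=]*)$/;

  // match() returns null when the string does not match the pattern
  if (url.match(regex)) {
    console.log('Valid');
  } else {
    console.log('Not Valid');
  }
}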

validateURL("https://www.linkedin.com/");
validateURL("http://apple");
validateURL("iywegfuykegf");
validateURL("https://w");

Again, running this code will confirm that the first URL is valid and the rest of them are invalid. It uses JavaScript’s match() method to match the target string against a regular expression pattern.

Validate Important Data Using Regular Expressions

You can use regular expressions to search, match, or parse text. They are also used for natural language processing, pattern matching, and lexical analysis.

You can use this powerful tool to validate important types of data like credit card numbers, user account details, IP addresses, and more.
