Searching for UUIDs in text with regular expressions

UUIDs (Universally Unique Identifiers) are 128-bit identifiers used to uniquely identify objects in computer systems. They are commonly used in software development, database management, and other data-related tasks. As the name suggests, UUIDs are designed to be unique in practice: the probability that two independently generated UUIDs collide is negligible for most purposes. This makes them ideal for tasks such as data tracking, where it is important to have a unique identifier for each piece of information.

One common challenge when working with UUIDs is finding them within a larger body of text. This can be especially difficult if the text is unstructured, such as user input or a free-form document. Fortunately, regular expressions offer a powerful way to find and extract UUIDs from text.

A regular expression, or regex for short, is a sequence of characters that defines a search pattern. Regular expressions are commonly used in programming and text processing to perform advanced search and replace operations. The syntax can seem intimidating at first, but with a little practice it becomes a powerful tool for manipulating text.

To search for UUIDs in text using regular expressions, we first need to understand the structure of a UUID. A UUID is typically written as 32 hexadecimal digits, divided into five groups of 8, 4, 4, 4, and 12 digits separated by hyphens. For example, a typical UUID may look like this: 2bbfb5ad-a6c6-4d33-9f5d-1d4c5b03dcaf. The letters A-F in hexadecimal represent the values 10-15, so a UUID can contain any combination of the digits 0-9 and the letters A-F.
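As a quick check of that structure, the following sketch splits the example UUID on its hyphens and inspects the pieces. The variable names are illustrative only and are not part of any library.

```javascript
// Split the example UUID into its hyphen-separated groups.
const exampleUuid = "2bbfb5ad-a6c6-4d33-9f5d-1d4c5b03dcaf";
const groups = exampleUuid.split("-");

// Length of each group: five groups of 8, 4, 4, 4, and 12 characters.
const groupLengths = groups.map((group) => group.length);
console.log(groupLengths); // [ 8, 4, 4, 4, 12 ]

// Total number of hexadecimal digits once the hyphens are removed.
const hexDigitCount = groups.join("").length;
console.log(hexDigitCount); // 32

// Hex letters map to the values 10-15.
console.log(parseInt("a", 16), parseInt("f", 16)); // 10 15
```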

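Now that we understand the structure of a UUID, we can construct a regular expression that matches it. The following pattern matches any UUID written in the hyphenated 8-4-4-4-12 form. It checks only the format, so any 32 hexadecimal digits grouped this way will match: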

```regex
[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}
```

Let's break down this pattern to understand how it works:

- `[a-fA-F0-9]` is a character class that matches a single character: any letter from A to F, in lowercase or uppercase, or any digit from 0 to 9. This ensures that our regex matches both lowercase and uppercase UUIDs.

- `{8}` specifies that the preceding expression should be repeated exactly 8 times. This corresponds to the first group of 8 characters in a UUID.

- `-` is a literal hyphen, which we use to match the hyphens between each group of characters in a UUID.

- `{4}` specifies that the preceding expression should be repeated exactly 4 times. This corresponds to the next three groups of 4 characters in a UUID.

- `{12}` specifies that the preceding expression should be repeated exactly 12 times. This corresponds to the last group of 12 characters in a UUID.
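The breakdown above can also be expressed in code. The sketch below assembles the same pattern from its parts and wraps it in `^` and `$` so that a whole string can be checked. `HEX`, `GROUP_LENGTHS`, `uuidSource`, and `isUuid` are hypothetical names chosen for this example; only `RegExp` is built in.

```javascript
// One hexadecimal character, lowercase or uppercase.
const HEX = "[a-fA-F0-9]";

// Number of hex characters in each of the five groups.
const GROUP_LENGTHS = [8, 4, 4, 4, 12];

// Build "[a-fA-F0-9]{8}", "[a-fA-F0-9]{4}", ... and join them with literal hyphens.
const uuidSource = GROUP_LENGTHS.map((n) => `${HEX}{${n}}`).join("-");
console.log(uuidSource);

// Anchored variant: the entire string must be a UUID and nothing else.
const isUuid = (s) => new RegExp(`^${uuidSource}$`).test(s);

console.log(isUuid("2bbfb5ad-a6c6-4d33-9f5d-1d4c5b03dcaf")); // true
console.log(isUuid("2bbfb5ad-a6c6-4d33-9f5d")); // false (last group missing)
```

Anchoring is useful when validating a single value, while the unanchored pattern is the one to use when searching inside longer text.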

Now that we have our regex pattern, we can use it to search for UUIDs in text. Most programming languages and text editors have built-in support for searching with regular expressions. For example, in JavaScript, we can call the string method `match()` with the `g` (global) flag on the pattern to find all matches in a given string:

```javascript
const text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut id 2bbfb5ad-a6c6-4d33-9f5d-1d4c5b03dcaf massa, vel vehicula tellus. Phasellus fringilla, felis et bibendum fringilla, justo lorem commodo 2bbfb5ad-a6c6-4d33-9f5d-1d4c5b03dcaf.";

// The g flag makes match() return every match instead of only the first one.
const uuids = text.match(/[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}/g);

console.log(uuids); // Output: ["2bbfb5ad-a6c6-4d33-9f5d-1d4c5b03dcaf", "2bbfb5ad-a6c6-4d33-9f5d-1d4c5b03dcaf"]
```

In this example, the string contains the same UUID twice, and our regex pattern has successfully identified and extracted both occurrences.
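One possible refinement, sketched here under assumptions of our own rather than as part of the original example: adding `\b` word boundaries keeps the pattern from matching a UUID-shaped slice of a longer run of hex characters, and `matchAll()` reports where each match starts. The names `uuidPattern`, `findUuids`, and `sample` are hypothetical.

```javascript
// Same pattern as above, wrapped in \b word boundaries (an added assumption).
const uuidPattern = /\b[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}\b/g;

// Return every UUID found in the text together with its starting index.
function findUuids(text) {
  return Array.from(text.matchAll(uuidPattern), (m) => ({
    uuid: m[0],
    index: m.index,
  }));
}

const sample =
  "id=2bbfb5ad-a6c6-4d33-9f5d-1d4c5b03dcaf; next=2BBFB5AD-A6C6-4D33-9F5D-1D4C5B03DCAF";
console.log(findUuids(sample));

// Extra leading hex characters mean this is not a standalone UUID, so nothing matches.
console.log(findUuids("0002bbfb5ad-a6c6-4d33-9f5d-1d4c5b03dcaf").length); // 0
```

Note that `match()` with the `g` flag returns `null` when nothing is found, whereas this helper returns an empty array, which can be easier to handle in calling code.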

Regular expressions also allow us to perform more advanced operations, such as replacing the matched text with a different value. For example, to replace every UUID in a string with a placeholder, we can pass the same pattern to JavaScript's `replace()` method.
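A minimal sketch of such a replacement follows. The placeholder text `<uuid>` and the helper name `redactUuids` are assumptions made for illustration, not something prescribed by the article.

```javascript
// Global pattern so that replace() substitutes every UUID, not just the first.
const uuidRegex = /[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}/g;

// Replace each UUID in the text with a placeholder ("<uuid>" is an arbitrary choice).
function redactUuids(text, placeholder = "<uuid>") {
  return text.replace(uuidRegex, placeholder);
}

console.log(redactUuids("user 2bbfb5ad-a6c6-4d33-9f5d-1d4c5b03dcaf logged in"));
// Output: "user <uuid> logged in"
```

The original string is left unchanged, since `replace()` returns a new string.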
