Table of contents

  1. Python - Regex Lookahead
  2. Delete digits in Python (Regex)
  3. Looping through python regex matches
  4. Python regex for MD5 hash

Python - Regex Lookahead

Lookaheads are a type of zero-width assertion in regular expressions. They allow you to match a pattern only if it is followed (or not followed) by another pattern, but they don't consume characters in the string. There are positive lookaheads and negative lookaheads.

1. Positive Lookahead (?=):

This matches a group if it is followed by another specific group.

Syntax: X(?=Y)

This means: Find X only if X is followed by Y.

Example:

import re

text = "apple pie, apple cake"
pattern = "apple(?= cake)"
matches = re.findall(pattern, text)
print(matches)  # ['apple']

In the above example, only the "apple" that is followed by " cake" is matched.

2. Negative Lookahead (?!):

This matches a group if it is not followed by another specific group.

Syntax: X(?!Y)

This means: Find X only if X is not followed by Y.

Example:

import re

text = "apple pie, apple cake"
pattern = "apple(?! cake)"
matches = re.findall(pattern, text)
print(matches)  # ['apple']

In this example, only the "apple" that isn't followed by " cake" is matched.

Practical Usage:

Suppose you want to find all instances of "error" in a log file, but you don't want to get "error" if it's followed by " - ignored". You could use a negative lookahead:

pattern = "error(?! - ignored)"

This will match any occurrence of "error" that isn't directly followed by " - ignored".

In summary, lookaheads allow for powerful and precise pattern matching, ensuring certain conditions are met after the main pattern, without actually matching the lookahead pattern itself.


Delete digits in Python (Regex)

If you want to delete digits (numeric characters) from a string in Python using regular expressions, you can use the re.sub() function from the re module to replace all digit characters with an empty string. Here's an example:

import re

# Input string containing digits
input_string = "Hello, 123 World! 4567"

# Use regular expression to delete digits
result_string = re.sub(r'\d', '', input_string)

# Print the result
print(result_string)

In this code:

  • We import the re module for regular expressions.
  • We define an input_string that contains digits.
  • We use the re.sub() function with the regular expression r'\d' to match all digit characters (\d) in the input string.
  • We replace all matched digits with an empty string '', effectively removing them.
  • The result_string now contains the original string with digits removed.

When you run this code, it will print:

Hello,  World! 

As you can see, all the digits have been removed from the original input string, leaving only the non-digit characters.


Looping through python regex matches

To loop through Python regex matches, you can use the re.finditer() function from the re module. This function returns an iterator that yields match objects for each match found in the input string. You can then loop through these match objects to access and process the matched text. Here's an example:

import re

# Sample text containing email addresses
text = "Email addresses: [email protected], [email protected], [email protected]"

# Define the regular expression pattern to match email addresses
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,7}\b'

# Use re.finditer() to find all matches in the text
matches = re.finditer(pattern, text)

# Loop through the match objects and print the matched email addresses
for match in matches:
    print("Match found:", match.group())

In this example:

  1. We define a sample text containing email addresses.

  2. We define a regular expression pattern pattern to match email addresses. This pattern is a simplified version and may not cover all valid email address formats.

  3. We use re.finditer() to find all matches of the pattern in the text. This function returns an iterator.

  4. We loop through the match objects in the matches iterator using a for loop.

  5. Inside the loop, we access the matched text using match.group() and print it.

This code will print all the email addresses found in the input text. You can replace the pattern with your own regular expression to match the specific text patterns you need.


Python regex for MD5 hash

An MD5 hash is typically represented as a 32-character hexadecimal string. You can use regular expressions in Python to match MD5 hashes using the re module. Here's a simple regular expression pattern to match MD5 hashes:

import re

# Sample text containing MD5 hashes
text = "Sample text with MD5 hashes: 5d41402abc4b2a76b9719d911017c592 and d41d8cd98f00b204e9800998ecf8427e"

# Regular expression pattern for matching MD5 hashes
md5_pattern = r"\b[a-fA-F0-9]{32}\b"

# Find all MD5 hashes in the text
matches = re.findall(md5_pattern, text)

# Print the matched MD5 hashes
for match in matches:
    print("MD5 hash:", match)

In this code:

  • We define the md5_pattern regular expression pattern as r"\b[a-fA-F0-9]{32}\b". This pattern looks for 32-character hexadecimal strings surrounded by word boundaries (\b). [a-fA-F0-9] represents a hexadecimal character, and {32} specifies that there should be exactly 32 such characters.

  • We use re.findall() to find all occurrences of the MD5 hash pattern in the given text.

  • We iterate through the matched MD5 hashes and print them.

This code will correctly identify MD5 hashes within the provided text. Adjust the text variable to match your actual input text.


More Python Questions

More C# Questions