How do I use regular expressions for data extraction in Swift?

Regular expressions (regex) are a powerful tool for pattern matching and data extraction in Swift applications. The main entry point is Foundation's NSRegularExpression class; String also offers regex-aware search methods for simple cases. This guide covers creating patterns, extracting matches and capture groups, and handling performance and errors.

Basic Regex Setup in Swift

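Swift uses the NSRegularExpression class from Foundation for advanced regex operations. The initializer throws if the pattern is invalid, so it helps to wrap creation in a small function. The examples use raw string literals (#"..."#) so that backslashes in a pattern do not need to be escaped. Here's how to create and use a basic regex pattern: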

import Foundation

func createRegex(pattern: String) -> NSRegularExpression? {
    do {
        return try NSRegularExpression(pattern: pattern, options: [])
    } catch {
        print("Invalid regex pattern: \(error)")
        return nil
    }
}

// Example: Extract email addresses
let emailPattern = #"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"#
let emailRegex = createRegex(pattern: emailPattern)
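
As a quick check that a compiled pattern behaves as expected, the sketch below tests whether a string contains a match at all. `containsMatch(of:in:)` is a hypothetical helper introduced here for illustration, not part of the guide's own code; it relies only on Foundation's `firstMatch(in:options:range:)`.

```swift
import Foundation

// Hypothetical helper (an assumption for illustration): returns true when
// `pattern` matches anywhere in `text`, and false when there is no match
// or the pattern itself is invalid.
func containsMatch(of pattern: String, in text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return false }
    let range = NSRange(text.startIndex..., in: text)
    // firstMatch stops at the first hit, so it does less work than matches(in:)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

let samplePattern = #"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"#
print(containsMatch(of: samplePattern, in: "Write to support@example.com")) // true
print(containsMatch(of: samplePattern, in: "No address here"))              // false
```

Because `firstMatch` returns as soon as one match is found, it is a reasonable choice for validation-style checks where the matched text itself is not needed.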

Extracting Data with NSRegularExpression

Finding All Matches

The most common use case is finding all matches of a pattern within a text:

func extractEmails(from text: String) -> [String] {
    let emailPattern = #"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"#
    guard let regex = createRegex(pattern: emailPattern) else { return [] }

    let range = NSRange(text.startIndex..., in: text)
    let matches = regex.matches(in: text, options: [], range: range)

    return matches.compactMap { match in
        guard let stringRange = Range(match.range, in: text) else { return nil }
        return String(text[stringRange])
    }
}

// Usage
let text = "Contact us at support@example.com or sales@company.org"
let emails = extractEmails(from: text)
print(emails) // ["support@example.com", "sales@company.org"]
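
The example above builds its search range with `NSRange(text.startIndex..., in: text)` rather than from `text.count`. The sketch below illustrates why: NSRange counts UTF-16 code units, while `count` counts Characters, so the two disagree as soon as the text contains characters such as emoji. The sample string and variable names are assumptions made up for this illustration.

```swift
import Foundation

// Two emoji (each one Character but two UTF-16 code units), then an address.
let sample = "\u{1F600}\u{1F600} a@b.io"
let regex = try! NSRegularExpression(pattern: #"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"#)

// Incorrect: a length measured in Characters is too short for an NSRange,
// so the search stops before the end of the address.
let tooShort = NSRange(location: 0, length: sample.count)

// Correct: convert the String range, which accounts for UTF-16 length.
let full = NSRange(sample.startIndex..., in: sample)

let missed = regex.numberOfMatches(in: sample, options: [], range: tooShort)
let found = regex.numberOfMatches(in: sample, options: [], range: full)

print(sample.count, sample.utf16.count) // 9 11
print(missed, found)                    // 0 1
```

The same reasoning applies in the other direction: converting a match back with `Range(match.range, in: text)` keeps the String indices valid.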

Extracting Specific Groups

Capture groups allow you to extract specific parts of a match:

func extractPhoneNumbers(from text: String) -> [(fullNumber: String, areaCode: String, number: String)] {
    let phonePattern = #"\((\d{3})\)\s*(\d{3}-\d{4})"#
    guard let regex = createRegex(pattern: phonePattern) else { return [] }

    let range = NSRange(text.startIndex..., in: text)
    let matches = regex.matches(in: text, options: [], range: range)

    return matches.compactMap { match in
        guard match.numberOfRanges == 3,
              let fullRange = Range(match.range, in: text),
              let areaCodeRange = Range(match.range(at: 1), in: text),
              let numberRange = Range(match.range(at: 2), in: text) else {
            return nil
        }

        return (
            fullNumber: String(text[fullRange]),
            areaCode: String(text[areaCodeRange]),
            number: String(text[numberRange])
        )
    }
}

// Usage
let phoneText = "Call us at (555) 123-4567 or (888) 999-0000"
let phones = extractPhoneNumbers(from: phoneText)
for phone in phones {
    print("Full: \(phone.fullNumber), Area: \(phone.areaCode), Number: \(phone.number)")
}
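
Numbered groups become harder to follow as a pattern grows. As an alternative sketch, the same phone pattern can use named capture groups, read back with `range(withName:)` on the match (available on iOS 11 / macOS 10.13 and later). The function name `extractAreaCodes(from:)` and the group names `area` and `number` are assumptions chosen for this example.

```swift
import Foundation

// Hypothetical variant of the phone example using named groups
// instead of numeric indices.
func extractAreaCodes(from text: String) -> [String] {
    let pattern = #"\((?<area>\d{3})\)\s*(?<number>\d{3}-\d{4})"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }

    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: range).compactMap { match -> String? in
        // Range(_:in:) yields nil if the named group did not take part in the match
        guard let areaRange = Range(match.range(withName: "area"), in: text) else { return nil }
        return String(text[areaRange])
    }
}

print(extractAreaCodes(from: "Call us at (555) 123-4567 or (888) 999-0000"))
// ["555", "888"]
```

Named groups keep the extraction code stable when a group is later added to or removed from the pattern, since the indices of the remaining groups no longer matter.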

Using String Methods for Simple Patterns

For simpler patterns, you can wrap the NSRegularExpression calls in String extensions so that call sites stay short:

Range-based Extraction

extension String {
    func extractURLs() -> [String] {
        let urlPattern = #"https?://[^\s]+"#
        guard let regex = try? NSRegularExpression(pattern: urlPattern) else { return [] }

        let range = NSRange(self.startIndex..., in: self)
        let matches = regex.matches(in: self, options: [], range: range)

        return matches.compactMap { match in
            guard let stringRange = Range(match.range, in: self) else { return nil }
            return String(self[stringRange])
        }
    }

    func extractHashtags() -> [String] {
        let hashtagPattern = #"#\w+"#
        guard let regex = try? NSRegularExpression(pattern: hashtagPattern) else { return [] }

        let range = NSRange(self.startIndex..., in: self)
        let matches = regex.matches(in: self, options: [], range: range)

        return matches.compactMap { match in
            guard let stringRange = Range(match.range, in: self) else { return nil }
            return String(self[stringRange])
        }
    }
}

// Usage
let socialText = "Check out https://example.com for #swift tips! #programming"
let urls = socialText.extractURLs()
let hashtags = socialText.extractHashtags()
print("URLs: \(urls)")       // ["https://example.com"]
print("Hashtags: \(hashtags)") // ["#swift", "#programming"]
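
When only the first occurrence is needed, String's own `range(of:options:)` accepts the `.regularExpression` option, which avoids creating an NSRegularExpression explicitly. The helper name `firstMatch(of:in:)` below is a hypothetical wrapper added for illustration.

```swift
import Foundation

// Hypothetical helper: returns the first substring of `text` that matches
// `pattern`, or nil when nothing matches.
func firstMatch(of pattern: String, in text: String) -> String? {
    guard let range = text.range(of: pattern, options: .regularExpression) else { return nil }
    return String(text[range])
}

let post = "Check out https://example.com for #swift tips! #programming"
print(firstMatch(of: #"#\w+"#, in: post) ?? "none")            // #swift
print(firstMatch(of: #"https?://[^\s]+"#, in: post) ?? "none") // https://example.com
```

This approach gives no access to capture groups and finds one match per call, so NSRegularExpression remains the better fit for the multi-match extraction shown above.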

Advanced Data Extraction Patterns

Extracting Structured Data

For complex data structures like HTML-like tags or custom formats:

struct ProductInfo {
    let name: String
    let price: String
    let sku: String
}

func extractProductInfo(from html: String) -> [ProductInfo] {
    let productPattern = #"<product name="([^"]+)" price="([^"]+)" sku="([^"]+)">"#
    guard let regex = createRegex(pattern: productPattern) else { return [] }

    let range = NSRange(html.startIndex..., in: html)
    let matches = regex.matches(in: html, options: [], range: range)

    return matches.compactMap { match in
        guard match.numberOfRanges == 4,
              let nameRange = Range(match.range(at: 1), in: html),
              let priceRange = Range(match.range(at: 2), in: html),
              let skuRange = Range(match.range(at: 3), in: html) else {
            return nil
        }

        return ProductInfo(
            name: String(html[nameRange]),
            price: String(html[priceRange]),
            sku: String(html[skuRange])
        )
    }
}
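
The group-by-group unpacking above follows the same shape for every pattern, so it can be factored into one reusable function. The sketch below is one possible way to do that; `captureGroups(in:pattern:)` and the sample `catalog` string are assumptions for illustration, and a group that does not participate in a match is returned as an empty string.

```swift
import Foundation

// Hypothetical helper: for each match, returns the text of every capture
// group (index 1 and up) in order.
func captureGroups(in text: String, pattern: String) -> [[String]] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)

    return regex.matches(in: text, options: [], range: range).map { match -> [String] in
        (1..<match.numberOfRanges).map { index -> String in
            // A group that did not participate has no valid range
            guard let groupRange = Range(match.range(at: index), in: text) else { return "" }
            return String(text[groupRange])
        }
    }
}

// Usage with the product pattern from the example above
let catalog = #"<product name="Widget" price="9.99" sku="W-1"> <product name="Gadget" price="19.50" sku="G-2">"#
let rows = captureGroups(in: catalog, pattern: #"<product name="([^"]+)" price="([^"]+)" sku="([^"]+)">"#)
print(rows) // [["Widget", "9.99", "W-1"], ["Gadget", "19.50", "G-2"]]
```

Returning plain arrays trades type safety for reuse; mapping each row into a struct such as ProductInfo at the call site restores the named fields.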

Date and Time Extraction

func extractDates(from text: String) -> [Date] {
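    let dateFormatter = DateFormatter()
    dateFormatter.dateFormat = "yyyy-MM-dd"
    // For fixed-format parsing, pin the locale so user settings cannot change the result
    dateFormatter.locale = Locale(identifier: "en_US_POSIX")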

    let datePattern = #"\d{4}-\d{2}-\d{2}"#
    guard let regex = createRegex(pattern: datePattern) else { return [] }

    let range = NSRange(text.startIndex..., in: text)
    let matches = regex.matches(in: text, options: [], range: range)

    return matches.compactMap { match in
        guard let stringRange = Range(match.range, in: text) else { return nil }
        let dateString = String(text[stringRange])
        return dateFormatter.date(from: dateString)
    }
}

// Usage
let logText = "Error occurred on 2023-12-15 and again on 2023-12-16"
let dates = extractDates(from: logText)
print("Found dates: \(dates)")

Web Scraping Applications

When building web scraping applications in Swift, regex is particularly useful for extracting specific data patterns from HTML content. JavaScript-rendered content requires more sophisticated approaches, but regex works well for extracting data from static content.

Extracting Meta Information

func extractMetaTags(from html: String) -> [String: String] {
    var metaTags: [String: String] = [:]

    let metaPattern = #"<meta\s+name="([^"]+)"\s+content="([^"]+)""#
    guard let regex = createRegex(pattern: metaPattern) else { return metaTags }

    let range = NSRange(html.startIndex..., in: html)
    let matches = regex.matches(in: html, options: [], range: range)

    for match in matches {
        guard match.numberOfRanges == 3,
              let nameRange = Range(match.range(at: 1), in: html),
              let contentRange = Range(match.range(at: 2), in: html) else {
            continue
        }

        let name = String(html[nameRange])
        let content = String(html[contentRange])
        metaTags[name] = content
    }

    return metaTags
}
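
Real-world HTML often varies in letter case, for example `<META NAME=...>`. All examples so far pass an empty option set; the sketch below shows how the `.caseInsensitive` option of NSRegularExpression changes what the same kind of pattern accepts. The function name `extractMetaNames(from:)` and the sample markup are assumptions for illustration.

```swift
import Foundation

// Hypothetical helper: collects the name attribute of each <meta> tag,
// ignoring the letter case of the tag and attribute names.
func extractMetaNames(from html: String) -> [String] {
    let pattern = #"<meta\s+name="([^"]+)""#
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return []
    }

    let range = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, options: [], range: range).compactMap { match -> String? in
        guard let nameRange = Range(match.range(at: 1), in: html) else { return nil }
        return String(html[nameRange])
    }
}

let page = #"<META NAME="author" CONTENT="Ann"><meta name="description" content="Demo">"#
print(extractMetaNames(from: page)) // ["author", "description"]
```

Note that the pattern still assumes double-quoted attributes with name appearing first; markup that deviates from this is a sign that an HTML parser may be a better fit than regex.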

Performance Considerations

Compiling Regex Patterns

For frequently used patterns, compile them once and reuse:

class DataExtractor {
    private let emailRegex: NSRegularExpression
    private let phoneRegex: NSRegularExpression
    private let urlRegex: NSRegularExpression

    init() throws {
        emailRegex = try NSRegularExpression(pattern: #"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"#)
        phoneRegex = try NSRegularExpression(pattern: #"\((\d{3})\)\s*(\d{3}-\d{4})"#)
        urlRegex = try NSRegularExpression(pattern: #"https?://[^\s]+"#)
    }

    func extractAllData(from text: String) -> (emails: [String], phones: [String], urls: [String]) {
        let range = NSRange(text.startIndex..., in: text)

        let emails = emailRegex.matches(in: text, options: [], range: range).compactMap { match in
            guard let stringRange = Range(match.range, in: text) else { return nil }
            return String(text[stringRange])
        }

        let phones = phoneRegex.matches(in: text, options: [], range: range).compactMap { match in
            guard let stringRange = Range(match.range, in: text) else { return nil }
            return String(text[stringRange])
        }

        let urls = urlRegex.matches(in: text, options: [], range: range).compactMap { match in
            guard let stringRange = Range(match.range, in: text) else { return nil }
            return String(text[stringRange])
        }

        return (emails, phones, urls)
    }
}
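
Another way to compile a pattern once, sketched here under the assumption that the pattern is a fixed literal, is a static constant. Static stored properties in Swift are initialized lazily and only once, so the regex is built on first use and shared afterwards. The `Patterns` namespace and `extractURLs(from:)` function are hypothetical names for this example.

```swift
import Foundation

// Hypothetical namespace holding precompiled patterns.
enum Patterns {
    // try! is tolerable only because the pattern is a constant that is
    // known to be valid; a pattern built at runtime should use try/catch.
    static let url = try! NSRegularExpression(pattern: #"https?://[^\s]+"#)
}

func extractURLs(from text: String) -> [String] {
    let range = NSRange(text.startIndex..., in: text)
    // Reuses the shared, already compiled regex on every call
    return Patterns.url.matches(in: text, options: [], range: range).compactMap { match -> String? in
        guard let urlRange = Range(match.range, in: text) else { return nil }
        return String(text[urlRange])
    }
}

print(extractURLs(from: "Docs: https://example.com and http://test.org/page"))
// ["https://example.com", "http://test.org/page"]
```

Compared with the class above, this removes the throwing initializer from call sites, at the cost of crashing at first use if the literal pattern is ever edited into an invalid one.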

Error Handling and Best Practices

Robust Pattern Matching

enum RegexError: Error {
    case invalidPattern
    case noMatches
}

func safeExtract<T>(from text: String, pattern: String, transform: (String) -> T?) -> Result<[T], RegexError> {
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return .failure(.invalidPattern)
    }

    let range = NSRange(text.startIndex..., in: text)
    let matches = regex.matches(in: text, options: [], range: range)

    guard !matches.isEmpty else {
        return .failure(.noMatches)
    }

    let results = matches.compactMap { match -> T? in
        guard let stringRange = Range(match.range, in: text) else { return nil }
        let matchedString = String(text[stringRange])
        return transform(matchedString)
    }

    return .success(results)
}

// Usage
let result = safeExtract(from: "Price: $29.99, $15.50", pattern: #"\$\d+\.\d{2}"#) { priceString in
    Double(priceString.dropFirst()) // Remove $ sign
}

switch result {
case .success(let prices):
    print("Found prices: \(prices)")
case .failure(let error):
    print("Error: \(error)")
}

Combining with Other Swift Features

Regular expressions work well with Swift's modern features like async/await for concurrent processing:

actor DataProcessor {
    func processText(_ text: String) async -> ExtractedData {
        // Run the three extractions (defined earlier in this guide) concurrently as child tasks
        async let emails = extractEmails(from: text)
        async let phones = extractPhoneNumbers(from: text)
        async let urls = text.extractURLs()

        return await ExtractedData(
            emails: emails,
            phones: phones.map(\.fullNumber),
            urls: urls
        )
    }
}

struct ExtractedData {
    let emails: [String]
    let phones: [String]
    let urls: [String]
}

Regular expressions in Swift provide a powerful foundation for data extraction tasks. When combined with effective HTTP request handling and proper error management, they enable robust data processing solutions for web scraping and text analysis applications.

Conclusion

Swift's regex capabilities through NSRegularExpression offer comprehensive pattern matching and data extraction features. By following the patterns and best practices outlined in this guide, you can efficiently extract structured data from text, HTML, and other string-based formats in your Swift applications. Remember to compile frequently used patterns once, handle errors gracefully, and consider performance implications when processing large amounts of text data.
