vetkastar / python

Run any python code

Run time and cost

This model costs approximately $0.0013 to run on Replicate, or 769 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
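
The runs-per-dollar figure follows directly from the per-run cost. A minimal sketch of that arithmetic, assuming the approximate $0.0013 figure quoted above; the helper names are illustrative and not part of any Replicate API:

```python
import math

COST_PER_RUN_USD = 0.0013  # approximate figure quoted above; varies with inputs


def runs_per_dollar(cost_per_run: float) -> int:
    """Whole number of runs that fit into one dollar."""
    return math.floor(1 / cost_per_run)


def estimated_cost(runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Rough total cost in USD for a given number of runs."""
    return runs * cost_per_run


print(runs_per_dollar(COST_PER_RUN_USD))  # 769
print(round(estimated_cost(2000), 2))     # 2.6
```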

This model runs on CPU hardware. Predictions typically complete within 14 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Python Code Execution Node Documentation

Overview

This node provides a comprehensive environment for executing Python code with support for data processing, image manipulation, document creation, multimedia operations, machine learning tasks, and more.

Input Parameters

  • code (str): Python code to execute
  • input_file (Path, optional): Path to input file

# Example structure of input parameters
{
    "code": "your_python_code_here",
    "input_file": "path/to/your/file.ext"  # optional
}
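
As a rough illustration of how a script might use the optional parameter, the sketch below treats `input_file` as a path that may be absent. It assumes the node exposes `input_file` to the executed code as a variable, as the examples in this README do; the `describe_input` helper is hypothetical.

```python
from pathlib import Path
from typing import Optional


def describe_input(input_file: Optional[Path]) -> str:
    """Return a short description of the optional input file."""
    if input_file is None:
        return "no input file provided"
    path = Path(input_file)
    if not path.is_file():
        return f"input file not found: {path.name}"
    return f"{path.name}: {path.stat().st_size} bytes"


# In the node, `input_file` would be supplied for you; here it is set to None
# so the snippet runs on its own.
input_file = None
print(describe_input(input_file))
```

Checking for `None` first keeps scripts usable both with and without an uploaded file.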

Available Libraries and Their Usage

Data Processing & Analysis

NumPy (2.1.2)

Numerical computing library for array operations and mathematical functions.

import numpy as np

# Arrays and operations
array = np.array([1, 2, 3, 4, 5])
mean = np.mean(array)
std = np.std(array)
result = np.square(array)

# Matrix operations
matrix = np.array([[1, 2], [3, 4]])
inverse = np.linalg.inv(matrix)
eigenvalues = np.linalg.eigvals(matrix)

Pandas (2.0.3)

Data manipulation and analysis library.

import numpy as np  # used by the time series example
import pandas as pd

# Data frame operations
df = pd.read_csv(input_file)
filtered = df[df['column'] > 5]
grouped = df.groupby('category').mean(numeric_only=True)  # average numeric columns only
pivoted = pd.pivot_table(df, values='value', index='row', columns='col')

# Time series
dates = pd.date_range('20230101', periods=10)
ts = pd.Series(np.random.randn(10), index=dates)

SciPy (1.14.1)

Scientific computing library.

from scipy import stats, optimize, interpolate
import numpy as np

# Statistics
normal_dist = stats.norm(0, 1)
probabilities = normal_dist.pdf([-1, 0, 1])

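# Optimization
def objective(x):
    # Quadratic bowl with its minimum at (1, 2)
    return (x[0] - 1)**2 + (x[1] - 2)**2

result = optimize.minimize(objective, [0, 0])

# Interpolation
x = np.linspace(0, 4, 5)
y = np.exp(-x/3.0)
interp = interpolate.interp1d(x, y)  # callable: interp(2.5) estimates y at x = 2.5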

Image Processing

Pillow (11.0.0)

Python Imaging Library.

from PIL import Image, ImageEnhance, ImageFilter

# Basic operations
image = Image.open(input_file)

# Resize
resized = image.resize((800, 600))
resized.save('resized.jpg')

# Rotate
rotated = image.rotate(45)
rotated.save('rotated.jpg')

# Convert to grayscale
grayscale = image.convert('L')
grayscale.save('grayscale.jpg')

# Filters and effects
blurred = image.filter(ImageFilter.BLUR)
blurred.save('blurred.jpg')

# Enhance contrast
enhanced = ImageEnhance.Contrast(image).enhance(1.5)
enhanced.save('enhanced.jpg')

# You can add combined effects 
# For example, rotated and blurred images
rotated_and_blurred = rotated.filter(ImageFilter.BLUR)
rotated_and_blurred.save('rotated_and_blurred.jpg')

# Or increase the contrast of the gray image
enhanced_gray = ImageEnhance.Contrast(grayscale).enhance(1.8)
enhanced_gray.save('enhanced_grayscale.jpg')

OpenCV (4.10.0.84)

Computer vision library.

import cv2

# Load image
img = cv2.imread(str(input_file))

# Check if image is loaded successfully
if img is not None:
    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.imwrite('gray.jpg', gray)

    # Edge detection
    edges = cv2.Canny(gray, 100, 200)
    cv2.imwrite('edges.jpg', edges)

    # Blur image
    blurred = cv2.GaussianBlur(img, (5, 5), 0)
    cv2.imwrite('blurred.jpg', blurred)

    # Resize image
    resized = cv2.resize(img, (800, 600))
    cv2.imwrite('resized.jpg', resized)

scikit-image (0.24.0)

Advanced image processing library.

from skimage import io, filters, feature, segmentation, color

# Load image (assumes color/RGB input)
image = io.imread(str(input_file))

# Convert RGB to grayscale for certain operations
gray_image = color.rgb2gray(image)

# Basic filters
# Gaussian filter
gaussian = filters.gaussian(gray_image, sigma=2)
io.imsave('gaussian.jpg', (gaussian * 255).astype('uint8'))

# Sobel edge detection
sobel = filters.sobel(gray_image)
io.imsave('sobel.jpg', (sobel * 255).astype('uint8'))

# Canny edge detection (works with 2D arrays)
edges = feature.canny(
    gray_image,
    sigma=3,
    low_threshold=0.1,
    high_threshold=0.2
)
io.imsave('canny.jpg', (edges * 255).astype('uint8'))

# SLIC segmentation
segments = segmentation.slic(
    image,
    n_segments=100,
    compactness=10,
    channel_axis=-1  # for RGB images
)
io.imsave('segments.jpg', (segments * 255 / segments.max()).astype('uint8'))

Document Processing

ReportLab (4.0.4) with Custom Fonts

PDF creation with support for custom fonts.

import requests
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
import os

# Download and register custom font
font_url = "https://fontsforyou.com/downloads/1-jmhwulfilanew"
font_path = os.path.join('CustomFont.ttf')

response = requests.get(font_url, timeout=30)
response.raise_for_status()  # stop early if the download failed
with open(font_path, 'wb') as f:
    f.write(response.content)

# Register font
pdfmetrics.registerFont(TTFont('CustomFont', font_path))

# Create PDF with custom font
pdf = canvas.Canvas(os.path.join('custom_text.pdf'), pagesize=A4)
width, height = A4

# Use custom font
pdf.setFont("CustomFont", 24)
title = "Custom Font Example"
pdf.drawString(100, height - 100, title)

pdf.save()
os.remove(font_path)  # Clean up font file

python-docx (0.8.11)

Microsoft Word document creation.

from docx import Document
from docx.shared import Inches

# Create document
doc = Document()

# Add main heading
doc.add_heading('Document Title', 0)

# Add normal paragraph
doc.add_paragraph('Normal text paragraph')

# Add section heading
doc.add_heading('Section', level=1)

# Create and fill table
table = doc.add_table(rows=3, cols=3)
for row in table.rows:
    for cell in row.cells:
        cell.text = 'Cell content'

# Add picture - convert Path to string
if input_file:
    doc.add_picture(str(input_file), width=Inches(5.0))

# Save document
doc.save('document.docx')

PyPDF2 (3.0.1)

PDF manipulation library.

from PyPDF2 import PdfReader, PdfWriter

# Load PDF
reader = PdfReader(str(input_file))

# Get document info
print(f"Number of pages: {len(reader.pages)}")
print(f"PDF Metadata: {reader.metadata}")

# Create new PDF
writer = PdfWriter()

# Process first page
if len(reader.pages) > 0:
    # Get first page
    page = reader.pages[0]

    # Get page properties
    print(f"Page size: {page.mediabox}")

    # Extract text from page
    text = page.extract_text()
    print(f"First page text: {text[:100]}...")  # First 100 characters

    # Rotate page (optional)
    page.rotate(90)

    # Add page to new document
    writer.add_page(page)

    # Add some metadata to new PDF
    writer.add_metadata({
        "/Producer": "PyPDF2",
        "/Title": "Modified PDF",
        "/Author": "Python Script"
    })

    # Save result
    with open("modified.pdf", "wb") as output_file:
        writer.write(output_file)

Markdown (3.4.4)

Markdown processing library.

import markdown

# Convert markdown to HTML
md_text = "# Title\n## Subtitle\n* List item 1\n* List item 2"
html = markdown.markdown(md_text)

# Save HTML
with open('converted.html', 'w') as f:
    f.write(html)

Vector Graphics

pycairo (1.24.0)

2D graphics library.

import math
import cairo

# Create surface
surface = cairo.ImageSurface(cairo.FORMAT_RGB24, 600, 400)
ctx = cairo.Context(surface)

# Draw a light gray rectangle
ctx.set_source_rgb(0.8, 0.8, 0.8)
ctx.rectangle(100, 100, 400, 200)
ctx.fill()

# Draw a red circle (a full arc from 0 to 2*pi)
ctx.set_source_rgb(1, 0, 0)
ctx.arc(300, 200, 50, 0, 2 * math.pi)
ctx.fill()

# Save
surface.write_to_png('drawing.png')

svgwrite (1.4.3)

SVG creation library.

import svgwrite

# Create SVG
dwg = svgwrite.Drawing('drawing.svg', size=('800px', '600px'))
dwg.add(dwg.rect((10, 10), (100, 50), fill='red'))
dwg.add(dwg.circle(center=(200, 200), r=30, fill='blue'))
dwg.save()

Video and Audio Processing

MoviePy (1.0.3)

Video editing library.

from moviepy.editor import VideoFileClip

# Load video
video = VideoFileClip(str(input_file))

# Basic operations with video
# Cut first 5 seconds
cut = video.subclip(0, 5)

# Resize video
resized = cut.resize(width=480)  # resize to width=480px keeping aspect ratio

# Change speed
fast = cut.speedx(2)  # 2x speed

# Save all versions
cut.write_videofile('cut.mp4')
resized.write_videofile('resized.mp4')
fast.write_videofile('fast.mp4')

# Close video to free up resources
video.close()

ffmpeg-python (0.2.0)

FFmpeg wrapper for Python.

import ffmpeg

try:
    # Input stream with specified options
    stream = ffmpeg.input(str(input_file))

    # Add multiple processing steps
    # 1. Scale video
    stream = ffmpeg.filter(stream, 'scale', 720, -2)  # 720 px wide; -2 keeps the aspect ratio with an even height

    # 2. Set bitrate and codec
    stream = ffmpeg.output(
        stream,
        'processed.mp4',
        video_bitrate='2000k',
        vcodec='libx264',
        preset='medium',
        acodec='aac',
        audio_bitrate='128k',
        loglevel='error'  # Reduce log output
    )

    # Run with overwrite output
    ffmpeg.run(stream, overwrite_output=True, capture_stderr=True)  # capture stderr so e.stderr is populated on failure

except ffmpeg.Error as e:
    print('FFmpeg error:', e.stderr.decode())
except Exception as e:
    print('Error:', str(e))

Web and Network Tools

Requests (2.32.3)

HTTP library.

import requests

# GET request
response = requests.get('https://api.example.com/data')
data = response.json()

# POST request with data
payload = {'key': 'value'}
response = requests.post('https://api.example.com/post', json=payload)

# Download file
response = requests.get('https://example.com/file.pdf')
with open('downloaded.pdf', 'wb') as f:
    f.write(response.content)

BeautifulSoup4 (4.12.2)

Web scraping library.

from bs4 import BeautifulSoup
import requests

# Parse HTML
response = requests.get('https://example.com')
soup = BeautifulSoup(response.text, 'html.parser')

# Find elements
titles = soup.find_all('h1')
links = soup.find_all('a', href=True)  # skip anchors that have no href attribute

# Extract data
data = {
    'titles': [title.text for title in titles],
    'links': [link['href'] for link in links]
}

Utility Libraries

tqdm (4.66.5)

Progress bar library.

from tqdm import tqdm
import time

# Simple example of progress bar usage
data = list(range(10))
for item in tqdm(data):
    time.sleep(0.5)  # Simulate processing

python-magic (0.4.27)

File type detection.

import magic

# Get detailed file information using python-magic
file_type = magic.from_file(str(input_file))      # Get detailed file type
mime_type = magic.from_file(str(input_file), mime=True)  # Get MIME type

# Print and return results
print(f"File type: {file_type}")    # Shows detailed technical information
print(f"MIME type: {mime_type}")    # Shows standardized MIME type

result = {
    'file_type': file_type,
    'mime_type': mime_type
}

Additional Libraries

decorator (4.4.2)

Function decoration tools.

from decorator import decorator

@decorator
def debug(f, *args, **kwargs):
    """Simple debugging decorator"""
    print(f"Calling: {f.__name__}")
    print(f"Args: {args}")
    print(f"Kwargs: {kwargs}")
    result = f(*args, **kwargs)
    print(f"Result: {result}")
    return result

@debug
def add(a, b):
    """Add two numbers"""
    return a + b

@debug
def greet(name):
    """Greet someone"""
    return f"Hello, {name}!"

# Test our decorated functions
print("Starting tests")
print("-" * 20)

add(5, 3)
print("-" * 20)

greet("World")
print("-" * 20)

py360convert (0.1.0)

360-degree video/image conversion.

from py360convert import e2c
import cv2
import numpy as np

# Load the image
img = cv2.imread(str(input_file))
if img is None:
    raise ValueError("Failed to load image")

# Convert equirectangular to cubemap
cubemap = e2c(img, face_w=256, mode='bilinear')

# Save the result
cv2.imwrite('cubemap.jpg', cubemap)

certifi (2024.8.30)

SSL Certificates provider.

import certifi
import requests

# Secure request that verifies against the up-to-date certifi certificate bundle
response = requests.get('https://api.example.com', verify=certifi.where())
print(response.status_code)

charset-normalizer (3.4.0)

Character encoding detection.

from charset_normalizer import detect

# Determine byte data encoding
with open(input_file, 'rb') as file:
    raw_data = file.read()
    detected = detect(raw_data)
    encoding = detected['encoding']

print(f"Detected encoding: {encoding}")

idna (3.10)

International Domain Names handling.

import idna

# Encode domain name
encoded = idna.encode('domen.com')
# Decode domain name
decoded = idna.decode('xn--d1acufc.xn--p1ai')

Advanced Examples

1. Complete PDF Generation with Images and Custom Font

import requests
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import inch
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
import os
import zipfile
from PIL import Image
import io

# Download custom font
font_url = "https://fontsforyou.com/downloads/1-jmhwulfilanew"
font_path = os.path.join('JMHWulfilaNew.ttf')

response = requests.get(font_url, timeout=30)
response.raise_for_status()  # stop early if the download failed
with open(font_path, 'wb') as f:
    f.write(response.content)

# Register font
pdfmetrics.registerFont(TTFont('JMHWulfilaNew', font_path))

if input_file and str(input_file).endswith('.zip'):
    # Create PDF
    pdf = canvas.Canvas(os.path.join('result.pdf'), pagesize=A4)
    width, height = A4

    with zipfile.ZipFile(input_file, 'r') as zip_ref:
        image_files = [f for f in zip_ref.namelist() 
                      if f.lower().endswith(('.png', '.jpg', '.jpeg'))][:4]

        if len(image_files) < 4:
            raise ValueError("Need at least 4 images in ZIP file")

        # Calculate layout
        margin = inch
        img_width = (width - 3*margin) / 2
        img_height = (height - 4*margin) / 2

        # Define image positions
        positions = [
            (margin, height - margin - img_height),
            (2*margin + img_width, height - margin - img_height),
            (margin, height - 2*margin - 2*img_height),
            (2*margin + img_width, height - 2*margin - 2*img_height)
        ]

        # Process images
        for idx, (img_name, pos) in enumerate(zip(image_files, positions)):
            temp_image_path = os.path.join(f'temp_{idx}.jpg')

            with zip_ref.open(img_name) as img_file:
                img_data = io.BytesIO(img_file.read())
                img = Image.open(img_data)

                if img.mode != 'RGB':
                    img = img.convert('RGB')

                img.save(temp_image_path, format='JPEG')

                x, y = pos
                pdf.drawImage(temp_image_path, x, y, width=img_width, 
                            height=img_height, preserveAspectRatio=True)
                pdf.rect(x, y, img_width, img_height)

                pdf.setFont("JMHWulfilaNew", 16)
                text = f"Image {idx + 1}"
                text_x = x + img_width/2 - pdf.stringWidth(text, "JMHWulfilaNew", 16)/2
                text_y = y - 20
                pdf.drawString(text_x, text_y, text)

                os.remove(temp_image_path)

        # Add title
        pdf.setFont("JMHWulfilaNew", 24)
        title = "Image Collection"
        title_width = pdf.stringWidth(title, "JMHWulfilaNew", 24)
        pdf.drawString((width - title_width) / 2, height - 0.5*inch, title)

        pdf.save()
        os.remove(font_path)

2. Complete Data Analysis Pipeline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import xgboost as xgb
import joblib

# Load and preprocess data
def process_data(input_file):
    # Read data
    df = pd.read_csv(input_file)

    # Basic cleaning
    df = df.dropna()
    df = df.drop_duplicates()

    # Feature engineering
    df['new_feature'] = df['feature1'] / df['feature2']
    df['category'] = pd.cut(df['value'], bins=5, labels=['A','B','C','D','E'])

    # Save processed data
    df.to_csv('processed_data.csv', index=False)
    return df

# Create visualizations
def create_visualizations(df):
    # Set style
    plt.style.use('seaborn-v0_8')  # the 'seaborn' style name is deprecated since matplotlib 3.6

    # Figure 1: Distribution plots
    fig, axes = plt.subplots(2, 2, figsize=(15, 15))
    for idx, col in enumerate(df.select_dtypes(include=[np.number]).columns):
        if idx < 4:
            row = idx // 2
            col_idx = idx % 2
            sns.histplot(data=df, x=col, ax=axes[row, col_idx])
            axes[row, col_idx].set_title(f'{col} Distribution')
    plt.tight_layout()
    plt.savefig('distributions.png')

    # Figure 2: Correlation matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
    plt.title('Correlation Matrix')
    plt.savefig('correlation.png')

# Train models
def train_models(df, target_col):
    # Prepare data
    X = df.drop(target_col, axis=1).select_dtypes(include=[np.number])  # keep numeric features only
    y = df[target_col]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Random Forest
    rf_model = RandomForestClassifier()
    rf_model.fit(X_train, y_train)
    rf_pred = rf_model.predict(X_test)

    # XGBoost
    xgb_model = xgb.XGBClassifier()
    xgb_model.fit(X_train, y_train)
    xgb_pred = xgb_model.predict(X_test)

    # Save models
    joblib.dump(rf_model, 'random_forest.joblib')
    joblib.dump(xgb_model, 'xgboost.joblib')

    # Generate reports
    with open('model_report.txt', 'w') as f:
        f.write("Random Forest Report:\n")
        f.write(classification_report(y_test, rf_pred))
        f.write("\nXGBoost Report:\n")
        f.write(classification_report(y_test, xgb_pred))

# Main execution
if input_file:
    df = process_data(input_file)
    create_visualizations(df)
    train_models(df, 'target_column')

3. Multimedia Processing Pipeline

from moviepy.editor import VideoFileClip, AudioFileClip
import cv2
import soundfile as sf
import numpy as np
from PIL import Image
import os
from pydub import AudioSegment
import matplotlib.pyplot as plt

def process_media(input_file):
    if str(input_file).endswith(('.mp4', '.avi', '.mov')):
        # Video processing
        video = VideoFileClip(str(input_file))

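        # Extract audio (video.audio is None when the clip has no audio track)
        audio = video.audio
        if audio is not None:
            audio.write_audiofile('extracted_audio.mp3')
        video.close()

        # Process the first frame; cv2.imread cannot decode video files,
        # so read the frame through cv2.VideoCapture instead
        capture = cv2.VideoCapture(str(input_file))
        ok, frame = capture.read()
        capture.release()
        if ok:
            # Apply various opencv operations
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            edges = cv2.Canny(gray, 100, 200)
            cv2.imwrite('edges.jpg', edges)

            # Create thumbnail
            pil_image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            pil_image.thumbnail((200, 200))
            pil_image.save('thumbnail.jpg')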

    elif str(input_file).endswith(('.mp3', '.wav')):
        # Audio processing using pydub
        audio = AudioSegment.from_file(str(input_file))

        # Basic audio manipulations
        # Increase volume by 6dB
        louder = audio + 6
        louder.export('louder.mp3', format='mp3')

        # Reduce volume by 6dB
        quieter = audio - 6
        quieter.export('quieter.mp3', format='mp3')

        # Reverse audio
        reversed_audio = audio.reverse()
        reversed_audio.export('reversed.mp3', format='mp3')

        # Create visualization using raw data
        samples = np.array(audio.get_array_of_samples())

        plt.figure(figsize=(12, 8))
        plt.plot(samples)
        plt.title('Waveform')
        plt.xlabel('Sample')
        plt.ylabel('Amplitude')
        plt.savefig('waveform.png')
        plt.close()

        # Export processed audio
        audio.export('processed_audio.wav', format='wav')

# Example usage
if input_file:
    process_media(input_file)

Best Practices and Tips

Error Handling

Always use proper error handling:

import os

try:
    # Your code here
    if not os.path.exists(input_file):
        raise ValueError("Input file not found")
    if not str(input_file).endswith(('.jpg', '.png')):
        raise ValueError("Unsupported file format")
except Exception as e:
    # Re-raise with context while keeping the original traceback
    raise ValueError(f"Error occurred: {e}") from e
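
One way to make the checks above more specific is to raise distinct exception types, so the caller can tell a missing file from an unsupported one. This is a sketch rather than the node's built-in behaviour; `validate_input` and the allowed suffix set are hypothetical.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".jpg", ".png"}  # illustrative; adjust to your use case


def validate_input(input_file, allowed=ALLOWED_SUFFIXES) -> Path:
    """Return the input as a Path, or raise a descriptive error."""
    if input_file is None:
        raise ValueError("No input file was provided")
    path = Path(input_file)
    if path.suffix.lower() not in allowed:
        raise ValueError(f"Unsupported file format: {path.suffix or '(none)'}")
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    return path


try:
    validate_input("photo.gif")
except ValueError as exc:
    print(exc)  # Unsupported file format: .gif
```

Comparing the lowercased suffix also accepts names such as `PHOTO.JPG`, which a case-sensitive `endswith` check would reject.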

File Management

# Good practice for file handling
def safe_file_handling():
    # Use context managers
    with open(input_file, 'rb') as file:
        data = file.read()

    # Outputs are placed in the output directory automatically, so a bare filename is enough
    output_path = 'result.txt'

Resource Management

# Good practice for resource management
def manage_resources():
    # Release resources explicitly
    video = cv2.VideoCapture(str(input_file))
    try:
        # Process video
        pass
    finally:
        video.release()

    # Clear large variables
    del video
    import gc
    gc.collect()
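
The try/finally pattern above can also be wrapped in a context manager so the release call cannot be forgotten. The sketch below uses `contextlib.contextmanager` with a stand-in resource; `FakeCapture` and `releasing` are hypothetical names, and `FakeCapture` only mimics an object with a `release()` method such as `cv2.VideoCapture`.

```python
from contextlib import contextmanager


class FakeCapture:
    """Stand-in for a resource exposing release(), e.g. cv2.VideoCapture."""

    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


@contextmanager
def releasing(resource):
    """Yield the resource and always call release() on exit."""
    try:
        yield resource
    finally:
        resource.release()


capture = FakeCapture()
with releasing(capture) as cap:
    pass  # process frames here
print(capture.released)  # True
```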

Memory Optimization

import pandas as pd

# Good practice for memory optimization
# (process_line and process_chunk are placeholders for your own functions)
def process_large_data():
    # Use generators for large datasets
    def data_generator():
        with open(input_file, 'r') as f:
            for line in f:
                yield process_line(line)

    # Process in chunks
    chunk_size = 1000
    for chunk in pd.read_csv(input_file, chunksize=chunk_size):
        process_chunk(chunk)
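
The same chunking idea works without pandas. A minimal standard-library sketch, assuming line-oriented input; `iter_chunks` is a hypothetical helper that yields lists of at most `chunk_size` items, so only one chunk is held in memory at a time:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive lists of up to chunk_size items from an iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Works on any iterable, including an open file object.
sample = (f"line {i}" for i in range(5))
print([len(chunk) for chunk in iter_chunks(sample, 2)])  # [2, 2, 1]
```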

Limitations and Considerations

  1. All output files are saved to the ‘output’ directory automatically. When writing a file, specify only ‘filename.file’, not ‘output/filename.file’.
  2. Handle large files appropriately using chunks or generators
  3. Consider memory usage when processing large datasets or media files; the maximum available RAM is 8 GB.
  4. Use appropriate error handling for different file types and operations
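
Point 1 can be enforced in code by rejecting any output name that carries a directory component. A small sketch under that assumption; `is_bare_filename` and `save_text_result` are hypothetical helpers, and writing to the current working directory stands in for the node's output handling:

```python
from pathlib import Path


def is_bare_filename(name: str) -> bool:
    """True when name has no directory component (e.g. 'result.txt')."""
    return name not in ("", ".", "..") and Path(name).name == name


def save_text_result(name: str, text: str) -> Path:
    """Write text to a bare filename and return the resulting path."""
    if not is_bare_filename(name):
        raise ValueError(f"Use a bare filename, not a path: {name!r}")
    path = Path(name)
    path.write_text(text, encoding="utf-8")
    return path


print(is_bare_filename("result.txt"))         # True
print(is_bare_filename("output/result.txt"))  # False
```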

Complete List of Available Libraries

Core Tools and Utilities

  • setuptools (68.2.0) - Package build and installation tools
  • wheel (0.41.2) - Built-package format for Python
  • pip (23.2.1) - Package installer for Python
  • tqdm (4.66.5) - Progress bar and progress metrics
  • decorator (4.4.2) - Tools for creating function decorators
  • future (1.0.0) - Python 2/3 compatibility utilities
  • python-dateutil (2.8.2) - Extensions to Python datetime module
  • pytz (2023.3) - Time zone database and utilities
  • joblib (1.3.2) - Tools for lightweight pipelining
  • psutil (5.9.5) - System and process utilities
  • colorama (0.4.6) - Cross-platform colored terminal text
  • proglog (0.1.10) - Progress logging utility
  • python-magic (0.4.27) - File type identification library

Data Processing

  • numpy (2.1.2) - Numerical computing and array operations
  • pandas (2.0.3) - Data manipulation and analysis
  • scipy (1.14.1) - Scientific computing tools

Image Processing

  • Pillow (11.0.0) - Python Imaging Library
  • opencv-python (4.10.0.84) - Computer vision and image processing
  • pillow-heif (0.13.0) - HEIF image format support
  • scikit-image (0.24.0) - Image processing algorithms
  • imageio (2.36.0) - Image I/O library
  • imageio-ffmpeg (0.5.1) - FFMPEG plugin for imageio
  • py360convert (0.1.0) - 360-degree image conversion

Document Processing

  • python-docx (0.8.11) - Microsoft Word documents creation/editing
  • python-pptx (0.6.21) - PowerPoint presentation manipulation
  • PyPDF2 (3.0.1) - PDF file manipulation
  • reportlab (4.0.4) - PDF generation
  • markdown (3.4.4) - Markdown to HTML conversion
  • pdf2image (1.16.3) - PDF to image conversion

Graphics and Visualization

  • pycairo (1.24.0) - Cairo graphics library bindings
  • cairocffi (1.6.1) - Cairo graphics alternative binding
  • svgwrite (1.4.3) - SVG file creation
  • svglib (1.5.1) - SVG file parsing and conversion
  • cairosvg (2.7.1) - SVG to PNG/PDF conversion
  • matplotlib (3.7.2) - Plotting and visualization
  • seaborn (0.12.2) - Statistical data visualization
  • plotly (5.16.1) - Interactive visualizations

Video and Audio

  • ffmpeg-python (0.2.0) - FFmpeg command line wrapper
  • moviepy (1.0.3) - Video editing
  • soundfile (0.12.1) - Audio file reading/writing
  • pydub (0.25.1) - Audio processing

Machine Learning

  • scikit-learn (1.3.0) - Machine learning algorithms
  • xgboost (2.0.0) - Gradient boosting framework
  • lightgbm (4.0.0) - Gradient boosting framework
  • wand (0.6.11) - ImageMagick binding

Web and Networking

  • requests (2.32.3) - HTTP library
  • requests-html (0.10.0) - HTML parsing and JavaScript rendering
  • beautifulsoup4 (4.12.2) - HTML/XML parsing
  • lxml (4.9.3) - XML and HTML processing
  • urllib3 (2.2.3) - HTTP client library
  • certifi (2024.8.30) - SSL/TLS certificates
  • charset-normalizer (3.4.0) - Character encoding detection
  • idna (3.10) - International Domain Names handling

Archive Handling

  • py7zr (0.20.5) - 7-zip file handling
  • patool (1.12) - Archive file handling

OCR and Text Processing

pytesseract (0.3.10) - OCR engine interface