
Building Python Packages with C++ Extensions: A Complete Guide

Sep 13, 2025

17 min read

Python · C++ · pybind11 · CMake · Data Structures

Introduction

The complete project repository is available on GitHub: https://github.com/isaac-fate/count-min-sketch

Building Python packages with C++ extensions is a powerful way to combine Python’s ease of use with C++’s performance. This guide walks through creating a complete Python package with a C++ backend, covering everything from project structure to PyPI publishing.

We’ll use a Count-Min Sketch implementation as our example - a probabilistic data structure perfect for streaming data analysis. But the techniques apply to any C++ library you want to expose to Python.

Why Build Python Packages with C++?

  • Multithreading Performance: C++ code can run with Python’s GIL released, and atomic operations keep shared state thread-safe, enabling true parallel processing
  • Lower-Level Control: Direct memory management and hardware-level optimizations
  • Existing Libraries: Leverage existing C++ libraries in Python projects
  • System Integration: Access low-level system APIs and hardware features
  • Memory Efficiency: Better control over memory usage and data structures

What You’ll Learn

  • Complete project structure for Python packages with C++ extensions
  • How to configure pyproject.toml for modern Python packaging
  • CMake setup for cross-platform C++ builds
  • pybind11 integration for seamless Python bindings
  • Development workflow and testing strategies
  • CI/CD pipeline for automated building and publishing

Project Structure

Understanding the project structure is crucial for Python packages with C++ extensions. Here’s the complete layout:

count-min-sketch/
├── include/cmsketch/ # C++ header files
│ ├── cmsketch.h # Main header (include this)
│ ├── count_min_sketch.h # Core template class
│ └── hash_util.h # Hash utility functions
├── src/cmsketchcpp/ # C++ source files
│ └── count_min_sketch.cc # Core implementation
├── src/cmsketch/ # Python package source
│ ├── __init__.py # Package initialization
│ ├── base.py # Base classes and interfaces
│ ├── _core.pyi # Type stubs for C++ bindings
│ ├── _version.py # Version information
│ ├── py.typed # Type checking marker
│ └── py/ # Pure Python implementations
│     ├── count_min_sketch.py # Python Count-Min Sketch implementation
│     └── hash_util.py # Python hash utilities
├── src/ # Additional source files
│ ├── main.cc # Example C++ application
│ └── python_bindings.cc # Python bindings (pybind11)
├── tests/ # C++ unit tests
│ ├── CMakeLists.txt # Test configuration
│ ├── test_count_min_sketch.cc # Core functionality tests
│ ├── test_hash_functions.cc # Hash function tests
│ └── test_sketch_config.cc # Configuration tests
├── pytests/ # Python tests
│ ├── __init__.py # Test package init
│ ├── conftest.py # Pytest configuration
│ ├── test_count_min_sketch.py # Core Python tests
│ ├── test_hash_util.py # Hash utility tests
│ ├── test_mixins.py # Mixin class tests
│ └── test_py_count_min_sketch.py # Pure Python implementation tests
├── benchmarks/ # Performance benchmarks
│ ├── __init__.py # Benchmark package init
│ ├── generate_data.py # Data generation utilities
│ └── test_benchmarks.py # Benchmark validation tests
├── examples/ # Example scripts
│ └── example.py # Python usage example
├── scripts/ # Build and deployment scripts
│ ├── build.sh # Production build script
│ └── build-dev.sh # Development build script
├── data/ # Sample data files
│ ├── ips.txt # IP address sample data
│ └── unique-ips.txt # Unique IP sample data
├── build/ # Build artifacts (generated)
│ ├── _core.cpython-*.so # Compiled Python extensions
│ ├── cmsketch_example # Compiled C++ example
│ ├── libcmsketch.a # Static library
│ └── tests/ # Compiled test binaries
├── dist/ # Distribution packages (generated)
│ └── cmsketch-*.whl # Python wheel packages
├── CMakeLists.txt # Main CMake configuration
├── pyproject.toml # Python package configuration
├── uv.lock # uv lock file
├── Makefile # Convenience make targets
├── LICENSE # MIT License
└── README.md # Project README

Key Directory Purposes

  • include/: C++ header files that define the public API
  • src/cmsketchcpp/: C++ implementation files
  • src/cmsketch/: Python package source code
  • src/: Additional C++ files like bindings and examples
  • tests/: C++ unit tests using Google Test
  • pytests/: Python tests using pytest
  • benchmarks/: Performance testing and comparison
  • build/: Generated build artifacts (not in version control)
  • dist/: Generated distribution packages (not in version control)

Version Management with bump-my-version

Managing versions across multiple files (Python package, C++ library, documentation) can be challenging. This project uses bump-my-version to automate version updates across all relevant files.

Configuration

The version management is configured in .bumpversion.toml:

.bumpversion.toml
[tool.bumpversion]
current_version = "0.1.10"
commit = true
tag = true
tag_name = "v{new_version}"
message = "Bump version: {current_version} → {new_version}"
[[tool.bumpversion.files]]
filename = "pyproject.toml"
search = 'version = "{current_version}"'
replace = 'version = "{new_version}"'
[[tool.bumpversion.files]]
filename = "CMakeLists.txt"
search = 'VERSION {current_version} # Project version'
replace = 'VERSION {new_version} # Project version'
[[tool.bumpversion.files]]
filename = "VERSION"
search = '{current_version}'
replace = '{new_version}'

The CMakeLists.txt Trick

To make bump-my-version match the right line in CMakeLists.txt, I add a marker comment after the version number:

CMakeLists.txt
project(
cmsketch
VERSION 0.1.10 # Project version
LANGUAGES CXX)

The comment # Project version helps bump-my-version identify the correct version line in CMakeLists.txt. This ensures that other occurrences of strings like VERSION x.x.x elsewhere in the file are not mistaken for the actual project version.
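
To see why the marker comment matters, the search-and-replace step can be mimicked in a few lines of Python. This is only an illustrative sketch: `bump_version_line` is a hypothetical helper, not part of bump-my-version, and the sample CMake text (including the `SOME_DEPENDENCY_VERSION` line) is made up for the demonstration.

```python
def bump_version_line(text: str, search: str, replace: str,
                      current: str, new: str) -> str:
    """Fill in the search/replace templates and apply them to `text`."""
    old_line = search.format(current_version=current)
    new_line = replace.format(new_version=new)
    if old_line not in text:
        raise ValueError(f"pattern not found: {old_line!r}")
    return text.replace(old_line, new_line)


# Hypothetical file content: the second VERSION-like string must stay untouched.
cmake_text = """\
project(
  cmsketch
  VERSION 0.1.10 # Project version
  LANGUAGES CXX)
set(SOME_DEPENDENCY_VERSION 0.1.10)
"""

updated = bump_version_line(
    cmake_text,
    search="VERSION {current_version} # Project version",
    replace="VERSION {new_version} # Project version",
    current="0.1.10",
    new="0.1.11",
)
print(updated)
```

Only the line that ends with the marker comment changes; the unrelated `0.1.10` on the last line is left alone because it does not match the full search string.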

Usage

Terminal window
# Install bump-my-version
uv add --dev bump-my-version
# Bump patch version (0.1.10 → 0.1.11)
uv run bump-my-version bump patch
# Bump minor version (0.1.10 → 0.2.0)
uv run bump-my-version bump minor
# Bump major version (0.1.10 → 1.0.0)
uv run bump-my-version bump major
# Preview changes without modifying any files
uv run bump-my-version bump --dry-run patch

What Gets Updated

When you run bump-my-version, it automatically updates:

  • pyproject.toml: Python package version
  • CMakeLists.txt: C++ project version
  • VERSION: Standalone version file
  • Git commit: Creates a commit with the version bump
  • Git tag: Creates a tag like v0.1.11

This ensures all version references stay synchronized across your entire project.
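
A quick way to confirm the files really are in sync (for example as a CI check) is to read the version back out of each one and compare. The sketch below is an assumption about how such a check could look, not something taken from the project; `read_versions` is a hypothetical helper and the inline file contents are shortened samples.

```python
import re


def read_versions(pyproject_text: str, cmake_text: str) -> dict[str, str]:
    """Pull the version strings out of both files so they can be compared."""
    py_match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, re.MULTILINE)
    cmake_match = re.search(r"VERSION\s+(\S+)\s+# Project version", cmake_text)
    if py_match is None or cmake_match is None:
        raise ValueError("version line not found")
    return {
        "pyproject.toml": py_match.group(1),
        "CMakeLists.txt": cmake_match.group(1),
    }


# Shortened sample contents standing in for the real files.
versions = read_versions(
    'name = "cmsketch"\nversion = "0.1.11"\n',
    "project(\n  cmsketch\n  VERSION 0.1.11 # Project version\n  LANGUAGES CXX)\n",
)
in_sync = len(set(versions.values())) == 1
print(versions, in_sync)
```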

pyproject.toml Configuration

The pyproject.toml file is the heart of modern Python packaging. Here’s how to configure it for C++ extensions:

pyproject.toml
[build-system]
requires = ["scikit-build-core>=0.10", "pybind11", "cmake>=3.15"]
build-backend = "scikit_build_core.build"
[project]
name = "cmsketch"
version = "0.1.10"
description = "High-performance Count-Min Sketch implementation with C++ and Python versions"
readme = "README.md"
license = { file = "LICENSE" }
authors = [{ name = "isaac-fei", email = "isaac.omega.fei@gmail.com" }]
maintainers = [{ name = "isaac-fei", email = "isaac.omega.fei@gmail.com" }]
requires-python = ">=3.11"
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: C++",
"Topic :: Scientific/Engineering",
"Topic :: Software Development :: Libraries :: Python Modules",
"Operating System :: OS Independent",
]
keywords = ["count-min-sketch", "probabilistic", "data-structure", "streaming"]
[project.urls]
Homepage = "https://github.com/isaac-fate/count-min-sketch"
Repository = "https://github.com/isaac-fate/count-min-sketch"
Documentation = "https://github.com/isaac-fate/count-min-sketch#readme"
Issues = "https://github.com/isaac-fate/count-min-sketch/issues"
[project.optional-dependencies]
dev = ["pytest>=8.0.0", "pytest-benchmark>=4.0.0", "build>=1.0.0"]
[tool.scikit-build]
build-dir = "build/{wheel_tag}"
wheel.exclude = ["lib/**", "include/**"]
[tool.scikit-build.cmake]
args = [
"-DCMAKE_BUILD_TYPE=Release",
"-DCMAKE_CXX_STANDARD=17",
"-DCMAKE_CXX_STANDARD_REQUIRED=ON",
"-DCMAKE_CXX_EXTENSIONS=OFF",
]
[tool.cibuildwheel]
build = "cp311-* cp312-*"
skip = "*-win32 *-manylinux_i686 *-musllinux*"
test-command = "python -m pytest {project}/pytests -v"
test-requires = "pytest"
manylinux-x86_64-image = "manylinux_2_28"
[tool.cibuildwheel.macos]
environment = { MACOSX_DEPLOYMENT_TARGET = "10.15" }
[tool.cibuildwheel.windows]
before-build = "pip install delvewheel"
repair-wheel-command = "delvewheel repair -w {dest_dir} {wheel}"
[tool.pytest.ini_options]
testpaths = ["pytests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = ["-v", "--tb=short"]

Key Configuration Sections

[build-system]: Specifies the build backend and requirements

  • scikit-build-core: Modern build system for C++ extensions
  • pybind11: C++ to Python binding library
  • cmake: C++ build system

[project]: Package metadata and dependencies

  • Standard Python package information
  • requires-python: Minimum Python version
  • classifiers: PyPI categorization

[tool.scikit-build]: Build configuration

  • build-dir: Where to place build artifacts
  • wheel.exclude: Files to exclude from wheels

[tool.scikit-build.cmake]: CMake arguments

  • C++ standard and build type settings
  • Cross-platform compilation flags

[tool.cibuildwheel]: CI/CD wheel building

  • Python versions and platforms to build for
  • Platform-specific configurations

CMakeLists.txt Configuration

The CMakeLists.txt file orchestrates the C++ build process and Python binding generation:

CMakeLists.txt
cmake_minimum_required(VERSION 3.15)
project(
  cmsketch
  VERSION 0.1.10 # Project version
  LANGUAGES CXX)
# Generate compile_commands.json for IDE support
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
# Build options
option(DEVELOPMENT_MODE "Enable development mode with IDE support" OFF)
option(BUILD_PYTHON_BINDINGS "Build Python bindings for development" OFF)
# C++ standard - use C++17 for better compatibility
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
# Default build type
if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE
      Release
      CACHE STRING "Build type" FORCE)
endif()
# Compiler warnings
if(MSVC)
  add_compile_options(/W4)
  # Enable Windows symbol export
  set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
else()
  add_compile_options(-Wall -Wextra -Wpedantic)
  # Enable position independent code for shared libraries
  set(CMAKE_POSITION_INDEPENDENT_CODE ON)
endif()
# Platform-specific settings
if(APPLE)
  set(CMAKE_OSX_DEPLOYMENT_TARGET
      "10.9"
      CACHE STRING "Minimum OS X deployment version")
  set(CMAKE_OSX_ARCHITECTURES
      "x86_64;arm64"
      CACHE STRING "Build architectures for OS X")
endif()
# Source files
file(GLOB_RECURSE CMSKETCH_SOURCES "src/cmsketchcpp/*.cc")
# Create library
add_library(cmsketch ${CMSKETCH_SOURCES})
target_include_directories(cmsketch PUBLIC include)
target_compile_features(cmsketch PUBLIC cxx_std_17)
# Example executable
file(GLOB EXAMPLE_SOURCES "src/main.cc")
add_executable(cmsketch_example ${EXAMPLE_SOURCES})
target_link_libraries(cmsketch_example PRIVATE cmsketch)
# Install targets
install(TARGETS cmsketch DESTINATION lib)
install(DIRECTORY include/ DESTINATION include)
# Python bindings
if(SKBUILD_PROJECT_NAME
   OR BUILD_PYTHON_BINDINGS
   OR DEVELOPMENT_MODE)
  set(PYBIND11_FINDPYTHON ON)
  find_package(pybind11 REQUIRED)
  pybind11_add_module(_core MODULE src/python_bindings.cc)
  target_link_libraries(_core PRIVATE cmsketch)
  if(SKBUILD_PROJECT_NAME)
    install(TARGETS _core DESTINATION ${SKBUILD_PROJECT_NAME})
  endif()
endif()
# Testing
option(BUILD_TESTS "Build tests" OFF)
if(BUILD_TESTS OR DEVELOPMENT_MODE)
  find_package(GTest REQUIRED)
  enable_testing()
  add_subdirectory(tests)
endif()

Key CMake Sections

  • Project Setup: Basic project configuration and C++ standard
  • Compiler Settings: Platform-specific compiler flags and warnings
  • Library Creation: Building the core C++ library
  • Python Bindings: pybind11 integration for Python extensions
  • Testing: Google Test integration for C++ unit tests
  • Installation: Target installation for packaging

Python Bindings with pybind11

The Python bindings are created in src/python_bindings.cc:

src/python_bindings.cc
#include "cmsketch/cmsketch.h"
#include <cstdint>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <string>
namespace py = pybind11;
// Macro to define common CountMinSketch methods for a given type
#define DEFINE_COUNT_MIN_SKETCH_METHODS(class_type, class_name) \
  py::class_<cmsketch::CountMinSketch<class_type>>(m, class_name) \
      .def(py::init<uint32_t, uint32_t>(), py::arg("width"), py::arg("depth"), \
           "Create a Count-Min Sketch with specified dimensions") \
      .def("insert", &cmsketch::CountMinSketch<class_type>::Insert, \
           py::arg("item"), "Insert an item into the sketch") \
      .def("count", &cmsketch::CountMinSketch<class_type>::Count, \
           py::arg("item"), "Get the estimated count of an item") \
      .def("clear", &cmsketch::CountMinSketch<class_type>::Clear, \
           "Reset the sketch to initial state") \
      .def("merge", &cmsketch::CountMinSketch<class_type>::Merge, \
           py::arg("other"), "Merge another sketch into this one") \
      .def("top_k", &cmsketch::CountMinSketch<class_type>::TopK, py::arg("k"), \
           py::arg("candidates"), "Get the top k items from candidates") \
      .def("get_width", &cmsketch::CountMinSketch<class_type>::GetWidth, \
           "Get the width of the sketch") \
      .def("get_depth", &cmsketch::CountMinSketch<class_type>::GetDepth, \
           "Get the depth of the sketch")
PYBIND11_MODULE(_core, m) {
  m.doc() = "Count-Min Sketch implementation with Python bindings";
  // CountMinSketch class for strings
  DEFINE_COUNT_MIN_SKETCH_METHODS(std::string, "CountMinSketchStr");
  // CountMinSketch class for int
  DEFINE_COUNT_MIN_SKETCH_METHODS(int, "CountMinSketchInt");
}

Key pybind11 Features

  • Automatic Type Conversion: STL containers are automatically converted
  • Method Binding: C++ methods become Python methods
  • Documentation: Docstrings passed to .def() show up in Python’s help(), alongside the signatures pybind11 generates automatically
  • Template Instantiation: Each C++ key type is bound as a separate Python class

C++ Atomic Implementation

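The core advantage of this C++ implementation is its use of atomic operations for thread safety. Because no lock is needed, threads can update the sketch in parallel once they run outside Python’s Global Interpreter Lock (GIL); for calls made from Python threads, that means the binding has to release the GIL (pybind11 provides py::gil_scoped_release for this). Here’s how the atomic implementation works: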

Header file (include/cmsketch/count_min_sketch.h):

include/cmsketch/count_min_sketch.h
#include <atomic>
#include <cstddef>
#include <functional>
#include <vector>
template <typename KeyType>
class CountMinSketch {
 private:
  // 2D array of atomic counters for thread-safe operations
  std::vector<std::vector<std::atomic<size_t>>> counters_;
  std::vector<std::function<size_t(const KeyType&)>> hash_functions_;
  size_t width_;
  size_t depth_;
 public:
  void Insert(const KeyType& key);
  size_t Count(const KeyType& key) const;
  // ... other method declarations
};

Implementation file (src/cmsketchcpp/count_min_sketch.cc):

src/cmsketchcpp/count_min_sketch.cc
#include "cmsketch/count_min_sketch.h"
#include <algorithm>
#include <limits>
template <typename KeyType>
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
  for (size_t i = 0; i < depth_; ++i) {
    size_t hash_value = hash_functions_[i](key);
    size_t index = hash_value % width_;
    // Atomic increment - thread-safe without locks
    counters_[i][index].fetch_add(1, std::memory_order_relaxed);
  }
}
template <typename KeyType>
size_t CountMinSketch<KeyType>::Count(const KeyType& key) const {
  size_t min_count = std::numeric_limits<size_t>::max();
  for (size_t i = 0; i < depth_; ++i) {
    size_t hash_value = hash_functions_[i](key);
    size_t index = hash_value % width_;
    // Atomic read - thread-safe without locks
    size_t count = counters_[i][index].load(std::memory_order_relaxed);
    min_count = std::min(min_count, count);
  }
  return min_count;
}
// Note: member templates defined in a .cc file must be explicitly
// instantiated for every key type used from other translation units,
// e.g. `template class CountMinSketch<std::string>;`

Key Atomic Features

Memory Ordering: Using std::memory_order_relaxed for optimal performance

  • No ordering overhead: Each increment is still atomic, which is all an independent counter needs
  • Hardware optimization: Allows the compiler and CPU to reorder surrounding operations for better performance
  • Fewer barriers: Avoids the memory fences that stronger orderings can require

Thread Safety Benefits:

  • Lock-free design: No mutexes or locks required
  • Concurrent access: Multiple threads can insert/query simultaneously
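  • GIL bypass: Native C++ threads never hold Python’s GIL; Python threads get the same benefit only while the binding has released it
  • Scalability: Throughput can grow with the number of CPU cores, as long as threads are not all contending for the same few counters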

Performance Comparison:

pytests/test_py_count_min_sketch.py
# Python implementation (GIL limited)
def insert_python(self, key):
    with self.lock:  # Serialized access
        ...  # increment counters

src/cmsketchcpp/count_min_sketch.cc
// C++ implementation (atomic operations)
template <typename KeyType>
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
  for (size_t i = 0; i < depth_; ++i) {
    size_t index = hash_functions_[i](key) % width_;
    // Parallel access - no locks needed
    counters_[i][index].fetch_add(1, std::memory_order_relaxed);
  }
}

This atomic implementation enables true parallel processing where multiple threads can simultaneously insert and query the sketch without blocking each other, providing significant performance advantages in multithreaded environments.

Development Workflow

Here’s the complete development workflow for building Python packages with C++ extensions:

1. Initial Setup

Terminal window
# Create project directory
mkdir my-python-cpp-package
cd my-python-cpp-package
# Initialize git repository
git init
# Create basic directory structure
mkdir -p include/mypackage src/mypackagecpp src/mypackage/py tests pytests examples

2. Development Environment

Terminal window
# Install development dependencies
uv sync --dev
# Build in development mode
uv pip install -e .
# Run tests
uv run pytest pytests/
make build-dev && cd build && make test

3. File Associations

Understanding how files relate to the project structure:

C++ Headers (include/cmsketch/):

  • cmsketch.h → Main header included by users
  • count_min_sketch.h → Core template class definition
  • hash_util.h → Utility functions

C++ Implementation (src/cmsketchcpp/):

  • count_min_sketch.cc → Template class implementation
  • Links to headers via #include "cmsketch/cmsketch.h"

Python Package (src/cmsketch/):

  • __init__.py → Package initialization and public API
  • _core.pyi → Type stubs for C++ bindings
  • base.py → Abstract base classes
  • py/ → Pure Python implementations

Python Bindings (src/python_bindings.cc):

  • Links C++ library to Python via pybind11
  • Creates _core module with CountMinSketchStr and CountMinSketchInt classes

Build Configuration:

  • pyproject.toml → Python package metadata and build settings
  • CMakeLists.txt → C++ build configuration and pybind11 integration

4. Build Process

The build process follows this sequence:

  1. CMake Configuration: Reads CMakeLists.txt and configures build
  2. C++ Compilation: Compiles C++ source files into library
  3. pybind11 Binding: Generates Python extension module
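  4. Python Packaging: Creates a wheel containing the Python sources and the compiled _core extension (the static C++ library is linked into it, while lib/ and include/ are excluded)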
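
One way to check the outcome of the last step is to look inside the wheel, which is an ordinary zip archive. The helper below, `find_extension_modules`, is a hypothetical name for this sketch, and the archive it inspects is built in memory with made-up entries; in practice you would pass it a path such as a file under dist/.

```python
import io
import zipfile


def find_extension_modules(wheel_file) -> list[str]:
    """List compiled extension modules (.so / .pyd) inside a wheel.

    `wheel_file` may be a path or a binary file object.
    """
    with zipfile.ZipFile(wheel_file) as wheel:
        return sorted(
            name for name in wheel.namelist() if name.endswith((".so", ".pyd"))
        )


# Stand-in for a real wheel: an in-memory archive with made-up entries.
fake_wheel = io.BytesIO()
with zipfile.ZipFile(fake_wheel, "w") as archive:
    archive.writestr("cmsketch/__init__.py", "")
    archive.writestr("cmsketch/_core.cpython-312-x86_64-linux-gnu.so", b"")

extensions = find_extension_modules(fake_wheel)
print(extensions)
```

If the list comes back empty for a real wheel, the `install(TARGETS _core ...)` step in CMakeLists.txt is the first place to look.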

5. Testing Strategy

C++ Tests (tests/):

tests/test_count_min_sketch.cc
#include <gtest/gtest.h>
#include <string>
#include "cmsketch/cmsketch.h"
TEST(CountMinSketchTest, BasicFunctionality) {
  cmsketch::CountMinSketch<std::string> sketch(100, 3);
  sketch.Insert("test");
  // Unsigned literal avoids a signed/unsigned comparison warning
  EXPECT_EQ(sketch.Count("test"), 1u);
}

Python Tests (pytests/):

pytests/test_count_min_sketch.py
import cmsketch
def test_basic_functionality():
    sketch = cmsketch.CountMinSketchStr(100, 3)
    sketch.insert("test")
    assert sketch.count("test") == 1

6. CI/CD Pipeline

The project uses GitHub Actions for automated building and testing:

Test Workflow (.github/workflows/test.yml):

  • Runs on push/PR
  • Tests C++ and Python code
  • Cross-platform testing (Windows, Linux, macOS)

Wheel Building (.github/workflows/wheels.yml):

  • Uses cibuildwheel for cross-platform wheel generation
  • Builds for multiple Python versions and architectures
  • Tests wheels before publishing

Release Workflow (.github/workflows/release.yml):

  • Triggers on git tags
  • Publishes wheels to PyPI
  • Creates GitHub releases

C++ Implementation

The core implementation uses a template-based design that supports any hashable key type:

src/main.cc
#include "cmsketch/cmsketch.h"
#include <iostream>
#include <string>
int main() {
  // Create a sketch with width=1000, depth=5
  cmsketch::CountMinSketch<std::string> sketch(1000, 5);
  // Add elements
  sketch.Insert("apple");
  sketch.Insert("apple");
  sketch.Insert("banana");
  // Query frequencies (estimates can overcount, but never undercount)
  std::cout << "apple: " << sketch.Count("apple") << std::endl; // 2
  std::cout << "banana: " << sketch.Count("banana") << std::endl; // 1
  std::cout << "cherry: " << sketch.Count("cherry") << std::endl; // 0
  return 0;
}

The implementation uses multiple hash functions to distribute items across the counter array, providing probabilistic guarantees on estimation accuracy.
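
The idea fits in a few lines of pure Python. The sketch below is a simplified stand-in, not the package's code: `TinyCountMinSketch` is a hypothetical class, and salting BLAKE2b with the row number is just one convenient way to get a different hash function per row.

```python
import hashlib


class TinyCountMinSketch:
    """Minimal Count-Min Sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.counters = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # Salting with the row number yields one hash function per row.
        digest = hashlib.blake2b(
            item.encode(), salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def insert(self, item: str) -> None:
        # Every row gets exactly one counter incremented.
        for row in range(self.depth):
            self.counters[row][self._index(row, item)] += 1

    def count(self, item: str) -> int:
        # Collisions can only inflate a counter, so the minimum across
        # rows is the tightest available estimate.
        return min(
            self.counters[row][self._index(row, item)]
            for row in range(self.depth)
        )


sketch = TinyCountMinSketch(width=1000, depth=5)
for fruit in ["apple", "apple", "banana"]:
    sketch.insert(fruit)

print(sketch.count("apple"), sketch.count("banana"))
```

Taking the minimum is what gives the one-sided guarantee: an estimate is never below the true count, and a wider table makes overcounts from collisions less likely.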

Template Design

The template-based approach allows for type-safe implementations:

include/cmsketch/count_min_sketch.h
#include <atomic>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>
template <typename KeyType>
class CountMinSketch {
 public:
  CountMinSketch(size_t width, size_t depth);
  void Insert(const KeyType& key);
  size_t Count(const KeyType& key) const;
  std::vector<std::pair<KeyType, size_t>> TopK(
      size_t k, const std::vector<KeyType>& candidates) const;
  void Merge(const CountMinSketch& other);
  void Clear();
 private:
  std::vector<std::vector<std::atomic<size_t>>> counters_;
  std::vector<std::function<size_t(const KeyType&)>> hash_functions_;
  size_t width_;
  size_t depth_;
};

This design ensures type safety while maintaining high performance: the compiler instantiates a dedicated version of the class for each key type.

Python Usage

The Python interface provides a clean, easy-to-use API:

examples/example.py
import cmsketch
# Create a sketch for strings
sketch = cmsketch.CountMinSketchStr(1000, 5)
# Add elements
sketch.insert("apple")
sketch.insert("apple")
sketch.insert("banana")
# Query frequencies
print(f"apple: {sketch.count('apple')}") # 2
print(f"banana: {sketch.count('banana')}") # 1
print(f"cherry: {sketch.count('cherry')}") # 0
# Get top-k items
candidates = ["apple", "banana", "cherry"]
top_k = sketch.top_k(2, candidates)
for item, count in top_k:
    print(f"{item}: {count}")
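
A sketch only stores counters, not the items themselves, which is why top_k needs a list of candidates to rank. Assuming it simply orders the candidates by their estimated counts (the exact ordering and tie-breaking of the C++ TopK are not shown in this article), the behaviour can be approximated like this, with a plain dictionary standing in for the sketch:

```python
import heapq
from typing import Callable, Iterable


def top_k_by_estimate(
    k: int, candidates: Iterable[str], count: Callable[[str], int]
) -> list[tuple[str, int]]:
    """Return the k candidates with the largest estimated counts."""
    scored = [(item, count(item)) for item in candidates]
    # nlargest returns the results sorted from largest to smallest.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# A dictionary of exact counts stands in for sketch.count here.
exact_counts = {"apple": 2, "banana": 1}
result = top_k_by_estimate(
    2, ["apple", "banana", "cherry"], lambda item: exact_counts.get(item, 0)
)
print(result)
```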

Type Support

The library provides specialized classes for different data types:

  • CountMinSketchStr: String-based sketch
  • CountMinSketchInt: Integer-based sketch

This approach optimizes performance for common use cases while maintaining the flexibility of the underlying C++ implementation.

Performance Benchmarks

The C++ implementation provides significant performance improvements over Python, especially in multithreaded environments. Here are the actual benchmark results from our test suite:

Benchmark Setup

The benchmark suite tests real-world scenarios with:

  • 100,000 IP address samples generated using Faker with weighted distribution
  • Realistic frequency patterns (most frequent IP appears ~10% of the time)
  • Threaded processing with 10 concurrent workers and 1,000-item batches
  • Comprehensive testing across insert, count, top-k, and streaming operations
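
The threaded setup can be pictured as a small harness that splits the input into batches and hands them to a worker pool. The code below is a sketch under stated assumptions, not the project's benchmark: `insert_threaded` and `batched` are hypothetical helpers, the IP list is synthetic, and a lock-protected Counter stands in for the sketch's insert method.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def batched(items, batch_size):
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def insert_threaded(insert, items, workers=10, batch_size=1_000):
    """Call `insert(item)` for every item, one batch per pool task."""

    def insert_batch(batch):
        for item in batch:
            insert(item)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all batches and re-raises any worker exception.
        list(pool.map(insert_batch, batched(items, batch_size)))


# Stand-in for a sketch: a Counter guarded by a lock.
counts = Counter()
lock = threading.Lock()


def locked_insert(item):
    with lock:
        counts[item] += 1


ips = [f"10.0.0.{i % 50}" for i in range(10_000)]
insert_threaded(locked_insert, ips)
print(sum(counts.values()), counts["10.0.0.0"])
```

Swapping `locked_insert` for the bound `insert` method of either implementation gives a like-for-like comparison with the same batching and worker count.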

Actual Benchmark Results

Insert Performance (100k items, threaded):

  • C++: 45.79ms (21.84 ops/sec)
  • Python: 8,751.15ms (0.11 ops/sec)
  • Speedup: 191x faster with C++

Count Performance (querying unique items):

  • C++: 4.71μs per query (212,130 ops/sec)
  • Python: 858.58μs per query (1,165 ops/sec)
  • Speedup: 182x faster with C++

Top-K Performance (finding top items):

  • C++: 2.57μs per operation (389,163 ops/sec)
  • Python: 857.54μs per operation (1,166 ops/sec)
  • Speedup: 334x faster with C++

Streaming Performance (insert + top-k):

  • C++: 46.03ms (21.72 ops/sec)
  • Python: 8,889.81ms (0.11 ops/sec)
  • Speedup: 193x faster with C++

Performance Analysis

| Operation | C++ Time | Python Time | Speedup | Key Advantage |
| --- | --- | --- | --- | --- |
| Insert (100k threaded) | 45.79ms | 8,751.15ms | 191x | GIL bypass + atomic operations |
| Count (per query) | 4.71μs | 858.58μs | 182x | Direct memory access |
| Top-K (per operation) | 2.57μs | 857.54μs | 334x | Optimized algorithms |
| Streaming (end-to-end) | 46.03ms | 8,889.81ms | 193x | Combined benefits |
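
The speedup column is simply the Python time divided by the C++ time, rounded to the nearest whole number, and the ops/sec figures are the reciprocal of the time per operation. The arithmetic can be reproduced from the numbers reported above:

```python
# (C++ time, Python time); each pair uses the same unit, so the ratio is unit-free.
timings = {
    "insert": (45.79, 8751.15),      # milliseconds
    "count": (4.71, 858.58),         # microseconds
    "top_k": (2.57, 857.54),         # microseconds
    "streaming": (46.03, 8889.81),   # milliseconds
}

speedups = {name: round(py / cpp) for name, (cpp, py) in timings.items()}

# One "operation" in the insert benchmark is a full 100k-item run.
insert_ops_per_sec = round(1000 / timings["insert"][0], 2)

print(speedups, insert_ops_per_sec)
```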

Running Benchmarks

Terminal window
# Run all benchmarks with pytest
uv run pytest ./benchmarks
# Run specific benchmark categories
uv run pytest ./benchmarks -k "insert"
uv run pytest ./benchmarks -k "count"
uv run pytest ./benchmarks -k "topk"
# Run with verbose output
uv run pytest ./benchmarks -v
# Generate test data (if needed)
uv run python ./benchmarks/generate_data.py

The benchmark suite uses pytest-benchmark for reliable measurements and includes both synthetic and real-world data patterns.

Why C++ is So Much Faster

1. GIL Bypass in Multithreaded Operations

  • Python: The GIL lets only one thread execute Python bytecode at a time, even with threading
  • C++: Atomic operations allow true parallel processing
  • Result: 191x speedup in threaded insertions

2. Memory Access Patterns

  • Python: Object overhead, dynamic typing, garbage collection
  • C++: Direct memory access, contiguous arrays, no GC overhead
  • Result: 182x speedup in count operations

3. Algorithm Optimization

  • Python: Interpreted bytecode, dynamic dispatch
  • C++: Compiled machine code, with templates resolved at compile time
  • Result: 334x speedup in top-k operations

4. Thread Safety Implementation

# Python: Lock-based (serialized)
def insert_python(self, key):
    with self.lock:  # All threads wait here
        ...  # increment counters

// C++: Atomic operations (parallel)
template <typename KeyType>
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
  for (size_t i = 0; i < depth_; ++i) {
    size_t index = hash_functions_[i](key) % width_;
    // All threads can execute simultaneously
    counters_[i][index].fetch_add(1, std::memory_order_relaxed);
  }
}

5. Memory Efficiency

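  • Python: Each counter is a separate int object (28 bytes on 64-bit CPython) referenced from an 8-byte list slot
  • C++: Typically 8 bytes per std::atomic<size_t> counter on 64-bit platforms, stored contiguously
  • Result: A smaller, cache-friendlier counter table (the exact ratio depends on how the Python version stores its counters)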
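
The per-counter overhead on the Python side is easy to observe with `sys.getsizeof`. In the sketch below, `array("Q")` (packed unsigned 64-bit integers) serves as an analogy for a contiguous block of 64-bit C++ counters; it is not a measurement of the C++ library, and the numbers assume CPython.

```python
import sys
from array import array

n = 1_000

# A list of distinct int objects (values above CPython's small-int cache).
as_list = [1_000 + i for i in range(n)]
# Packed 64-bit unsigned integers, stored contiguously.
as_array = array("Q", [0] * n)

# The list holds references; each int object is counted separately.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = sys.getsizeof(as_array)

print(f"list of ints: {list_bytes} bytes, packed array: {array_bytes} bytes")
print(f"bytes per packed counter: {as_array.itemsize}")
```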

Project Architecture

The project demonstrates modern software engineering practices:

Build System

  • CMake: Cross-platform C++ build configuration
  • scikit-build-core: Modern Python build system for C++ extensions
  • pybind11: Seamless C++ to Python binding generation
  • uv: Fast, modern Python package management

Project Structure

count-min-sketch/
├── include/cmsketch/ # C++ header files
│ ├── cmsketch.h # Main header
│ ├── count_min_sketch.h # Core template class
│ └── hash_util.h # Hash utilities
├── src/cmsketchcpp/ # C++ source files
│ └── count_min_sketch.cc # Core implementation
├── src/cmsketch/ # Python package
│ ├── __init__.py # Package initialization
│ ├── _core.pyi # Type stubs
│ └── py/ # Pure Python implementations
├── tests/ # C++ unit tests
├── pytests/ # Python tests
├── benchmarks/ # Performance benchmarks
└── examples/ # Example scripts

CI/CD Pipeline

The project uses GitHub Actions for automated testing and publishing:

  • Cross-Platform Testing: Windows, Linux, macOS
  • Wheel Building: Automated wheel generation for all platforms
  • PyPI Publishing: Automatic package distribution on release

Educational Value

This project demonstrates several important software engineering concepts:

1. Python Package Development with C++ Extensions

  • pybind11 Integration: Seamless C++ to Python binding generation
  • Type Stubs: Complete type information for Python IDEs
  • Modern Build Tools: scikit-build-core and uv for package management

2. Performance Engineering

  • C++ vs Python: Direct performance comparison between implementations
  • Memory Efficiency: Optimized data structures and memory usage patterns
  • Thread Safety: Atomic operations and concurrent access patterns

3. Build System Integration

  • CMake: Cross-platform C++ build configuration
  • Python Packaging: Complete pip-installable package creation
  • CI/CD: Automated testing and publishing workflows

4. Modern C++ Practices

  • Templates: Generic, type-safe implementations
  • RAII: Resource management and exception safety
  • STL Integration: Standard library containers and algorithms

Getting Started

Installation

Terminal window
# Using pip
pip install cmsketch
# Using uv (recommended)
uv add cmsketch

Basic Usage

import cmsketch
# Create a sketch
sketch = cmsketch.CountMinSketchStr(1000, 5)
# Add elements
sketch.insert("apple")
sketch.insert("apple")
sketch.insert("banana")
# Query frequencies
print(f"apple: {sketch.count('apple')}") # 2
print(f"banana: {sketch.count('banana')}") # 1

Building from Source

Terminal window
# Clone the repository
git clone https://github.com/isaac-fate/count-min-sketch.git
cd count-min-sketch
# Build everything
make build
# Run tests
make test
# Run example
make example

Key Takeaways

Building Python packages with C++ extensions requires understanding several interconnected systems:

1. Project Structure

  • Clear separation between C++ headers, implementation, and Python bindings
  • Logical organization that scales from simple to complex projects
  • Build artifact management to keep source control clean

2. Build System Integration

  • pyproject.toml for modern Python packaging standards
  • CMakeLists.txt for cross-platform C++ compilation
  • pybind11 for seamless C++ to Python binding generation

3. Development Workflow

  • Incremental development with editable installs and quick rebuilds
  • Comprehensive testing at both C++ and Python levels
  • CI/CD automation for cross-platform wheel building and publishing

4. Performance Benefits

  • 191x speedup in threaded insertions (GIL bypass)
  • 182x speedup in count operations (direct memory access)
  • 334x speedup in top-k operations (compiled optimization)
  • Atomic operations enable true parallel processing without locks
  • Memory efficiency through direct C++ data structure control

Next Steps

To apply these techniques to your own projects:

  1. Start Simple: Begin with a basic C++ function and Python binding
  2. Iterate Gradually: Add complexity incrementally (templates, STL containers, etc.)
  3. Test Thoroughly: Implement both C++ and Python test suites
  4. Automate Everything: Set up CI/CD for automated building and testing
  5. Document Well: Provide clear examples and API documentation

The complete source code, documentation, and benchmarks are available on GitHub, and the package is available on PyPI for immediate use.

This approach to Python package development with C++ extensions provides a solid foundation for building high-performance libraries that combine the best of both worlds: Python’s ease of use and C++’s performance.
