Building Python Packages with C++ Extensions: A Complete Guide
Sep 13, 2025

Introduction
The complete project repository is available at https://github.com/isaac-fate/count-min-sketch.
Building Python packages with C++ extensions is a powerful way to combine Python’s ease of use with C++’s performance. This guide walks through creating a complete Python package with a C++ backend, covering everything from project structure to PyPI publishing.
We’ll use a Count-Min Sketch implementation as our example - a probabilistic data structure perfect for streaming data analysis. But the techniques apply to any C++ library you want to expose to Python.
Why Build Python Packages with C++?
- Multithreading Performance: C++ code can run outside Python’s GIL (once the binding releases it) and use atomic operations, enabling true parallel processing
- Lower-Level Control: Direct memory management and hardware-level optimizations
- Existing Libraries: Leverage existing C++ libraries in Python projects
- System Integration: Access low-level system APIs and hardware features
- Memory Efficiency: Better control over memory usage and data structures
What You’ll Learn
- Complete project structure for Python packages with C++ extensions
- How to configure pyproject.toml for modern Python packaging
- CMake setup for cross-platform C++ builds
- pybind11 integration for seamless Python bindings
- Development workflow and testing strategies
- CI/CD pipeline for automated building and publishing
Project Structure
Understanding the project structure is crucial for Python packages with C++ extensions. Here’s the complete layout:
count-min-sketch/
├── include/cmsketch/                  # C++ header files
│   ├── cmsketch.h                     # Main header (include this)
│   ├── count_min_sketch.h             # Core template class
│   └── hash_util.h                    # Hash utility functions
├── src/cmsketchcpp/                   # C++ source files
│   └── count_min_sketch.cc            # Core implementation
├── src/cmsketch/                      # Python package source
│   ├── __init__.py                    # Package initialization
│   ├── base.py                        # Base classes and interfaces
│   ├── _core.pyi                      # Type stubs for C++ bindings
│   ├── _version.py                    # Version information
│   ├── py.typed                       # Type checking marker
│   └── py/                            # Pure Python implementations
│       ├── count_min_sketch.py        # Python Count-Min Sketch implementation
│       └── hash_util.py               # Python hash utilities
├── src/                               # Additional source files
│   ├── main.cc                        # Example C++ application
│   └── python_bindings.cc             # Python bindings (pybind11)
├── tests/                             # C++ unit tests
│   ├── CMakeLists.txt                 # Test configuration
│   ├── test_count_min_sketch.cc       # Core functionality tests
│   ├── test_hash_functions.cc         # Hash function tests
│   └── test_sketch_config.cc          # Configuration tests
├── pytests/                           # Python tests
│   ├── __init__.py                    # Test package init
│   ├── conftest.py                    # Pytest configuration
│   ├── test_count_min_sketch.py       # Core Python tests
│   ├── test_hash_util.py              # Hash utility tests
│   ├── test_mixins.py                 # Mixin class tests
│   └── test_py_count_min_sketch.py    # Pure Python implementation tests
├── benchmarks/                        # Performance benchmarks
│   ├── __init__.py                    # Benchmark package init
│   ├── generate_data.py               # Data generation utilities
│   └── test_benchmarks.py             # Benchmark validation tests
├── examples/                          # Example scripts
│   └── example.py                     # Python usage example
├── scripts/                           # Build and deployment scripts
│   ├── build.sh                       # Production build script
│   └── build-dev.sh                   # Development build script
├── data/                              # Sample data files
│   ├── ips.txt                        # IP address sample data
│   └── unique-ips.txt                 # Unique IP sample data
├── build/                             # Build artifacts (generated)
│   ├── _core.cpython-*.so             # Compiled Python extensions
│   ├── cmsketch_example               # Compiled C++ example
│   ├── libcmsketch.a                  # Static library
│   └── tests/                         # Compiled test binaries
├── dist/                              # Distribution packages (generated)
│   └── cmsketch-*.whl                 # Python wheel packages
├── CMakeLists.txt                     # Main CMake configuration
├── pyproject.toml                     # Python package configuration
├── uv.lock                            # uv lock file
├── Makefile                           # Convenience make targets
├── LICENSE                            # MIT License
└── README.md                          # This file
Key Directory Purposes
- include/: C++ header files that define the public API
- src/cmsketchcpp/: C++ implementation files
- src/cmsketch/: Python package source code
- src/: Additional C++ files like bindings and examples
- tests/: C++ unit tests using Google Test
- pytests/: Python tests using pytest
- benchmarks/: Performance testing and comparison
- build/: Generated build artifacts (not in version control)
- dist/: Generated distribution packages (not in version control)
Version Management with bump-my-version
Managing versions across multiple files (Python package, C++ library, documentation) can be challenging. This project uses bump-my-version to automate version updates across all relevant files.
Configuration
The version management is configured in .bumpversion.toml:
[bumpversion]
current_version = "0.1.10"
commit = true
tag = true
tag_name = "v{new_version}"
message = "Bump version: {current_version} → {new_version}"

[bumpversion:file:pyproject.toml]
search = 'version = "{current_version}"'
replace = 'version = "{new_version}"'

[bumpversion:file:CMakeLists.txt]
search = 'VERSION {current_version} # Project version'
replace = 'VERSION {new_version} # Project version'

[bumpversion:file:VERSION]
search = '{current_version}'
replace = '{new_version}'
The CMakeLists.txt Trick
To make bump-my-version work with CMakeLists.txt, I use a small trick: a marker comment on the version line.
project(
  cmsketch
  VERSION 0.1.10 # Project version
  LANGUAGES CXX)
The comment # Project version helps bump-my-version identify the correct version line in CMakeLists.txt. This ensures that other occurrences of strings like VERSION x.x.x elsewhere in the file are not mistaken for the actual project version.
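To see why the marker comment matters, here is a small Python sketch that mimics a literal search-and-replace on a CMakeLists.txt-like string. The sample text, the second `VERSION` occurrence, and the `bump_cmake_version` helper are all hypothetical; bump-my-version's own matching logic may differ in detail.

```python
CMAKE_TEXT = """\
cmake_minimum_required(VERSION 3.15)
project(
  cmsketch
  VERSION 0.1.10 # Project version
  LANGUAGES CXX)
# Hypothetical second occurrence of the same version string
set(SOME_DEPENDENCY_VERSION 0.1.10)
"""


def bump_cmake_version(text: str, current: str, new: str) -> str:
    """Replace only the version that carries the '# Project version' marker."""
    search = f"VERSION {current} # Project version"
    replace = f"VERSION {new} # Project version"
    return text.replace(search, replace)


updated = bump_cmake_version(CMAKE_TEXT, "0.1.10", "0.1.11")
print(updated)
```

Only the `project(...)` line changes. The hypothetical `set(...)` line also contains the text `VERSION 0.1.10`, but without the marker comment it does not match the search string.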
Usage
# Install bump-my-version
uv add --dev bump-my-version

# Bump patch version (0.1.10 → 0.1.11)
uv run bump-my-version patch

# Bump minor version (0.1.10 → 0.2.0)
uv run bump-my-version minor

# Bump major version (0.1.10 → 1.0.0)
uv run bump-my-version major

# Preview changes without committing
uv run bump-my-version --dry-run patch
What Gets Updated
When you run bump-my-version, it automatically updates:
- pyproject.toml: Python package version
- CMakeLists.txt: C++ project version
- VERSION: Standalone version file
- Git commit: Creates a commit with the version bump
- Git tag: Creates a tag like v0.1.11
This ensures all version references stay synchronized across your entire project.
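As a sanity check on that synchronization, a short script can read the version back out of each file and confirm they agree. The sketch below is not part of the project: `read_versions` and `in_sync` are hypothetical helpers, the regular expressions simply mirror the search patterns from the configuration, and the files are written to a temporary directory so the example runs on its own.

```python
import re
import tempfile
from pathlib import Path

# One pattern per tracked file, mirroring the search strings in the config.
PATTERNS = {
    "pyproject.toml": re.compile(r'^version = "([^"]+)"', re.MULTILINE),
    "CMakeLists.txt": re.compile(r"VERSION (\S+) # Project version"),
    "VERSION": re.compile(r"^(\S+)\s*$", re.MULTILINE),
}


def read_versions(root):
    """Return {filename: version or None} for every tracked file under root."""
    versions = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search((root / name).read_text(encoding="utf-8"))
        versions[name] = match.group(1) if match else None
    return versions


def in_sync(versions):
    """True when every file yielded a version and all of them are equal."""
    found = set(versions.values())
    return None not in found and len(found) == 1


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pyproject.toml").write_text(
        '[project]\nname = "cmsketch"\nversion = "0.1.10"\n', encoding="utf-8")
    (root / "CMakeLists.txt").write_text(
        "project(cmsketch\n  VERSION 0.1.10 # Project version\n  LANGUAGES CXX)\n",
        encoding="utf-8")
    (root / "VERSION").write_text("0.1.10\n", encoding="utf-8")
    versions = read_versions(root)

print(versions, in_sync(versions))
```

A check like this could run in CI before a release, pointed at the repository root instead of a temporary directory.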
pyproject.toml Configuration
The pyproject.toml file is the heart of modern Python packaging. Here’s how to configure it for C++ extensions:
[build-system]
requires = ["scikit-build-core>=0.10", "pybind11", "cmake>=3.15"]
build-backend = "scikit_build_core.build"

[project]
name = "cmsketch"
version = "0.1.10"
description = "High-performance Count-Min Sketch implementation with C++ and Python versions"
readme = "README.md"
license = { file = "LICENSE" }
authors = [{ name = "isaac-fei", email = "isaac.omega.fei@gmail.com" }]
maintainers = [{ name = "isaac-fei", email = "isaac.omega.fei@gmail.com" }]
requires-python = ">=3.11"
classifiers = [
  "Development Status :: 4 - Beta",
  "Intended Audience :: Developers",
  "License :: OSI Approved :: MIT License",
  "Programming Language :: Python :: 3",
  "Programming Language :: Python :: 3.11",
  "Programming Language :: Python :: 3.12",
  "Programming Language :: C++",
  "Topic :: Scientific/Engineering",
  "Topic :: Software Development :: Libraries :: Python Modules",
  "Operating System :: OS Independent",
]
keywords = ["count-min-sketch", "probabilistic", "data-structure", "streaming"]

[project.urls]
Homepage = "https://github.com/isaac-fate/count-min-sketch"
Repository = "https://github.com/isaac-fate/count-min-sketch"
Documentation = "https://github.com/isaac-fate/count-min-sketch#readme"
Issues = "https://github.com/isaac-fate/count-min-sketch/issues"

[project.optional-dependencies]
dev = ["pytest>=8.0.0", "pytest-benchmark>=4.0.0", "build>=1.0.0"]

[tool.scikit-build]
build-dir = "build/{wheel_tag}"
wheel.exclude = ["lib/**", "include/**"]

[tool.scikit-build.cmake]
args = [
  "-DCMAKE_BUILD_TYPE=Release",
  "-DCMAKE_CXX_STANDARD=17",
  "-DCMAKE_CXX_STANDARD_REQUIRED=ON",
  "-DCMAKE_CXX_EXTENSIONS=OFF",
]

[tool.cibuildwheel]
build = "cp311-* cp312-*"
skip = "*-win32 *-manylinux_i686 *-musllinux*"
test-command = "python -m pytest {project}/pytests -v"
test-requires = "pytest"
manylinux-x86_64-image = "manylinux_2_28"

[tool.cibuildwheel.macos]
environment = { MACOSX_DEPLOYMENT_TARGET = "10.15" }

[tool.cibuildwheel.windows]
before-build = "pip install delvewheel"
repair-wheel-command = "delvewheel repair -w {dest_dir} {wheel}"

[tool.pytest.ini_options]
testpaths = ["pytests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = ["-v", "--tb=short"]
Key Configuration Sections
[build-system]: Specifies the build backend and requirements
- scikit-build-core: Modern build system for C++ extensions
- pybind11: C++ to Python binding library
- cmake: C++ build system
[project]: Package metadata and dependencies
- Standard Python package information
- requires-python: Minimum Python version
- classifiers: PyPI categorization
[tool.scikit-build]: Build configuration
- build-dir: Where to place build artifacts
- wheel.exclude: Files to exclude from wheels
[tool.scikit-build.cmake]: CMake arguments
- C++ standard and build type settings
- Cross-platform compilation flags
[tool.cibuildwheel]: CI/CD wheel building
- Python versions and platforms to build for
- Platform-specific configurations
CMakeLists.txt Configuration
The CMakeLists.txt file orchestrates the C++ build process and Python binding generation:
cmake_minimum_required(VERSION 3.15)

project(
  cmsketch
  VERSION 0.1.10 # Project version
  LANGUAGES CXX)

# Generate compile_commands.json for IDE support
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

# Build options
option(DEVELOPMENT_MODE "Enable development mode with IDE support" OFF)
option(BUILD_PYTHON_BINDINGS "Build Python bindings for development" OFF)

# C++ standard - use C++17 for better compatibility
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

# Default build type
if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
endif()

# Compiler warnings
if(MSVC)
  add_compile_options(/W4)
  # Enable Windows symbol export
  set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
else()
  add_compile_options(-Wall -Wextra -Wpedantic)
  # Enable position independent code for shared libraries
  set(CMAKE_POSITION_INDEPENDENT_CODE ON)
endif()

# Platform-specific settings
if(APPLE)
  set(CMAKE_OSX_DEPLOYMENT_TARGET "10.9" CACHE STRING "Minimum OS X deployment version")
  set(CMAKE_OSX_ARCHITECTURES "x86_64;arm64" CACHE STRING "Build architectures for OS X")
endif()

# Source files
file(GLOB_RECURSE CMSKETCH_SOURCES "src/cmsketchcpp/*.cc")

# Create library
add_library(cmsketch ${CMSKETCH_SOURCES})
target_include_directories(cmsketch PUBLIC include)
target_compile_features(cmsketch PUBLIC cxx_std_17)

# Example executable
file(GLOB EXAMPLE_SOURCES "src/main.cc")
add_executable(cmsketch_example ${EXAMPLE_SOURCES})
target_link_libraries(cmsketch_example PRIVATE cmsketch)

# Install targets
install(TARGETS cmsketch DESTINATION lib)
install(DIRECTORY include/ DESTINATION include)

# Python bindings
if(SKBUILD_PROJECT_NAME OR BUILD_PYTHON_BINDINGS OR DEVELOPMENT_MODE)
  set(PYBIND11_FINDPYTHON ON)
  find_package(pybind11 REQUIRED)
  pybind11_add_module(_core MODULE src/python_bindings.cc)
  target_link_libraries(_core PRIVATE cmsketch)
  if(SKBUILD_PROJECT_NAME)
    install(TARGETS _core DESTINATION ${SKBUILD_PROJECT_NAME})
  endif()
endif()

# Testing
option(BUILD_TESTS "Build tests" OFF)
if(BUILD_TESTS OR DEVELOPMENT_MODE)
  find_package(GTest REQUIRED)
  enable_testing()
  add_subdirectory(tests)
endif()
Key CMake Sections
- Project Setup: Basic project configuration and C++ standard
- Compiler Settings: Platform-specific compiler flags and warnings
- Library Creation: Building the core C++ library
- Python Bindings: pybind11 integration for Python extensions
- Testing: Google Test integration for C++ unit tests
- Installation: Target installation for packaging
Python Bindings with pybind11
The Python bindings are created in src/python_bindings.cc:
#include "cmsketch/cmsketch.h"
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>

#include <cstdint> // uint32_t
#include <string>  // std::string

namespace py = pybind11;

// Macro to define common CountMinSketch methods for a given type
#define DEFINE_COUNT_MIN_SKETCH_METHODS(class_type, class_name) \
  py::class_<cmsketch::CountMinSketch<class_type>>(m, class_name) \
      .def(py::init<uint32_t, uint32_t>(), py::arg("width"), py::arg("depth"), \
           "Create a Count-Min Sketch with specified dimensions") \
      .def("insert", &cmsketch::CountMinSketch<class_type>::Insert, \
           py::arg("item"), "Insert an item into the sketch") \
      .def("count", &cmsketch::CountMinSketch<class_type>::Count, \
           py::arg("item"), "Get the estimated count of an item") \
      .def("clear", &cmsketch::CountMinSketch<class_type>::Clear, \
           "Reset the sketch to initial state") \
      .def("merge", &cmsketch::CountMinSketch<class_type>::Merge, \
           py::arg("other"), "Merge another sketch into this one") \
      .def("top_k", &cmsketch::CountMinSketch<class_type>::TopK, py::arg("k"), \
           py::arg("candidates"), "Get the top k items from candidates") \
      .def("get_width", &cmsketch::CountMinSketch<class_type>::GetWidth, \
           "Get the width of the sketch") \
      .def("get_depth", &cmsketch::CountMinSketch<class_type>::GetDepth, \
           "Get the depth of the sketch")

PYBIND11_MODULE(_core, m) {
  m.doc() = "Count-Min Sketch implementation with Python bindings";

  // CountMinSketch class for strings
  DEFINE_COUNT_MIN_SKETCH_METHODS(std::string, "CountMinSketchStr");

  // CountMinSketch class for int
  DEFINE_COUNT_MIN_SKETCH_METHODS(int, "CountMinSketchInt");
}
Key pybind11 Features
- Automatic Type Conversion: STL containers are automatically converted
- Method Binding: C++ methods become Python methods
- Documentation: Docstrings are automatically generated
- Template Specialization: Different types get separate Python classes
C++ Atomic Implementation
The core advantage of this C++ implementation is its use of atomic operations for thread safety: concurrent updates need no locks, and the work can run outside Python’s Global Interpreter Lock (GIL) once the binding releases it. Here’s how the atomic implementation works:
Header file (include/cmsketch/count_min_sketch.h):
template<typename KeyType>
class CountMinSketch {
private:
  // 2D array of atomic counters for thread-safe operations
  std::vector<std::vector<std::atomic<size_t>>> counters_;
  std::vector<std::function<size_t(const KeyType&)>> hash_functions_;
  size_t width_;
  size_t depth_;

public:
  void Insert(const KeyType& key);
  size_t Count(const KeyType& key) const;
  // ... other method declarations
};
Implementation file (src/cmsketchcpp/count_min_sketch.cc):
template<typename KeyType>
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
  for (size_t i = 0; i < depth_; ++i) {
    size_t hash_value = hash_functions_[i](key);
    size_t index = hash_value % width_;
    // Atomic increment - thread-safe without locks
    counters_[i][index].fetch_add(1, std::memory_order_relaxed);
  }
}

template<typename KeyType>
size_t CountMinSketch<KeyType>::Count(const KeyType& key) const {
  size_t min_count = std::numeric_limits<size_t>::max();
  for (size_t i = 0; i < depth_; ++i) {
    size_t hash_value = hash_functions_[i](key);
    size_t index = hash_value % width_;
    // Atomic read - thread-safe without locks
    size_t count = counters_[i][index].load(std::memory_order_relaxed);
    min_count = std::min(min_count, count);
  }
  return min_count;
}
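For readers more at home in Python, the Insert and Count logic above can be sketched as a small single-threaded class. This is an illustration only and not the package's pure Python implementation: the class name `PyCountMinSketch` is hypothetical, and salting one BLAKE2b hash with the row number is an assumed stand-in for the `depth` separate hash functions.

```python
import hashlib


class PyCountMinSketch:
    """Single-threaded Count-Min Sketch (illustrative, hypothetical class)."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        # depth rows of width counters, all starting at zero
        self.counters = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Assumption: a per-row salt emulates one hash function per row.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def insert(self, key: str) -> None:
        # Increment one counter in every row, as Insert does above.
        for row in range(self.depth):
            self.counters[row][self._index(row, key)] += 1

    def count(self, key: str) -> int:
        # The minimum across rows is the estimate; collisions can only
        # inflate a counter, so the estimate never undercounts.
        return min(
            self.counters[row][self._index(row, key)]
            for row in range(self.depth)
        )


sketch = PyCountMinSketch(width=1000, depth=5)
for item in ["apple", "apple", "banana"]:
    sketch.insert(item)
print(sketch.count("apple"), sketch.count("banana"))
```

The plain `+= 1` here is what the C++ version replaces with `fetch_add`, so that the same increment stays correct when several threads hit the same counter.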
Key Atomic Features
Memory Ordering: Using std::memory_order_relaxed for optimal performance
- No synchronization overhead: Relaxed ordering is sufficient for counters
- Hardware optimization: Allows CPU to reorder operations for better performance
- Cache efficiency: Reduces memory barrier overhead
Thread Safety Benefits:
- Lock-free design: No mutexes or locks required
- Concurrent access: Multiple threads can insert/query simultaneously
- GIL bypass: C++ threads operate independently of Python’s GIL
- Scalability: Performance scales with number of CPU cores
Performance Comparison:
# Python implementation (GIL limited)
def insert_python(self, key):
    with self.lock:  # Serialized access
        ...  # increment counters

// C++ implementation (atomic operations)
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
  // Parallel access - no locks needed
  // (executed for each row i at the hashed index, as in the full version)
  counters_[i][index].fetch_add(1, std::memory_order_relaxed);
}
This atomic implementation enables true parallel processing where multiple threads can simultaneously insert and query the sketch without blocking each other, providing significant performance advantages in multithreaded environments.
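For contrast, here is a minimal, runnable sketch of the lock-based pattern on the Python side, using 10 workers and 1,000-item batches as in the benchmark description. `LockedCounterRow` and `insert_batch` are hypothetical helpers, and a single row of counters stands in for the full sketch. The lock gives the same guarantee that `fetch_add` gives in C++ (no lost updates), at the cost of serializing every increment.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedCounterRow:
    """One row of counters guarded by a lock (hypothetical helper)."""

    def __init__(self, width):
        self.width = width
        self.counters = [0] * width
        self.lock = threading.Lock()

    def insert(self, key):
        index = hash(key) % self.width
        with self.lock:  # every thread queues here, one increment at a time
            self.counters[index] += 1


def insert_batch(row, batch):
    for key in batch:
        row.insert(key)


row = LockedCounterRow(width=64)
items = [f"10.0.0.{i % 20}" for i in range(10_000)]
batches = [items[i:i + 1_000] for i in range(0, len(items), 1_000)]

with ThreadPoolExecutor(max_workers=10) as pool:
    # list() waits for all batches and re-raises any worker exception
    list(pool.map(lambda batch: insert_batch(row, batch), batches))

total = sum(row.counters)
print(total)  # 10000: no increments were lost
```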
Development Workflow
Here’s the complete development workflow for building Python packages with C++ extensions:
1. Initial Setup
# Create project directory
mkdir my-python-cpp-package
cd my-python-cpp-package

# Initialize git repository
git init

# Create basic directory structure
mkdir -p include/mypackage src/mypackagecpp src/mypackage/py tests pytests examples
2. Development Environment
# Install development dependencies
uv sync --dev

# Build in development mode
uv run python -m pip install -e .

# Run tests
uv run pytest pytests/
make build-dev && cd build && make test
3. File Associations
Understanding how files relate to the project structure:
C++ Headers (include/cmsketch/):
- cmsketch.h → Main header included by users
- count_min_sketch.h → Core template class definition
- hash_util.h → Utility functions
C++ Implementation (src/cmsketchcpp/):
- count_min_sketch.cc → Template class implementation
- Links to headers via #include "cmsketch/cmsketch.h"
Python Package (src/cmsketch/):
- __init__.py → Package initialization and public API
- _core.pyi → Type stubs for C++ bindings
- base.py → Abstract base classes
- py/ → Pure Python implementations
Python Bindings (src/python_bindings.cc):
- Links C++ library to Python via pybind11
- Creates _core module with CountMinSketchStr and CountMinSketchInt classes
Build Configuration:
- pyproject.toml → Python package metadata and build settings
- CMakeLists.txt → C++ build configuration and pybind11 integration
4. Build Process
The build process follows this sequence:
- CMake Configuration: Reads CMakeLists.txt and configures build
- C++ Compilation: Compiles C++ source files into library
- pybind11 Binding: Generates Python extension module
- Python Packaging: Creates wheel with both C++ library and Python bindings
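After a build, a quick way to confirm that the last two steps produced something importable is to ask Python whether it can locate the package and its compiled module. The helper below is a hypothetical convenience, not part of the project; the module names `cmsketch` and `cmsketch._core` follow the layout described earlier.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if Python can locate the module `name`."""
    try:
        # Note: for dotted names, find_spec imports the parent package first.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError covers a missing or broken parent package.
        return False


for name in ("cmsketch", "cmsketch._core"):
    status = "found" if module_available(name) else "missing"
    print(f"{name}: {status}")
```

If `cmsketch` is found but `cmsketch._core` is missing, the Python sources were installed without the compiled extension, which usually points at the CMake or pybind11 step.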
5. Testing Strategy
C++ Tests (tests/):
#include <gtest/gtest.h>
#include "cmsketch/cmsketch.h"

TEST(CountMinSketchTest, BasicFunctionality) {
  cmsketch::CountMinSketch<std::string> sketch(100, 3);
  sketch.Insert("test");
  EXPECT_EQ(sketch.Count("test"), 1);
}
Python Tests (pytests/):
import pytest
import cmsketch

def test_basic_functionality():
    sketch = cmsketch.CountMinSketchStr(100, 3)
    sketch.insert("test")
    assert sketch.count("test") == 1
6. CI/CD Pipeline
The project uses GitHub Actions for automated building and testing:
Test Workflow (.github/workflows/test.yml):
- Runs on push/PR
- Tests C++ and Python code
- Cross-platform testing (Windows, Linux, macOS)
Wheel Building (.github/workflows/wheels.yml):
- Uses cibuildwheel for cross-platform wheel generation
- Builds for multiple Python versions and architectures
- Tests wheels before publishing
Release Workflow (.github/workflows/release.yml):
- Triggers on git tags
- Publishes wheels to PyPI
- Creates GitHub releases
C++ Implementation
The core implementation uses a template-based design that supports any hashable key type:
#include "cmsketch/cmsketch.h"
#include <iostream>

int main() {
  // Create a sketch with width=1000, depth=5
  cmsketch::CountMinSketch<std::string> sketch(1000, 5);

  // Add elements
  sketch.Insert("apple");
  sketch.Insert("apple");
  sketch.Insert("banana");

  // Query frequencies
  std::cout << "apple: " << sketch.Count("apple") << std::endl;   // 2
  std::cout << "banana: " << sketch.Count("banana") << std::endl; // 1
  std::cout << "cherry: " << sketch.Count("cherry") << std::endl; // 0

  return 0;
}
The implementation uses multiple hash functions to distribute items across the counter array, providing probabilistic guarantees on estimation accuracy.
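Those guarantees can be made concrete with the standard Count-Min Sketch analysis: with width w = ceil(e / ε) and depth d = ceil(ln(1 / δ)), an estimate exceeds the true count by more than ε·N (N being the total number of insertions) with probability at most δ. That bound assumes suitably independent hash functions, which the article does not specify for this package, so treat the sketch below as a sizing aid. The helper names are hypothetical.

```python
import math


def sketch_dimensions(epsilon: float, delta: float) -> tuple:
    """Width and depth for additive error epsilon*N with probability 1 - delta."""
    width = math.ceil(math.e / epsilon)
    depth = math.ceil(math.log(1.0 / delta))
    return width, depth


def error_bounds(width: int, depth: int) -> tuple:
    """Inverse view: the (epsilon, delta) that a given sketch size supports."""
    return math.e / width, math.exp(-depth)


print(sketch_dimensions(0.01, 0.01))  # (272, 5)

eps, delta = error_bounds(1000, 5)
print(f"epsilon={eps:.5f}, delta={delta:.5f}")
```

For the 1000 x 5 sketch used in the examples, this works out to an overestimate of at most about 0.27% of all insertions, exceeded with probability below 1%.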
Template Design
The template-based approach allows for type-safe implementations:
template<typename KeyType>
class CountMinSketch {
public:
  CountMinSketch(size_t width, size_t depth);
  void Insert(const KeyType& key);
  size_t Count(const KeyType& key) const;
  std::vector<std::pair<KeyType, size_t>> TopK(size_t k, const std::vector<KeyType>& candidates) const;
  void Merge(const CountMinSketch& other);
  void Clear();

private:
  std::vector<std::vector<std::atomic<size_t>>> counters_;
  std::vector<std::function<size_t(const KeyType&)>> hash_functions_;
  size_t width_;
  size_t depth_;
};
This design ensures type safety while maintaining high performance through template specialization.
Python Usage
The Python interface provides a clean, easy-to-use API:
import cmsketch

# Create a sketch for strings
sketch = cmsketch.CountMinSketchStr(1000, 5)

# Add elements
sketch.insert("apple")
sketch.insert("apple")
sketch.insert("banana")

# Query frequencies
print(f"apple: {sketch.count('apple')}")    # 2
print(f"banana: {sketch.count('banana')}")  # 1
print(f"cherry: {sketch.count('cherry')}")  # 0

# Get top-k items
candidates = ["apple", "banana", "cherry"]
top_k = sketch.top_k(2, candidates)
for item, count in top_k:
    print(f"{item}: {count}")
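A sketch stores counters rather than keys, which is why `top_k` takes a list of candidates. One plausible reading of the operation is: estimate each candidate's count, then keep the k largest. The function below illustrates that idea in plain Python; it is an assumption about the semantics, not the package's code, and a dictionary of exact counts stands in for `sketch.count`.

```python
import heapq


def top_k(k, candidates, count):
    """Return the k (item, estimate) pairs with the largest estimates."""
    scored = [(item, count(item)) for item in candidates]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Stand-in for sketch.count: exact counts from a plain dict.
counts = {"apple": 2, "banana": 1}
result = top_k(2, ["apple", "banana", "cherry"],
               lambda item: counts.get(item, 0))
print(result)  # [('apple', 2), ('banana', 1)]
```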
Type Support
The library provides specialized classes for different data types:
- CountMinSketchStr: String-based sketch
- CountMinSketchInt: Integer-based sketch
This approach optimizes performance for common use cases while maintaining the flexibility of the underlying C++ implementation.
Performance Benchmarks
The C++ implementation provides significant performance improvements over Python, especially in multithreaded environments. Here are the actual benchmark results from our test suite:
Benchmark Setup
The benchmark suite tests real-world scenarios with:
- 100,000 IP address samples generated using Faker with weighted distribution
- Realistic frequency patterns (most frequent IP appears ~10% of the time)
- Threaded processing with 10 concurrent workers and 1,000-item batches
- Comprehensive testing across insert, count, top-k, and streaming operations
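A workload with this shape can be approximated without any third-party dependency. The sketch below uses `random.choices` rather than Faker, so it is not the project's `generate_data.py`; the function names, the synthetic `10.0.x.y` addresses, and the pool of 1,000 unique IPs are assumptions made for illustration.

```python
import random
from collections import Counter


def generate_ips(n, unique=1_000, top_share=0.10, seed=42):
    """Draw n IPs where the first address receives about top_share of draws."""
    rng = random.Random(seed)
    ips = [f"10.0.{i // 256}.{i % 256}" for i in range(unique)]
    rest = (1.0 - top_share) / (unique - 1)  # remaining weight, split evenly
    weights = [top_share] + [rest] * (unique - 1)
    return rng.choices(ips, weights=weights, k=n)


def make_batches(items, batch_size=1_000):
    """Split the samples into fixed-size batches for worker threads."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


samples = generate_ips(100_000)
batches = make_batches(samples)
top_ip, top_count = Counter(samples).most_common(1)[0]
print(len(batches), top_ip, f"{top_count / len(samples):.3f}")
```

Each batch can then be handed to one of the 10 workers, for example through `concurrent.futures.ThreadPoolExecutor`.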
Actual Benchmark Results
Insert Performance (100k items, threaded):
- C++: 45.79ms (21.84 ops/sec)
- Python: 8,751.15ms (0.11 ops/sec)
- Speedup: 191x faster with C++
Count Performance (querying unique items):
- C++: 4.71μs per query (212,130 ops/sec)
- Python: 858.58μs per query (1,165 ops/sec)
- Speedup: 182x faster with C++
Top-K Performance (finding top items):
- C++: 2.57μs per operation (389,163 ops/sec)
- Python: 857.54μs per operation (1,166 ops/sec)
- Speedup: 334x faster with C++
Streaming Performance (insert + top-k):
- C++: 46.03ms (21.72 ops/sec)
- Python: 8,889.81ms (0.11 ops/sec)
- Speedup: 193x faster with C++
Performance Analysis
| Operation | C++ Time | Python Time | Speedup | Key Advantage |
|---|---|---|---|---|
| Insert (100k threaded) | 45.79ms | 8,751.15ms | 191x | GIL bypass + atomic operations |
| Count (per query) | 4.71μs | 858.58μs | 182x | Direct memory access |
| Top-K (per operation) | 2.57μs | 857.54μs | 334x | Optimized algorithms |
| Streaming (end-to-end) | 46.03ms | 8,889.81ms | 193x | Combined benefits |
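The speedup column is simply the Python time divided by the C++ time for each row. The snippet below recomputes it from the figures reported above; nothing here is a new measurement.

```python
# (C++ time, Python time) pairs copied from the results above. Each pair
# shares a unit (ms or μs), so the ratio is unit-free.
results = {
    "insert (ms)": (45.79, 8751.15),
    "count (us)": (4.71, 858.58),
    "top-k (us)": (2.57, 857.54),
    "streaming (ms)": (46.03, 8889.81),
}

speedups = {name: python / cpp for name, (cpp, python) in results.items()}

for name, ratio in speedups.items():
    print(f"{name:15s} {ratio:6.1f}x")
```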
Running Benchmarks
# Run all benchmarks with pytest
uv run pytest ./benchmarks

# Run specific benchmark categories
uv run pytest ./benchmarks -k "insert"
uv run pytest ./benchmarks -k "count"
uv run pytest ./benchmarks -k "topk"

# Run with verbose output
uv run pytest ./benchmarks -v

# Generate test data (if needed)
uv run python ./benchmarks/generate_data.py
The benchmark suite uses pytest-benchmark for reliable measurements and includes both synthetic and real-world data patterns.
Why C++ is So Much Faster
1. GIL Bypass in Multithreaded Operations
- Python: GIL serializes all operations, even with threading
- C++: Atomic operations allow true parallel processing
- Result: 191x speedup in threaded insertions
2. Memory Access Patterns
- Python: Object overhead, dynamic typing, garbage collection
- C++: Direct memory access, contiguous arrays, no GC overhead
- Result: 182x speedup in count operations
3. Algorithm Optimization
- Python: Interpreted bytecode, dynamic dispatch
- C++: Compiled machine code, template specialization
- Result: 334x speedup in top-k operations
4. Thread Safety Implementation
# Python: Lock-based (serialized)
def insert_python(self, key):
    with self.lock:  # All threads wait here
        ...  # increment counters

// C++: Atomic operations (parallel)
void CountMinSketch<KeyType>::Insert(const KeyType& key) {
  // All threads can execute simultaneously
  // (executed for each row i at the hashed index, as in the full version)
  counters_[i][index].fetch_add(1, std::memory_order_relaxed);
}
5. Memory Efficiency
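- Python: an 8-byte list slot per counter, plus a separate int object (about 28 bytes in 64-bit CPython) for each distinct value outside the small-int cache
- C++: 8 bytes per counter with the std::atomic<size_t> type shown above on a typical 64-bit platform (4 bytes only if a 32-bit counter type is used)
- Result: roughly 2-3x less memory usage, depending on the counter type and on how many int objects Python can share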
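A back-of-the-envelope version of this comparison can be computed for a concrete sketch size. The snippet assumes the `std::atomic<size_t>` counters from the header shown earlier, which normally occupy `sizeof(size_t)` bytes each. The Python figure is a rough upper estimate for a list-of-lists table: it ignores list headers and assumes every counter holds its own int object.

```python
import ctypes
import sys

width, depth = 1000, 5  # dimensions used in the usage examples

# Assumption: std::atomic<size_t> adds no padding over a plain size_t.
counter_bytes = ctypes.sizeof(ctypes.c_size_t)
cpp_table_bytes = width * depth * counter_bytes

# A CPython list stores one pointer per element; each distinct int outside
# the small-int cache is a separate heap object.
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)
int_object_bytes = sys.getsizeof(10**6)
python_upper_bytes = width * depth * (pointer_bytes + int_object_bytes)

print(f"C++ counters:     {cpp_table_bytes:,} bytes")
print(f"Python (at most): {python_upper_bytes:,} bytes")
print(f"ratio (at most):  {python_upper_bytes / cpp_table_bytes:.1f}x")
```

Real Python tables often land below this upper estimate because equal counter values can share one int object, which is why measured ratios vary.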
Project Architecture
The project demonstrates modern software engineering practices:
Build System
- CMake: Cross-platform C++ build configuration
- scikit-build-core: Modern Python build system for C++ extensions
- pybind11: Seamless C++ to Python binding generation
- uv: Fast, modern Python package management
Project Structure
count-min-sketch/
├── include/cmsketch/          # C++ header files
│   ├── cmsketch.h             # Main header
│   ├── count_min_sketch.h     # Core template class
│   └── hash_util.h            # Hash utilities
├── src/cmsketchcpp/           # C++ source files
│   └── count_min_sketch.cc    # Core implementation
├── src/cmsketch/              # Python package
│   ├── __init__.py            # Package initialization
│   ├── _core.pyi              # Type stubs
│   └── py/                    # Pure Python implementations
├── tests/                     # C++ unit tests
├── pytests/                   # Python tests
├── benchmarks/                # Performance benchmarks
└── examples/                  # Example scripts
CI/CD Pipeline
The project uses GitHub Actions for automated testing and publishing:
- Cross-Platform Testing: Windows, Linux, macOS
- Wheel Building: Automated wheel generation for all platforms
- PyPI Publishing: Automatic package distribution on release
Educational Value
This project demonstrates several important software engineering concepts:
1. Python Package Development with C++ Extensions
- pybind11 Integration: Seamless C++ to Python binding generation
- Type Stubs: Complete type information for Python IDEs
- Modern Build Tools: scikit-build-core and uv for package management
2. Performance Engineering
- C++ vs Python: Direct performance comparison between implementations
- Memory Efficiency: Optimized data structures and memory usage patterns
- Thread Safety: Atomic operations and concurrent access patterns
3. Build System Integration
- CMake: Cross-platform C++ build configuration
- Python Packaging: Complete pip-installable package creation
- CI/CD: Automated testing and publishing workflows
4. Modern C++ Practices
- Template Metaprogramming: Generic, type-safe implementations
- RAII: Resource management and exception safety
- STL Integration: Standard library containers and algorithms
Getting Started
Installation
# Using pip
pip install cmsketch

# Using uv (recommended)
uv add cmsketch
Basic Usage
import cmsketch

# Create a sketch
sketch = cmsketch.CountMinSketchStr(1000, 5)

# Add elements
sketch.insert("apple")
sketch.insert("apple")
sketch.insert("banana")

# Query frequencies
print(f"apple: {sketch.count('apple')}")    # 2
print(f"banana: {sketch.count('banana')}")  # 1
Building from Source
# Clone the repository
git clone https://github.com/isaac-fate/count-min-sketch.git
cd count-min-sketch

# Build everything
make build

# Run tests
make test

# Run example
make example
Key Takeaways
Building Python packages with C++ extensions requires understanding several interconnected systems:
1. Project Structure
- Clear separation between C++ headers, implementation, and Python bindings
- Logical organization that scales from simple to complex projects
- Build artifact management to keep source control clean
2. Build System Integration
- pyproject.toml for modern Python packaging standards
- CMakeLists.txt for cross-platform C++ compilation
- pybind11 for seamless C++ to Python binding generation
3. Development Workflow
- Incremental development with hot reloading during development
- Comprehensive testing at both C++ and Python levels
- CI/CD automation for cross-platform wheel building and publishing
4. Performance Benefits
- 191x speedup in threaded insertions (GIL bypass)
- 182x speedup in count operations (direct memory access)
- 334x speedup in top-k operations (compiled optimization)
- Atomic operations enable true parallel processing without locks
- Memory efficiency through direct C++ data structure control
Next Steps
To apply these techniques to your own projects:
- Start Simple: Begin with a basic C++ function and Python binding
- Iterate Gradually: Add complexity incrementally (templates, STL containers, etc.)
- Test Thoroughly: Implement both C++ and Python test suites
- Automate Everything: Set up CI/CD for automated building and testing
- Document Well: Provide clear examples and API documentation
The complete source code, documentation, and benchmarks are available on GitHub, and the package is available on PyPI for immediate use.
This approach to Python package development with C++ extensions provides a solid foundation for building high-performance libraries that combine the best of both worlds: Python’s ease of use and C++’s performance.