Subject
aho-corasick 1.1.4 (by Andrew Gallant / BurntSushi) implements multi-pattern string search via the Aho-Corasick algorithm. It exposes three automaton backends (a noncontiguous NFA, a contiguous NFA, and a DFA) plus a fourth search path through packed prefilters (SIMD-based Teddy, with Rabin-Karp alongside it), gated on perf-literal. The crate is #![no_std] with an optional std feature and is a direct dependency of regex and bstr.
Methodology
The published crate was diffed against the upstream Git repository at the commit recorded in .cargo_vcs_info.json using diff -rq. All source files in src/ (28,433 lines across 32 files) were read, with priority given to every file containing unsafe (dfa.rs, automaton.rs, nfa/contiguous.rs, nfa/noncontiguous.rs, packed/teddy/generic.rs, packed/teddy/builder.rs, packed/pattern.rs, packed/ext.rs, packed/vector.rs). The repository's CI workflow (.github/workflows/ci.yml) and the fuzz target at fuzz/fuzz-targets/fuzz_find.rs were also read. Tools used: openvet 0.6.0, diff, ripgrep.
Results
The diff between published contents and VCS shows only the expected cargo-normalised Cargo.toml divergence; all source files are byte-for-byte identical.
The crate ships no binary artifacts (justifying has-binaries) and no build.rs (justifying has-build-exec and has-install-exec). No network, filesystem, process-execution, or environment-variable access was found anywhere in the source (justifying uses-network, uses-filesystem, uses-exec, uses-environment). No cryptographic operations are present (uses-crypto, impl-crypto). No JIT compiler or interpreter is present (uses-jit, impl-jit, uses-interpreter, impl-interpreter). No protocol or parser is implemented (impl-protocol, impl-parser). No concurrency primitives are implemented or used at runtime (uses-concurrency, impl-concurrency). The Arc<dyn AcAutomaton> in ahocorasick.rs is used for type-erasure, not runtime synchronisation; the public types are Send + Sync by construction since the underlying automata contain no interior mutability.
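The type-erasure point can be illustrated with a minimal sketch. All names below (Backend, GraphBackend, TableBackend, Searcher, assert_send_sync) are hypothetical and are not the crate's own types; the sketch also assumes Send + Sync supertraits on the trait and uses std::sync::Arc for brevity. It shows an Arc<dyn Trait> serving only to hide the concrete backend, with thread-safety following from the absence of interior mutability rather than from any locking.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an automaton trait. The `Send + Sync`
/// supertraits (an assumption of this sketch) make every trait object
/// shareable across threads.
pub trait Backend: Send + Sync {
    fn name(&self) -> &'static str;
}

/// Two toy backends with no interior mutability.
pub struct GraphBackend;
pub struct TableBackend;

impl Backend for GraphBackend {
    fn name(&self) -> &'static str {
        "nfa"
    }
}

impl Backend for TableBackend {
    fn name(&self) -> &'static str {
        "dfa"
    }
}

/// The public handle: the `Arc` erases the concrete backend type and
/// makes clones cheap. It performs no synchronisation of searches.
#[derive(Clone)]
pub struct Searcher {
    imp: Arc<dyn Backend>,
}

impl Searcher {
    pub fn new(use_table: bool) -> Searcher {
        let imp: Arc<dyn Backend> = if use_table {
            Arc::new(TableBackend)
        } else {
            Arc::new(GraphBackend)
        };
        Searcher { imp }
    }

    pub fn kind(&self) -> &'static str {
        self.imp.name()
    }
}

/// Compiles only when `T` is `Send + Sync`; calling it is a
/// compile-time check with no runtime effect.
pub fn assert_send_sync<T: Send + Sync>() {}
```

Calling assert_send_sync::<Searcher>() turns the thread-safety claim into a compile-time check, which is one way an auditor could confirm it without reading every field.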
The crate contains 227 unsafe occurrences (justifying uses-unsafe), concentrated in two areas. First, the Teddy SIMD pointer loops in packed/teddy/generic.rs: every unsafe fn carries a # Safety doc comment detailing the pointer-validity preconditions, and the sole public entry point (Searcher::find in builder.rs) guards entry with an assert! on the minimum-length invariant before deriving raw pointers from the caller's &[u8] slice. CPU-feature gating is enforced by #[target_feature] attributes on concrete implementations and by runtime CPUID checks before construction. Second, the DFA transition-table indexing in dfa.rs: the next_state method indexes self.trans with a pre-multiplied state ID; the construction invariant guarantees that the table is sized to state_len * stride and that all stored IDs are valid, so no out-of-bounds access is possible. Outside these two areas, the Patterns::get_unchecked and Teddy::verify_bucket unchecked accesses both carry debug_assert! guards and // SAFETY: comments; the bucket index is bounded by bit % BUCKETS, which is always less than self.buckets.len(). Together these justify unsafe-safe, unsafe-documented, and unsafe-minimal.
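The transition-table argument can be made concrete with a small sketch. ToyDfa and its fields are hypothetical and do not mirror the layout in dfa.rs; the sketch only assumes what the paragraph above states, namely a table of state_len * stride entries whose stored IDs are already multiplied by the stride. It validates that invariant once at construction, after which sid + class is in bounds for any class below the stride.

```rust
/// A toy dense DFA whose state IDs are pre-multiplied by the stride.
/// Hypothetical illustration only; not the crate's data layout.
pub struct ToyDfa {
    /// Row-major table with exactly `state_len * stride` entries.
    trans: Vec<u32>,
    /// Number of transitions per state (the alphabet size).
    stride: usize,
}

impl ToyDfa {
    /// Builds the DFA and checks the invariant once, up front.
    /// Returns `None` if the table is mis-sized or holds an invalid ID.
    pub fn new(state_len: usize, stride: usize, trans: Vec<u32>) -> Option<ToyDfa> {
        let expected = state_len.checked_mul(stride)?;
        if state_len == 0 || stride == 0 || trans.len() != expected {
            return None;
        }
        // Every stored ID must be a multiple of `stride` and must be
        // the start of a complete row inside the table.
        let all_valid = trans.iter().all(|&id| {
            let id = id as usize;
            id % stride == 0
                && id.checked_add(stride).map_or(false, |end| end <= trans.len())
        });
        if all_valid {
            Some(ToyDfa { trans, stride })
        } else {
            None
        }
    }

    /// The start state is row 0, whose pre-multiplied ID is 0.
    pub fn start(&self) -> u32 {
        0
    }

    /// `sid` must come from `start()` or an earlier `next_state` call.
    /// Because `sid + stride <= trans.len()` and `class < stride`, the
    /// index is always in bounds; no multiplication is needed per step.
    pub fn next_state(&self, sid: u32, class: usize) -> u32 {
        assert!(class < self.stride);
        // Bounds-checked indexing is kept here; under the validated
        // invariant the check can never fail.
        self.trans[sid as usize + class]
    }
}
```

The sketch keeps the bounds check for safety. The point is that an unchecked access would be sound under the same invariant, which is the shape of argument the audit relies on.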
The crate implements the Aho-Corasick automaton (a finite-state data structure, justifying impl-datastructure) and the Teddy and Rabin-Karp multi-pattern search algorithms (impl-algorithm). The CI matrix covers stable, beta, nightly, and pinned MSRV (1.60.0) on x86, x86-64, aarch64, powerpc64, and s390x; the VCS repository contains 78 #[test] functions in src/tests.rs and a libFuzzer target covering all automaton kinds, match semantics, and prefilter settings (justifying has-unit-tests, has-fuzz-tests, unsafe-tested, algorithm-impl-tested, datastructure-impl-tested). No property-based tests or separate integration tests are present (has-property-tests, has-integration-tests). Search semantics (leftmost-first, leftmost-longest, standard) are exercised in the test suite, justifying algorithm-impl-correct and datastructure-impl-correct. Linear time complexity is documented and enforced by construction; the crate does not use any data structures with adversarial complexity exposures (algorithm-impl-bounds, datastructure-impl-bounds).
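For readers unfamiliar with the three search semantics, the following is a naive reference sketch, not the crate's implementation. Semantics and naive_find are hypothetical names. The sketch enumerates every occurrence in quadratic time, skips empty patterns for simplicity, and its tie-breaking for the standard case (earliest end, then earliest start) is an assumption of the sketch rather than documented crate behaviour.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Semantics {
    /// Report the match that ends first.
    Standard,
    /// Among matches starting leftmost, prefer the earliest pattern.
    LeftmostFirst,
    /// Among matches starting leftmost, prefer the longest pattern.
    LeftmostLongest,
}

/// Returns `(pattern index, start, end)` for the first match under the
/// given semantics, or `None` if no pattern occurs in the haystack.
pub fn naive_find(
    patterns: &[&str],
    haystack: &str,
    sem: Semantics,
) -> Option<(usize, usize, usize)> {
    let hay = haystack.as_bytes();
    // Enumerate every occurrence of every non-empty pattern.
    let mut all = Vec::new();
    for (pid, pat) in patterns.iter().enumerate() {
        let pat = pat.as_bytes();
        if pat.is_empty() || pat.len() > hay.len() {
            continue;
        }
        for start in 0..=hay.len() - pat.len() {
            if &hay[start..start + pat.len()] == pat {
                all.push((pid, start, start + pat.len()));
            }
        }
    }
    // Pick the minimum under a sort key that encodes the semantics.
    all.into_iter().min_by_key(|&(pid, start, end)| match sem {
        Semantics::Standard => (end, start, pid),
        Semantics::LeftmostFirst => (start, pid, 0),
        // `usize::MAX - end` makes a larger end sort earlier.
        Semantics::LeftmostLongest => (start, usize::MAX - end, pid),
    })
}
```

With the patterns "b", "abc", "abcd" and the haystack "abcd", this sketch selects "b" under standard semantics, "abc" under leftmost-first, and "abcd" under leftmost-longest. A reference oracle of this kind is also the usual basis for differential testing of the optimised automata.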
No findings were recorded. The crate contains no obfuscated code, suspicious network endpoints, or telemetry (is-benign).
Conclusion
The crate's unsafe surface is concentrated in two well-understood areas: SIMD pointer arithmetic for Teddy and DFA transition-table indexing. Both areas are documented with # Safety comments, guarded by construction-time invariants, and covered by a libFuzzer target that exercises all automaton backends and prefilter configurations. No I/O of any kind is present.