Mini-Infer (9): Laying the Groundwork for High-Performance Operators with RAII `Buffer` and `noexcept`

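1. Deep Dive: What `noexcept` Really Means

When building high-performance C++ libraries, we often see the `noexcept` keyword. It is more than decoration: it is a key switch that can decide whether the standard library takes a fast path or a slow one.

`noexcept` was introduced in C++11, and its purpose is very clear: it tells the compiler (and anyone reading the code) that this function guarantees no exception will propagate out of it.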

1.1 Core Role: A Promise to the Compiler That No Exception Escapes

When you declare a function `noexcept`:


```cpp
void myFunc() noexcept {
    // ...
}
```

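In effect, you are signing a binding pledge: "No matter what happens in this code, I guarantee that no exception will escape this function body."

What happens if the pledge is broken? If an exception tries to propagate out of a function marked `noexcept`, it is never passed on to a caller. Instead, the runtime calls `std::terminate()`, which by default aborts the program: an immediate, hard crash. Whether the stack is unwound first (that is, whether local objects are destroyed) is implementation-defined, so you cannot count on destructors running.

This means an exception escaping a `noexcept` function can never be caught by a `try`/`catch` in the caller. Exceptions that are thrown and caught entirely inside the function are still fine.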
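A small sketch of these rules, assuming C++17 (the function names are hypothetical): an exception that is thrown and caught inside a `noexcept` function keeps the promise, while one that escapes would end in `std::terminate()`. The `noexcept` operator lets you query the declared promise at compile time without evaluating anything.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper that may throw.
int parse_positive(const std::string& s) {
    int v = std::stoi(s);  // throws std::invalid_argument if s is not a number
    if (v <= 0) throw std::out_of_range("value must be positive");
    return v;
}

// Keeps the promise: every exception is caught inside the body.
int parse_or_default(const std::string& s, int fallback) noexcept {
    try {
        return parse_positive(s);
    } catch (...) {
        return fallback;  // handled here, so nothing escapes
    }
}

// Also declared noexcept, but does NOT handle errors: a throw from
// parse_positive here would call std::terminate(), and no caller's
// try/catch could intercept it. Only call it with valid input.
int parse_or_die(const std::string& s) noexcept {
    return parse_positive(s);
}

// The noexcept operator reports the declared promise (unevaluated operand).
static_assert(noexcept(parse_or_default(std::declval<const std::string&>(), 0)), "");
static_assert(!noexcept(parse_positive(std::declval<const std::string&>())), "");

void noexcept_demo() {
    assert(parse_or_default("42", -1) == 42);
    assert(parse_or_default("abc", -1) == -1);  // invalid_argument caught inside
    assert(parse_or_default("-5", -1) == -1);   // out_of_range caught inside
    assert(parse_or_die("7") == 7);             // fine: nothing is thrown
}
```

The static assertions take a `const std::string&` via `std::declval` on purpose: passing a string literal would make the operand include the potentially throwing `std::string` construction, and `noexcept(...)` would then report `false`.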

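1.2 Why Do We Need It? (For Performance!)

You might ask: "If all it can do is crash the program, why would I use it?" The answer is performance optimization, especially for standard library containers such as `std::vector`.

Scenario: `std::vector` growth. When a `std::vector` runs out of capacity, it allocates a larger block of memory and relocates the existing elements into it.

If your move constructor is marked `noexcept`: `std::vector` confidently relocates elements with **move** operations. Each old object is simply "carried over", which is very cheap (essentially copying a pointer).
If your move constructor is not marked `noexcept`: `std::vector` must uphold the strong exception guarantee. If a move threw halfway through relocation, some old elements would already have been moved from (their data gone) while the new storage was still incomplete, and data would be lost. So, as long as the type is copyable, `std::vector` falls back to **copy** operations instead (the standard library makes this choice via `std::move_if_noexcept`). For large objects such as a big Tensor, this is a huge performance loss. (For a move-only type, `std::vector` uses the potentially throwing move anyway and gives up the strong guarantee.)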
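To observe this behavior rather than take it on faith, here is a minimal C++17 sketch that counts copies and moves while a `std::vector` grows past its capacity. The `Tracked` type and its counters are hypothetical instrumentation, not part of Mini-Infer; the only difference between the two instantiations is whether the move constructor is `noexcept`.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

struct Counters {
    int copies = 0;
    int moves = 0;
};

// NoexceptMove toggles the noexcept specification of the move constructor.
template <bool NoexceptMove>
struct Tracked {
    inline static Counters counters{};  // one set of counters per instantiation

    Tracked() = default;
    Tracked(const Tracked&) { ++counters.copies; }
    Tracked(Tracked&&) noexcept(NoexceptMove) { ++counters.moves; }
};

static_assert(std::is_nothrow_move_constructible<Tracked<true>>::value, "");
static_assert(!std::is_nothrow_move_constructible<Tracked<false>>::value, "");

// Forces one reallocation and reports how the existing element was relocated.
template <bool NoexceptMove>
Counters grow_once() {
    using T = Tracked<NoexceptMove>;
    T::counters = Counters{};
    std::vector<T> v;
    v.reserve(1);
    v.emplace_back();  // fits in the reserved slot, constructed in place
    v.emplace_back();  // capacity exceeded: vector reallocates and relocates v[0]
    return T::counters;
}

void relocation_demo() {
    Counters fast = grow_once<true>();
    assert(fast.copies == 0 && fast.moves >= 1);  // noexcept move: vector moves

    Counters slow = grow_once<false>();
    assert(slow.copies >= 1);                     // throwing move: vector copies
}
```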

Side-by-side example:

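```cpp
class Tensor {
public:
    // Copy constructor: deep-copies the underlying data (expensive)
    Tensor(const Tensor& other) { /* ... allocate and copy data ... */ }

    // ✅ With noexcept: vector growth relocates by "Move" instead of "Copy",
    // transferring pointers instead of data; the bigger the tensor, the bigger the win
    Tensor(Tensor&& other) noexcept {
        // ... move pointer resources ...
    }

    // ❌ Without noexcept: since Tensor is copyable, vector growth falls back to Deep Copy
    // Tensor(Tensor&& other) { ... }
};
```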

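1.3 When Should You Use `noexcept`?

As a general rule, it is strongly recommended in these 4 cases:

Move constructor: enables the `std::vector` relocation optimization described above.
Move assignment operator: same reason; in addition, `std::swap` is `noexcept` only when both the move constructor and the move assignment operator are.
Destructor: already implicitly `noexcept` (unless a base or member has a potentially throwing destructor, or you explicitly write `noexcept(false)`).
Leaf functions: short functions that obviously cannot throw (such as `get_size()`); marking them documents the guarantee and may let the compiler skip exception-handling code at call sites.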
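The four cases can be verified at compile time with `<type_traits>` and the `noexcept` operator. The sketch below assumes C++17 and uses a hypothetical `Workspace` type (not Mini-Infer's `Tensor`) that owns a `std::vector<float>`. Note that defaulted move operations are automatically `noexcept` when every member's move is, so `= default` is often all you need.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

class Workspace {
public:
    explicit Workspace(std::size_t n) : data_(n) {}

    // Copying allocates, so it may throw and is NOT noexcept.
    Workspace(const Workspace&) = default;
    Workspace& operator=(const Workspace&) = default;

    // Cases 1 and 2: defaulted moves are noexcept because std::vector's moves are.
    Workspace(Workspace&&) = default;
    Workspace& operator=(Workspace&&) = default;

    // Case 3: destructors are implicitly noexcept.
    ~Workspace() = default;

    // Case 4: a leaf accessor that cannot throw.
    std::size_t size() const noexcept { return data_.size(); }

private:
    std::vector<float> data_;
};

static_assert(std::is_nothrow_move_constructible<Workspace>::value, "case 1");
static_assert(std::is_nothrow_move_assignable<Workspace>::value, "case 2");
static_assert(std::is_nothrow_destructible<Workspace>::value, "case 3");
static_assert(noexcept(std::declval<const Workspace&>().size()), "case 4");
static_assert(std::is_nothrow_swappable<Workspace>::value, "std::swap uses both moves");
static_assert(!std::is_nothrow_copy_constructible<Workspace>::value, "copy may throw");

void workspace_demo() {
    std::vector<Workspace> pool;
    pool.emplace_back(16);
    pool.emplace_back(32);  // may reallocate; existing elements are moved, not copied
    assert(pool[0].size() == 16 && pool[1].size() == 32);
}
```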

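2. An RAII Memory Workhorse: `Buffer<T>`

In Mini-Infer, the `Tensor` class passes data between nodes of the graph. It is built on `std::shared_ptr`, so it carries reference counting and is relatively heavyweight.

Inside an operator (for example, the im2col step of Convolution), what we need instead is a temporary, lightweight, exclusively owned block of memory.

For that purpose, we introduce `Buffer<T>`.

`buffer.h` implementation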

```cpp
#pragma once

#include "mini_infer/core/allocator.h"
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace mini_infer {
namespace core {

/**
 * @brief RAII wrapper for allocator-managed memory
 *
 * Provides a safe way to manage temporary buffers using the allocator system.
 * Similar to std::vector but uses CPUAllocator for unified memory management.
 *
 * Usage:
 *   Buffer<float> buf(size);
 *   float* data = buf.data();
 */
template<typename T>
class Buffer {
    // Elements are zero-filled with memset and never constructed or destroyed,
    // so Buffer is only safe for trivially copyable types such as float or int.
    static_assert(std::is_trivially_copyable<T>::value,
                  "Buffer<T> requires a trivially copyable T");

public:
    /**
     * @brief Construct a buffer with the given size
     * @param size Number of elements (not bytes)
     * @param allocator The allocator to use (defaults to CPUAllocator)
     */
    explicit Buffer(size_t size, Allocator* allocator = nullptr)
        : size_(size)
        , allocator_(allocator ? allocator : CPUAllocator::get_instance())
        , data_(nullptr) {

        if (size_ > 0) {
            size_t bytes = size_ * sizeof(T);
            // Allocate through our unified Allocator so memory usage can be tracked
            data_ = static_cast<T*>(allocator_->allocate(bytes));

            // Initialize to zero
            if (data_) {
                std::memset(data_, 0, bytes);
            }
        }
    }

    /**
     * @brief Destructor - deallocates the buffer
     * RAII core: the memory is released automatically when the Buffer goes out of scope
     */
    ~Buffer() {
        if (data_) {
            allocator_->deallocate(data_);
            data_ = nullptr;
        }
    }

    // Disable copy (prevents expensive deep copies)
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Enable move (noexcept: moving only transfers a pointer and cannot fail)
    // Allows transferring ownership, e.g. returning a Buffer from a function
    Buffer(Buffer&& other) noexcept
        : size_(other.size_)
        , allocator_(other.allocator_)
        , data_(other.data_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            // 1. Release our own old resource
            if (data_) {
                allocator_->deallocate(data_);
            }

            // 2. Steal the other buffer's resource
            size_ = other.size_;
            allocator_ = other.allocator_;
            data_ = other.data_;

            // 3. Leave the other buffer empty
            other.data_ = nullptr;
            other.size_ = 0;
        }
        return *this;
    }

    // Accessors (leaf functions, so marked noexcept)
    T* data() noexcept { return data_; }
    const T* data() const noexcept { return data_; }
    size_t size() const noexcept { return size_; }
    size_t size_in_bytes() const noexcept { return size_ * sizeof(T); }
    bool empty() const noexcept { return data_ == nullptr || size_ == 0; }

    T& operator[](size_t index) noexcept { return data_[index]; }
    const T& operator[](size_t index) const noexcept { return data_[index]; }

private:
    size_t size_;           ///< Number of elements
    Allocator* allocator_;  ///< The allocator used
    T* data_;               ///< Pointer to the data
};

} // namespace core
} // namespace mini_infer
```
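Because `Allocator` and `CPUAllocator` live elsewhere in Mini-Infer, the sketch below exercises the same RAII and move logic in isolation. It stubs the allocator with a hypothetical `CountingAllocator` (assumed interface: `allocate(bytes)` returning `void*` and `deallocate(void*)`, mirroring how `Buffer` calls it; the real `Allocator` may differ) and repeats a trimmed copy of `Buffer`. The check is that every allocation is released exactly once, including after a move and a move assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for Mini-Infer's Allocator: counts live allocations.
class CountingAllocator {
public:
    void* allocate(std::size_t bytes) { ++live_; return std::malloc(bytes); }
    void deallocate(void* p) noexcept { --live_; std::free(p); }
    int live() const noexcept { return live_; }
private:
    int live_ = 0;
};

// Trimmed copy of Buffer<T>, wired to the stub allocator.
template <typename T>
class Buffer {
public:
    Buffer(std::size_t size, CountingAllocator* alloc) : size_(size), alloc_(alloc) {
        if (size_ > 0) {
            data_ = static_cast<T*>(alloc_->allocate(size_ * sizeof(T)));
            if (data_) std::memset(data_, 0, size_ * sizeof(T));
        }
    }
    ~Buffer() { if (data_) alloc_->deallocate(data_); }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
    Buffer(Buffer&& o) noexcept : size_(o.size_), alloc_(o.alloc_), data_(o.data_) {
        o.data_ = nullptr;
        o.size_ = 0;
    }
    Buffer& operator=(Buffer&& o) noexcept {
        if (this != &o) {
            if (data_) alloc_->deallocate(data_);  // release our old block first
            size_ = o.size_; alloc_ = o.alloc_; data_ = o.data_;
            o.data_ = nullptr; o.size_ = 0;
        }
        return *this;
    }
    T* data() noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }
    bool empty() const noexcept { return data_ == nullptr || size_ == 0; }
private:
    std::size_t size_;
    CountingAllocator* alloc_;
    T* data_ = nullptr;
};

static_assert(std::is_nothrow_move_constructible<Buffer<float>>::value, "");
static_assert(!std::is_copy_constructible<Buffer<float>>::value, "");

// Returns the number of live allocations left after the scope ends.
int buffer_demo(CountingAllocator& alloc) {
    {
        Buffer<float> a(8, &alloc);
        assert(a.data()[3] == 0.0f);        // zero-filled by memset
        Buffer<float> b(std::move(a));      // ownership moves, no new allocation
        assert(a.empty() && b.size() == 8 && alloc.live() == 1);
        Buffer<float> c(4, &alloc);         // second live block
        c = std::move(b);                   // c frees its own block, takes b's
        assert(b.empty() && c.size() == 8 && alloc.live() == 1);
    }  // c's destructor releases the last block
    return alloc.live();
}
```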

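3. Why Is `Buffer` Better Than `std::vector` Here?

You might ask: "Why not just use `std::vector<float>`?"

Unified memory management: `Buffer` goes through our `Allocator` interface. If we later want to measure the memory footprint of the whole model, or switch to a dedicated memory pool, `Buffer` follows those rules automatically, whereas `std::vector<float>` with its default `std::allocator` simply calls the global `operator new` (bringing it under our accounting would require writing a standard-conforming allocator for it).
Optional initialization cost: `std::vector<float>(n)` value-initializes every element. `Buffer` also zero-fills with `memset`, and for trivial types compilers usually lower vector's initialization to a similar fill, so the default paths cost about the same. The real difference is control: in hot paths where the kernel is guaranteed to overwrite every element, we can drop the `memset` and allocate without initializing, which `std::vector` does not offer out of the box.
Explicit RAII semantics: `Buffer` clearly expresses "temporary workspace", and because copying is disabled, accidental deep copies (a hidden performance killer) fail to compile.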
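For comparison, `std::vector` can take part in memory accounting, but only with extra boilerplate: you supply a standard-conforming allocator as its second template argument. A minimal C++17 sketch with a hypothetical global byte counter (`g_live_bytes`) follows; a real adapter would forward to the project's own allocator rather than `::operator new`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical global counter standing in for "model-wide memory statistics".
inline std::size_t g_live_bytes = 0;

// Minimal allocator meeting the C++17 Allocator requirements.
template <typename T>
struct TrackingAllocator {
    using value_type = T;

    TrackingAllocator() = default;
    template <typename U>
    TrackingAllocator(const TrackingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        g_live_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        g_live_bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};

// Stateless allocators always compare equal.
template <typename T, typename U>
bool operator==(const TrackingAllocator<T>&, const TrackingAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const TrackingAllocator<T>&, const TrackingAllocator<U>&) noexcept { return false; }

template <typename T>
using TrackedVector = std::vector<T, TrackingAllocator<T>>;

void tracked_vector_demo() {
    {
        TrackedVector<float> v(1024);  // value-initialized: every element is 0.0f
        assert(v[0] == 0.0f);
        assert(g_live_bytes >= 1024 * sizeof(float));
    }
    assert(g_live_bytes == 0);  // all memory returned when v is destroyed
}
```

This works, but the allocator type becomes part of the container type, and the vector still value-initializes its elements; `Buffer` keeps both concerns inside one small class.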

4. Preview: `Buffer` in `Convolution`

When we implement Convolution in the next post, you will see what `Buffer` can do.

Without `Buffer` (risky):

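```cpp
// Bad style (pseudocode: "..." stands for the real arguments;
// Status/Ok/Error are placeholders for the project's error type)
Status conv_forward(...) {
    // size is an element count, so convert it to bytes for malloc
    float* col_data = static_cast<float*>(std::malloc(size * sizeof(float)));
    if (!col_data) return Error;

    im2col(..., col_data);

    if (something_wrong) {
        std::free(col_data); // Must remember to free! Easy to forget, causing a memory leak
        return Error;
    }

    gemm(...);
    std::free(col_data); // Must remember to free here too
    return Ok;
}
```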

With `Buffer` (safe and clean):

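```cpp
// Good style (pseudocode; Status/Ok/Error are placeholders for the project's error type)
Status conv_forward(...) {
    // Temporary workspace whose lifetime is bound to this function's stack frame
    core::Buffer<float> col_buffer(size);

    if (col_buffer.empty()) return Error; // Check that the allocation succeeded

    // Use it directly
    im2col(..., col_buffer.data());
    gemm(..., col_buffer.data());

    // However the function exits (normal return, early error return, or an
    // exception), col_buffer's destructor calls deallocate automatically.
    // No memory-leak risk on any path.
    return Ok;
}
```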
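The same guarantee covers exceptions, not only early `return`s. The standard library offers an equivalent RAII pattern for raw `malloc` memory: a `std::unique_ptr` with a custom deleter. In this C++17 sketch, the `CountingFree` deleter, `g_frees` counter and `fake_forward` function are hypothetical, written only to mimic the control flow of `conv_forward` and show the workspace being released on a normal return, an early error return, and a thrown exception.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <stdexcept>

inline int g_frees = 0;  // hypothetical counter to observe cleanup

// Deleter that releases malloc'd memory and records the release.
struct CountingFree {
    void operator()(float* p) const noexcept {
        ++g_frees;
        std::free(p);
    }
};

using ScratchPtr = std::unique_ptr<float[], CountingFree>;

enum class Path { Ok, EarlyReturn, Throw };

// Mimics conv_forward: allocate a workspace, maybe bail out, finish.
bool fake_forward(std::size_t n, Path path) {
    ScratchPtr col(static_cast<float*>(std::malloc(n * sizeof(float))));
    if (!col) return false;
    col[0] = 1.0f;                                // "im2col" writes the workspace
    if (path == Path::EarlyReturn) return false;  // no manual free() needed
    if (path == Path::Throw) throw std::runtime_error("gemm failed");
    return true;                                  // nor here
}

void raii_demo() {
    g_frees = 0;
    assert(fake_forward(16, Path::Ok));
    assert(!fake_forward(16, Path::EarlyReturn));
    try {
        fake_forward(16, Path::Throw);
    } catch (const std::runtime_error&) {
        // the workspace was already released during stack unwinding
    }
    assert(g_frees == 3);  // one release per call, on every exit path
}
```

`Buffer` gives the same behavior while also routing the allocation through Mini-Infer's `Allocator`, which a plain `malloc` plus deleter does not.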

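Summary

By introducing `Buffer<T>` and using `noexcept` correctly, we not only make the code safer (RAII) but also lay the groundwork for later performance work (cheap move semantics).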