Tensor Manipulation Unit (TMU): Reconfigurable, Near-Memory, High-Throughput AI

While recent advances in AI SoC design have focused heavily on accelerating tensor computation, the equally critical task of tensor manipulation, centered on high-volume data movement with minimal computation, remains underexplored. This work addresses that g…