It varies, depending on the structure and the hardware you are trying to target. A rough definition would be "enough that it is worth exploiting". cuSPARSE and MKL offer block sparse formats, where some explicit zeros may be stored, but in exchange you get nice "dense" blocks on which you can run dense kernels. When matrices are symmetric, $A = A^T$, or the algorithm involves matrix powers $y = A^p x$ (matrix polynomials, s-step Krylov methods), there are further tricks that can be exploited.
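To make the block idea concrete, here is a minimal sketch using SciPy's `bsr_matrix` as a stand-in for the cuSPARSE/MKL block formats; the matrix and block size are invented for illustration:

```python
import numpy as np
import scipy.sparse as sp

# A small matrix whose nonzeros cluster into 2x2 blocks.
A = np.array([
    [4.0, 1.0, 0.0, 0.0],
    [1.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 1.0, 4.0],
])

A_csr = sp.csr_matrix(A)
A_bsr = sp.bsr_matrix(A, blocksize=(2, 2))  # stores one explicit zero

print("CSR stored values:", A_csr.nnz)        # 7: only true nonzeros
print("BSR stored values:", A_bsr.data.size)  # 8: two dense 2x2 blocks

# Both formats compute the same matvec; BSR does it block by block,
# so each inner kernel is a small dense matrix-vector product.
x = np.ones(4)
assert np.allclose(A_csr @ x, A_bsr @ x)
```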
I’ve seen work from the libCEED and MFEM communities where matrix-free approaches beat assembled matrices by large margins. See for example Fig. 7 in this paper: [2204.01722] Performance Portable Solid Mechanics via Matrix-Free $p$-Multigrid, or other work by Jed Brown. Another person working heavily in this area, specifically on matrix-free discontinuous Galerkin methods, is Martin Kronbichler.
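As a toy illustration of the matrix-free idea (nothing like the high-order libCEED/MFEM kernels, just the principle): apply a 1D Laplacian stencil on the fly via SciPy's `LinearOperator` and hand it to CG, with no assembled matrix anywhere:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 100

def laplacian_matvec(x):
    # y = A x for the tridiagonal [-1, 2, -1] stencil, computed
    # directly from x: no matrix entries are ever stored.
    y = 2.0 * x
    y[:-1] -= x[1:]
    y[1:] -= x[:-1]
    return y

A = LinearOperator((n, n), matvec=laplacian_matvec, dtype=np.float64)

b = np.ones(n)
x, info = cg(A, b)  # Krylov methods only need matvecs, so this "just works"
print("CG info:", info, "residual:", np.linalg.norm(laplacian_matvec(x) - b))
```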
AFAIK, the advantage of these algorithms is that they perform more flops per degree of freedom, which makes them better suited to today’s “imbalanced” architectures, where you can do far more flops than memory accesses in the same amount of time.
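A back-of-the-envelope version of that argument, showing why assembled SpMV is bandwidth-bound; the hardware numbers are illustrative placeholders, not measurements of any specific chip:

```python
# Assembled CSR SpMV: float64 values plus 32-bit column indices,
# ignoring row pointers and vector traffic for simplicity.
flops_per_nnz = 2.0        # one multiply + one add per stored nonzero
bytes_per_nnz = 8.0 + 4.0  # 8-byte value + 4-byte column index

print(f"SpMV intensity: ~{flops_per_nnz / bytes_per_nnz:.2f} flop/byte")

# Illustrative (made-up but representative) modern GPU:
# ~30 Tflop/s of FP64 compute vs ~3 TB/s of memory bandwidth.
machine_balance = 30e12 / 3e12
print(f"machine balance: ~{machine_balance:.0f} flop/byte")

# SpMV sits far below the balance point, so it is memory-bound;
# matrix-free methods instead recompute matrix entries from the (much
# smaller) mesh/coefficient data, trading cheap flops for expensive bytes.
```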
(Image taken from the NHR PerfLab Seminar: A Review of Processor Advances Over the Past Thirty Years)