Sysbench is a better synthetic test, in my opinion, because it does a lot of varied grinding instead of trying to evaluate the entire CPU with one very narrow, specific task. Even on the same Linux distribution, x86 and Arm builds of OpenSSL can differ a lot, because its use of assembly optimizations and CPU crypto acceleration cannot be entirely architecture-independent, even for the RSA benchmark.
To offer a somewhat more varied example I've tested the POV-Ray benchmarker on Debian 11 on Oracle's Ampere Altra servers ("A1.Flex") versus an identical setup/build running on a 2 GHz EPYC 7281-based x86-64 VPS, and on that single-threaded test the Arm VPS handily outpaced the EPYC with almost 2x the performance.
`sysbench cpu` basically measures whether a single primitive inside sysbench is correctly optimized for your platform, which is almost meaningless. Its run-to-run variation is enormous because a lot rides on whether a particular data structure happens to be optimally placed and aligned, which sysbench makes no effort to control. On a typical hyperthreaded x86 machine the result can differ by 100% or more depending on whether sysbench's 2 threads are placed on the same core or on different cores, so you must control placement with `taskset` if you want the result to mean anything.
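A minimal sketch of the pinning described above, assuming Linux with `taskset` (util-linux) and sysbench 1.0 on the PATH. The helper names are made up for illustration, and the parser assumes the report contains an `events per second:` line, as sysbench 1.0's CPU test prints. Which logical CPU numbers are SMT siblings differs per machine, so check `lscpu -e` before picking a core list.

```python
import re
import subprocess

def pinned_sysbench_cmd(cpus, threads, seconds=10):
    """Build a command line that runs `sysbench cpu` restricted to `cpus`."""
    cpu_list = ",".join(str(c) for c in cpus)
    return [
        "taskset", "-c", cpu_list,            # restrict to these logical CPUs
        "sysbench", "cpu",
        f"--threads={threads}", f"--time={seconds}", "run",
    ]

def parse_events_per_second(report):
    """Pull the throughput figure out of sysbench's text report."""
    match = re.search(r"events per second:\s*([0-9.]+)", report)
    if match is None:
        raise ValueError("no 'events per second' line in sysbench output")
    return float(match.group(1))

def run_pinned(cpus, threads, seconds=10):
    """Run the pinned benchmark once and return events per second."""
    cmd = pinned_sysbench_cmd(cpus, threads, seconds)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_events_per_second(result.stdout)

# Example (core numbering is an assumption, not universal):
#   run_pinned([0, 1], threads=2)   # possibly two siblings of one core
#   run_pinned([0, 2], threads=2)   # possibly two separate cores
```

Running each placement several times and comparing the numbers shows how much of the score comes from thread placement rather than from the CPU itself.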
On my local machine with 4 threads I get ~10k events per second on cores 0+2+4+6, but on cores 8-11 I get ~13k. Does this mean that Gracemont Atom is 30% faster than Golden Cove Core? No, it is only measuring the fact that the efficiency cores happen to share an L2 cache.
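The 30% figure is just the ratio of the two throughput numbers quoted above. A small sketch of that arithmetic follows, together with a spread measure you could apply to repeated runs on one placement to see whether the gap is larger than the noise. The function names are illustrative, not part of any tool.

```python
from statistics import mean

def relative_gain(baseline, other):
    """Fractional change of `other` relative to `baseline`."""
    return (other - baseline) / baseline

def run_to_run_spread(samples):
    """Spread of repeated runs, expressed as (max - min) / mean."""
    return (max(samples) - min(samples)) / mean(samples)

# Numbers from the comment above: ~10k events/s on cores 0+2+4+6,
# ~13k events/s on cores 8-11.
gain = relative_gain(10_000, 13_000)
print(f"{gain:.0%}")
```

A gap between placements only means something if it is clearly larger than `run_to_run_spread` for each placement on its own.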