Both ARM and x86 can convert fp16 much faster in bulk than one at a
time. This also enables hardware accelerated conversion on x86, when
F16C isn't unconditionally available at compile time.
This code is implemented in C to ensure that there's no leakage of
inline symbols from the .obj file that was compiled by Visual Studio
with AVX support. Unfortunately, simd.prf uses $(CXX) instead of $(CC)
for all its sources, which means the file gets interpreted as C++ by
g++, clang++ and icpc. Those compilers at least don't leak any symbols.
Done-with: Thiago Macieira <thiago.macieira@intel.com>
Change-Id: I9d26d99e83392861fb09564e0e8e8d76cd8483b3
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>