Devices of compute capability 1.3 and higher provide native support for double-precision floating-point values (that is, values 64 bits wide). Results obtained using double-precision arithmetic will frequently differ from those of the same operation performed in single precision, both because of the greater precision of the former and because of rounding. It is therefore important to compare like with like and to check that results agree within a stated tolerance rather than expecting them to match exactly.
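As a rough illustration of comparing within a tolerance, the following host-side C++ sketch accumulates the same value in single and in double precision and then compares the two results with a relative tolerance. The helper names (nearly_equal, sum_tenths_float, sum_tenths_double) and the tolerance parameters are hypothetical and chosen only for this example; a suitable tolerance depends on the application and on how many operations contribute to the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of any CUDA API): returns true when a and b
// agree within a relative tolerance, with an absolute floor for values near zero.
bool nearly_equal(double a, double b, double rel_tol, double abs_tol) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    const double diff  = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}

// Accumulate n copies of 0.1 in single precision.
float sum_tenths_float(int n) {
    float s = 0.0f;
    for (int i = 0; i < n; ++i) {
        s += 0.1f;
    }
    return s;
}

// Accumulate n copies of 0.1 in double precision.
double sum_tenths_double(int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) {
        s += 0.1;
    }
    return s;
}
```

Comparing sum_tenths_float(n) against sum_tenths_double(n) with == will generally fail for large n, whereas nearly_equal with a tolerance appropriate to single precision can still accept the pair. The same pattern applies when checking a single-precision kernel's output against a double-precision reference.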
Whenever doubles are used, use at least the -arch=sm_13 switch on the nvcc command line; see Sections 3.1.3 and 3.1.4 of the CUDA C Programming Guide for more details.