Hi all,
Recently a challenge (The One Billion Row Challenge - Gunnar Morling) has been announced to read and process a simple csv file. The trick is the file has 1e+9 rows thus its size is ~13Gb. Originally the challenge is for Java programmers, but many people have proposed solutions in their favorite languages (see “Show & tell” link there). BTW, best time in Java is currently ~2.5 s, while best C result is ~0.3 s, they rely on bit-magic, SIMD and many similar things.
The problem itself is quite artificial (if we need speed, we just use a different format of file), but nevertheless…
Since in Fortran we have OpenMP almost out-of-the-box, I decided to give it a try. I’ve to spent some time … currently I mmap the file (use c-function), split it into chunks to loop through via OpenMP. In each chunk I find a csv-separator (‘;’ char in this case) using Findloc, and scan the following number without if.
I’ve tried many different things to speed up, but nothing helps anymore. Currently I’m at nearly ~3 s with 16 threads. Significant amount of time is spent at FINDLOC. C-wizards use SIMD instructions to parse strings, I have not fount if it is possible to do in Fortran (making a c-call is possible, but would be unjust IMHO).
Do you have any ideas how to improve further?
!$OMP PARALLEL DO stuff
do chunk_num=1,Nchunks
cpos = ch_start(chunk_num)
do
pos_s=findloc(i11(cpos:),scode,DIM=1)-1
d2p = merge(1,0,i11(cpos+pos_s+2) == dcode) ; d3p = merge(1,0,i11(cpos+pos_s+3) == dcode) ; d4p = merge(1,0,i11(cpos+pos_s+4) == dcode) ; m1p = merge(1,0,i11(cpos+pos_s+1) == mcode)
pos_nl = d2p*(pos_s+4) + d3p*(pos_s+5) + d4p*(pos_s+6) + 1
tempi = d2p*( (i11(cpos+pos_s+1)-zcode)*10 + (i11(cpos+pos_s+3)-zcode) ) + & ! dot is the 2nd symbol => temp has a form of 1.2
d3p*( (1.-m1p)*((i11(cpos+pos_s+1)-zcode)*100 + (i11(cpos+pos_s+2)-zcode)*10 + (i11(cpos+pos_s+4)-zcode)) + & ! dot is the 3rd symbol => temp has a form of 12.4 or -4.5
m1p *( - (i11(cpos+pos_s+2)-zcode)*10 - (i11(cpos+pos_s+4)-zcode)) ) + &
d4p*( -(i11(cpos+pos_s+2)-zcode)*100 - (i11(cpos+pos_s+3)-zcode)*10 - (i11(cpos+pos_s+5)-zcode) )
id = get_hash_i( i11(cpos:cpos+pos_s-1) )
if (stations_cnt(id) == 0) then
do i=cpos,cpos+pos_s-1
stations(id)(1+i-cpos:1+i-cpos) = x11(i)
end do
end if
temp_min_i(id) = min(temp_min_i(id),tempi)
temp_max_i(id) = max(temp_max_i(id),tempi)
temp_cum_i(id) = temp_cum_i(id)+(tempi)
stations_cnt(id) = stations_cnt(id)+1
cpos = cpos+pos_nl
if (cpos >= ch_end(chunk_num)) exit
end do
end do
!$OMP END PARALLEL DO
My code is here: GitHub - sshestov/1brc-fortran: 1 billion row challnge in Fortran