1brc in AWK (< 6 minutes 😁 ) #171
Replies: 11 comments 20 replies
-
I also wrote a slightly slower version that should be more robust against overflow for pathological input. The difference is that it calculates a running mean rather than keeping a sum of all the measurements. Here it is:
time mawk -F';' '
{
if (!c[$1]) {
c[$1] = 1;
min[$1] = mean[$1] = Max[$1] = $2;
} else {
c[$1]++;
mean[$1] += ($2 - mean[$1]) / c[$1];
if ($2 < min[$1]) min[$1] = $2;
else if ($2 > Max[$1]) Max[$1] = $2;
}
}
END {
for (i in c)
printf "%s = %.1f/%.1f/%.1f\n", i, min[i], mean[i], Max[i] | "sort"
}
' measurements.txt
This adds about a minute to the run-time:
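As a quick sanity check, the incremental-mean update can be compared against the naive sum-then-divide approach on a tiny made-up sample (station name and values are hypothetical):

```shell
# Sketch: verify the running-mean update against a plain sum on toy data.
printf 'Oslo;10.0\nOslo;20.0\nOslo;30.0\n' |
awk -F';' '
{
    c[$1]++
    mean[$1] += ($2 - mean[$1]) / c[$1]   # incremental mean, no big sum kept
    sum[$1]  += $2                        # naive sum, for comparison only
}
END {
    for (i in c)
        printf "%s running=%.1f naive=%.1f\n", i, mean[i], sum[i] / c[i]
}'
# prints: Oslo running=20.0 naive=20.0
```

The incremental form keeps mean[$1] near the magnitude of the data, so it avoids the huge intermediate sums that could lose precision (or overflow in less forgiving languages).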
-
I also like using AWK for data wrangling. I also wrote a 1BRC solution in it 2 days ago, but:
On the other hand, making more people aware of AWK is always a good thing, so thank you for doing it! It seems your solution does not strictly conform to the format requested in the original blog post, seemingly to be able to leverage an external sorting utility (sort). So it's not truly a standalone AWK solution, but it's still a POSIX-y solution, so that's good. My solution:
#!/usr/bin/awk -f
BEGIN { FS=";"; }
{
tot[$1] += $2;
if (!cnt[$1] || min[$1] > $2)
min[$1] = $2;
if (!cnt[$1] || max[$1] < $2)
max[$1] = $2;
cnt[$1]++;
}
END {
n = asorti(cnt, key);
printf("{");
i=1;
printf("%s=%.1f/%.1f/%.1f", key[i], min[key[i]], tot[key[i]]/cnt[key[i]], max[key[i]]);
for (i=2; i<=n; i++) {
printf(", %s=%.1f/%.1f/%.1f", key[i], min[key[i]], tot[key[i]]/cnt[key[i]], max[key[i]]);
}
printf("}\n");
}
Tested on MSYS2 under Windows 11.
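Since asorti is a gawk extension, a portable variant can sort the keys inside END with a small insertion sort instead. A sketch on toy data (station names are made up):

```shell
printf 'Reykjavik;1.5\nAthens;24.0\nLima;18.0\n' |
awk -F';' '
{ last[$1] = $2 }
END {
    # POSIX awk has no asorti: collect the keys, then insertion-sort them
    n = 0
    for (k in last) keys[++n] = k
    for (i = 2; i <= n; i++) {
        v = keys[i]
        for (j = i - 1; j >= 1 && keys[j] > v; j--) keys[j+1] = keys[j]
        keys[j+1] = v
    }
    for (i = 1; i <= n; i++) printf "%s=%s\n", keys[i], last[keys[i]]
}'
# prints Athens=24.0, then Lima=18.0, then Reykjavik=1.5
```

That keeps the whole pipeline in one awk process, at the cost of a few extra lines.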
@stig, can you share the HW and SW setup used for testing your script?
-
Yeah, I went for an idiomatic AWK solution rather than contorting myself to use only AWK. The latter would not be much closer to conforming to the original challenge, since that was to use Java. 😄
-
Testing results for frawk, gawk, and the baseline Java program on my MacBook Pro 14 (Apple M1 Pro, 16 GB). Source:
BEGIN {
FS = ";"
}
{
value = 0.0 + $2
if ($1 in Count) {
Count[$1] += 1;
Total[$1] += value;
if (value < Min[$1]) Min[$1] = value;
else if (value > Max[$1]) Max[$1] = value;
} else {
Count[$1] = 1;
Total[$1] = value;
Min[$1] = value;
Max[$1] = value;
}
}
END {
for (i in Count) {
printf "%s = %.1f/%.1f/%.1f\n", i, Min[i], (Total[i]/Count[i]), Max[i] | "sort"
}
}
baseline Java:
frawk:
gawk:
-
It's either 6 minutes with 20 lines of AWK written in five minutes, or 6 seconds with 300 lines of Java written in five days. Classic "xkcd: Is It Worth the Time?".
-
My version is even shorter (still optimized for readability):
!min[$1] || $2 < min[$1] { min[$1] = $2 }
!max[$1] || $2 > max[$1] { max[$1] = $2 }
{ sum[$1] += $2; count[$1]++ }
END {
for (city in min)
printf("%30s %5.1f / %5.1f / %5.1f\n", city, min[city], sum[city] / count[city], max[city])
}
Sample output:
Results on my Intel i7-14700K:
-
Awk is somewhat at a disadvantage performance-wise because there's no builtin minimum function. x86 and AArch64 both have a single instruction that takes the minimum of two floats.
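For illustration, the closest awk gets is a user-defined helper, which costs an interpreted function call per record rather than one hardware instruction (a sketch; the function names are mine):

```shell
printf '3.0\n-1.5\n2.0\n' |
awk '
function min(a, b) { return a < b ? a : b }
function max(a, b) { return a > b ? a : b }
NR == 1 { lo = hi = $1; next }   # seed both extremes with the first value
{ lo = min(lo, $1); hi = max(hi, $1) }
END { printf "%.1f %.1f\n", lo, hi }'
# prints: -1.5 3.0
```

This is why most solutions in the thread inline the comparisons instead of factoring them into functions.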
-
@stig and @gunnarmorling - Howdy. With respect, the proposed mawk solution is incomplete:
While it's fast, it's incorrect. Here's a proposed alternative in gawk that's only 14% slower but addresses all the challenge requirements. A POSIX awk/mawk implementation with sorting would take a lot longer to write and execute. This implementation took only a few minutes to code (and a lot of minutes testing). Thoughts?
#!/usr/local/bin/gawk -f
BEGIN {
FS = ";";
}
{
temp[$1] += $2;
if (!count[$1]++)
min[$1] = max[$1] = $2;
else {
if ($2 < min[$1]) min[$1] = $2;
if ($2 > max[$1]) max[$1] = $2;
}
}
END {
PROCINFO["sorted_in"] = "@ind_str_asc";
printf("{");
for (station in temp)
printf("%s=%3.1f/%3.1f/%3.1f,", station, min[station], temp[station]/count[station], max[station]);
printf("\b}");
}
The main processing block is the only place where meaningful optimizations can happen. I tried various buffering strategies (e.g. …). Thoughts? Have an awesome weekend.
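One note on the printf("\b}") trick above: the backspace only erases the trailing comma on a terminal; when output is redirected to a file, the \b byte is written literally. A separator variable sidesteps this. A minimal sketch with made-up data:

```shell
printf 'b;2.0\na;1.0\n' |
awk -F';' '
{ t[$1] = $2 }
END {
    sep = ""
    printf "{"
    for (k in t) {                 # iteration order is unspecified here
        printf "%s%s=%s", sep, k, t[k]
        sep = ", "                 # separator only appears before later items
    }
    printf "}\n"
}'
```

This is the same trick the sort-wrapper solution further down the thread uses with its `s` variable.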
-
This thing is embarrassingly parallel, so I looked to see if I could use GNU parallel as well. I'm not really interested in raw performance but more in the delta of performance we can grab with very little change to the code. I did a good old map/reduce/re-reduce:
#!/usr/bin/awk -f
BEGIN {FS=";"}
!min[$1] || $2 < min[$1] { min[$1] = $2 }
!max[$1] || $2 > max[$1] { max[$1] = $2 }
{ sum[$1] += $2; count[$1]++ }
END {
for (city in min)
printf("%30s;%5.1f;%5.1f;%5d;%5.1f\n", city, min[city], sum[city], count[city], max[city])
}
(Thanks @frapa !)
#!/usr/bin/awk -f
BEGIN {FS=";"}
!min[$1] || $2 < min[$1] { min[$1] = $2 }
!max[$1] || $5 > max[$1] { max[$1] = $5 }
{ sum[$1] += $3; count[$1] += $4 }   # merge the partial sums and counts from the map output
END {
for (city in min)
printf("%30s / %5.1f / %5.1f / %5.1f\n", city, min[city], sum[city]/count[city], max[city])
}
The glue:
parallel --pipe-part -j 8 --round-robin --arg-file $1 ./map.awk | ./reduce.awk
The results on my i5-1145G7 @ 2.60GHz, completely unscientific, measured while doing other things on the same machine:
Initial awk:
Parallelized awk:
A 2x speed improvement is really interesting, but to me the more interesting thing is that there is far more time spent in
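To see why the reduce step must merge the partial sums and counts (fields 3 and 4 of the map output) rather than re-accumulate raw values, here is the merge applied to two hypothetical per-chunk lines for one station:

```shell
# Two map-output lines for the same station, format: city;min;sum;count;max
printf 'Oslo;1.0;30.0;3;15.0\nOslo;-2.0;10.0;2;9.0\n' |
awk -F';' '
{
    if (!cnt[$1] || $2 < min[$1]) min[$1] = $2
    if (!cnt[$1] || $5 > max[$1]) max[$1] = $5
    sum[$1] += $3        # merge partial sums
    cnt[$1] += $4        # merge partial counts
}
END {
    for (c in cnt)
        printf "%s %.1f/%.1f/%.1f\n", c, min[c], sum[c]/cnt[c], max[c]
}'
# prints: Oslo -2.0/8.0/15.0
```

The mean comes out as (30 + 10) / (3 + 2) = 8.0; averaging the two chunk means directly would only be correct if both chunks held the same number of records.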
-
Mine is pretty much the same:
BEGIN {
FS = ";"
CONVFMT = "%.1f"
}
{
sum[$1] += $2
if (++c[$1] == 1)
min[$1] = max[$1] = $2
else if ($2 < min[$1])
min[$1] = $2
else if ($2 > max[$1])
max[$1] = $2
}
END {
fmt = "awk 'BEGIN{printf \"{\"}{printf \"%s%s\",s,$0;s=\", \"}END{print \"}\"}'"
for (i in c)
print i "=" min[i] "/" sum[i]/c[i] "/" max[i] | "sort | " fmt
}
Although it prints the output in this format
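The CONVFMT = "%.1f" line is what rounds the values here: it controls awk's number-to-string conversion, which kicks in when the numbers are concatenated in the print statement. A minimal sketch:

```shell
# Concatenating a number with a string converts it via CONVFMT.
awk 'BEGIN { CONVFMT = "%.1f"; x = 10 / 3; print "mean=" x }'
# prints: mean=3.3
```

(Printing x on its own would use OFMT instead; it is the concatenation that triggers CONVFMT.)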
-
Hi!
The original script on a quite old quad-core i5-6500:
This:
PS: You don't need the extra || check for whether the array entry exists or not. If it doesn't, the first number will always be larger than a non-existent field.
-
AWK is my go-to data munging tool, so I was curious to see how a naive solution in AWK would fare. Pretty well I would say!
Source:
Running against the full 1 billion row data set on this machine (2021 M1 Max, 32 GB RAM) took:
For comparison, the baseline Java program on my machine took:
PS: I tried awk, gawk, and mawk: the last was the clear winner, with gawk second.