Replace b64url with Erlang/OTP stdlib base64 #5801

@Benjamin-Philip

Presently, URL-safe base64 encoding is handled by the b64url NIF at src/b64url. However, support for RFC 4648 compliant URL-safe encoding was added to the Erlang stdlib's base64 module in Erlang/OTP 26.0. Additionally, encoding was made up to 4 times faster thanks to JIT compiler improvements in the same release.
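As a rough illustration of the stdlib API mentioned above, the sketch below encodes and decodes with the `mode` and `padding` options that `base64` accepts since Erlang/OTP 26. It assumes that callers of b64url expect unpadded output (the `re`-based variant in the benchmark strips trailing `=` in the same way); if that holds, a replacement would likely need `padding: false` in addition to `mode: :urlsafe`. The sample input is arbitrary.

```elixir
# Requires Erlang/OTP 26 or later for the two-argument encode/decode.
# <<255>> is an arbitrary input whose standard encoding contains both "/"
# and "=" padding, so the differences between the variants are visible.
data = <<255>>

standard = :base64.encode(data)
urlsafe = :base64.encode(data, %{mode: :urlsafe})
unpadded = :base64.encode(data, %{mode: :urlsafe, padding: false})

IO.inspect({standard, urlsafe, unpadded})
# → {"/w==", "_w==", "_w"}

# When decoding unpadded input, pass padding: false so that the missing
# "=" characters are not treated as an error.
opts = %{mode: :urlsafe, padding: false}
^data = :base64.decode(unpadded, opts)
```

Note that the benchmark below uses the default `padding: true` for the `base64 (urlsafe)` case, so its output is not byte-for-byte identical to the unpadded form shown here.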

Benchmarking base64 against b64url with Benchee[1], using the script below, shows that the built-in base64 is faster:

Mix.install([:benchee, {:b64url, github: "apache/couchdb", sparse: "src/b64url/"}])

defmodule B64Bench do
  def main do
    [workers, min_size, max_size, duration, entries] =
      Enum.map(System.argv(), &String.to_integer/1)

    bytes =
      1..entries
      |> Enum.to_list()
      |> Enum.map(fn _ ->
        :crypto.strong_rand_bytes(min_size + :rand.uniform(max_size - min_size))
      end)

    Benchee.run(
      %{
        "b64url" => fn input -> process(input, &:b64url.encode/1, &:b64url.decode/1) end,
        "base64 (standard) + re" => fn input ->
          process(
            input,
            fn url ->
              url = :erlang.iolist_to_binary(:re.replace(:base64.encode(url), "=+$", ""))
              url = :erlang.iolist_to_binary(:re.replace(url, "/", "_", [:global]))
              :erlang.iolist_to_binary(:re.replace(url, "\\+", "-", [:global]))
            end,
            fn url64 ->
              url64 = :erlang.iolist_to_binary(url64)
              url64 = :erlang.iolist_to_binary(:re.replace(url64, "-", "+", [:global]))
              url64 = :erlang.iolist_to_binary(:re.replace(url64, "_", "/", [:global]))

              padding =
                :erlang.list_to_binary(
                  :lists.duplicate(rem(4 - rem(:erlang.size(url64), 4), 4), 61)
                )

              :base64.decode(<<url64::binary, padding::binary>>)
            end
          )
        end,
        "base64 (urlsafe)" => fn input ->
          process(
            input,
            &:base64.encode(&1, %{mode: :urlsafe}),
            &:base64.decode(&1, %{mode: :urlsafe})
          )
        end
      },
      parallel: workers,
      time: duration,
      inputs: %{"generated" => bytes}
    )

    IO.inspect(:erlang.byte_size(Enum.join(bytes)), label: "Total size (B)")
  end

  def process(bytes, encode, decode) do
    Enum.each(bytes, fn bin -> decode.(encode.(bin)) end)
  end
end

B64Bench.main()

$ elixir b64_bench.exs 4 10 100 60 100
Operating System: Linux
CPU Information: 12th Gen Intel(R) Core(TM) i7-1255U
Number of Available Cores: 12
Available memory: 15.31 GB
Elixir 1.18.4
Erlang 28.0.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 min
memory time: 0 ns
reduction time: 0 ns
parallel: 4
inputs: generated
Estimated total run time: 3 min 6 s
Excluding outliers: false

Benchmarking b64url with input generated ...
Benchmarking base64 (standard) + re with input generated ...
Benchmarking base64 (urlsafe) with input generated ...
Calculating statistics...
Formatting results...

##### With input generated #####
Name                             ips        average  deviation         median         99th %
base64 (urlsafe)              8.18 K      122.31 μs    ±44.53%      110.85 μs      296.40 μs
b64url                        6.18 K      161.88 μs    ±56.17%      139.65 μs      609.83 μs
base64 (standard) + re        0.74 K     1345.47 μs    ±32.12%     1076.60 μs     2221.58 μs

Comparison: 
base64 (urlsafe)              8.18 K
b64url                        6.18 K - 1.32x slower +39.57 μs
base64 (standard) + re        0.74 K - 11.00x slower +1223.16 μs
Total size (B): 5491

Therefore I propose we drop b64url in favour of the stdlib functions. This has the following benefits:

  • Less code to maintain
  • (Marginally) better performance
  • Enhanced safety (by way of eliminating a NIF)

Footnotes

  1. I found updating the existing benchmarks to compare 3 or more implementations tedious, so I wrote a new one. Its arguments are similar to those of the previous benchmark, with the exception of an extra parameter, entries, which is the number of random binaries to encode. The ips (iterations per second) results are directly proportional to the previous benchmark's bps and can be converted to bps by multiplying by the total size.
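
The conversion described in this footnote can be sketched as follows. This is only an approximation: it assumes bps means bytes per second, and it uses the rounded ips figures and the total size (5491 B) printed in the benchmark output above.

```elixir
# Figures copied from the benchmark output; the ips values are rounded
# there ("8.18 K" etc.), so the results are approximate.
total_size_bytes = 5491

ips = %{
  "base64 (urlsafe)" => 8_180,
  "b64url" => 6_180,
  "base64 (standard) + re" => 740
}

# Each iteration encodes and decodes every generated binary once, so
# bytes per second = iterations per second * total size in bytes.
bps = Map.new(ips, fn {name, rate} -> {name, rate * total_size_bytes} end)

# Print the fastest implementation first, in megabytes per second.
for {name, rate} <- Enum.sort_by(bps, &elem(&1, 1), :desc) do
  IO.puts("#{name}: #{Float.round(rate / 1.0e6, 1)} MB/s")
end
```

With these inputs the script reports roughly 44.9 MB/s for base64 (urlsafe), 33.9 MB/s for b64url, and 4.1 MB/s for base64 (standard) + re.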
