Extracting a multipart tgz archive from Google Takeout

DezeStijn

Google Takeout

Google allows you to export all the data it stores about your account via Google Takeout. This includes Google Drive documents, pictures from Google Photos, and emails from Gmail, but also (meta)data from products like Google Maps, Fit and Fitbit, Purchases, Google Home, and many more.

Getting an export is useful for making backups or if you plan on cleaning up (or fully getting rid of) your Google account.

When you request such an export, you can choose to receive the data in multiple zip files of maximum 4GB each or tgz archives of maximum 50GB in size. Once the export is ready, you’ll receive a link to a download page where you can download the archive parts.

Since the archive is split into chunks, you need to download all of them and combine them before you can successfully extract the contents.

Extracting the archive

In the example below I took an export via Google Takeout and selected the option to download the data in 50GB chunks of tgz archives.

$ ls -lh

total 155G
-rw-rw-r-- 1 sequr sequr  50G Mar  4 22:09 takeout-20251113T192843Z-001.tgz
-rw-rw-r-- 1 sequr sequr  50G Mar  5 00:04 takeout-20251113T192843Z-002.tgz
-rw-rw-r-- 1 sequr sequr  50G Mar  5 23:21 takeout-20251113T192843Z-003.tgz
-rw-rw-r-- 1 sequr sequr 5.9G Mar  5 21:57 takeout-20251113T192843Z-004.tgz
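
Before starting a long extraction, it can be worth checking that no part is missing from the middle of the sequence and that every download finished. The sketch below assumes the parts are numbered -001, -002, and so on, as in the listing above. check_parts is a hypothetical helper name, not something Takeout or tar provides. It cannot tell whether the final part is missing, since the total number of parts is not encoded in the file names.

```shell
# check_parts: hypothetical helper, written for this post only.
# Usage: check_parts takeout-20251113T192843Z-*.tgz
check_parts() {
    n=0
    for f in "$@"; do
        n=$((n + 1))
        expected=$(printf '%03d' "$n")
        # The shell expands the glob in sorted order, so part N must end in -00N.tgz.
        case "$f" in
            *-"$expected".tgz) ;;
            *) echo "expected part $expected, found $f" >&2; return 1 ;;
        esac
        # gzip -t reads the whole file and fails on a truncated or corrupt download.
        gzip -t "$f" || { echo "corrupt or incomplete: $f" >&2; return 1; }
    done
    echo "$n part(s) look complete"
}
```

Note that gzip -t has to read every byte, so on 50GB parts this takes a while.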

Extracting these archives is as simple as concatenating the files and piping the result into tar.

$ mkdir -p targetdir/   # tar -C fails if the target directory does not exist
$ cat takeout-20251113T192843Z-*.tgz | tar xzivf - -C targetdir/
# Starts printing a line for every file extracted

Below are the descriptions of the options passed to tar. Note that f needs to be the last letter in the option bundle, since the argument that follows it is taken as the archive file. In this case that is -, meaning standard input (stdin), because the output of cat is piped into tar.

$ man tar

       -i, --ignore-zeros
              Ignore zeroed blocks in archive.  Normally two consecutive 512-blocks filled  with  zeroes  mean
              EOF and tar stops reading after encountering them.  This option instructs it to read further and
              is useful when reading archives created with the -A option.
       -v, --verbose
              Verbosely list files processed.  Each instance of this option on the command line increases  the
              verbosity  level  by  one.   The maximum verbosity level is 3.  For a detailed discussion of how
              various verbosity levels affect tar's output, please refer to GNU Tar Manual,  subsection  2.5.1
              "The --verbose Option".
       -x, --extract, --get
              Extract  files  from an archive.  Arguments are optional.  When given, they specify names of the
              archive members to be extracted.
       -z, --gzip, --gunzip, --ungzip
              Filter the archive through gzip(1).

       -f, --file=ARCHIVE
              Use archive file or device ARCHIVE.  If this option is not given, tar will first examine the en‐
              vironment  variable  `TAPE'.   If it is set, its value will be used as the archive name.  Other‐
              wise, tar will assume the compiled-in default.  The default value can be inspected either  using
              the --show-defaults option, or at the end of the tar --help output.

              An archive name that has a colon in it specifies a file or device on a remote machine.  The part
              before the colon is taken as the machine name or IP address, and the part after it as  the  file
              or device pathname, e.g.:

              --file=remotehost:/dev/sr0

              An optional username can be prefixed to the hostname, placing a @ sign between them.

              By  default,  the  remote host is accessed via the rsh(1) command.  Nowadays it is common to use
              ssh(1) instead.  You can do so by giving the following command line option:

              --rsh-command=/usr/bin/ssh

              The remote machine should have the rmt(8) command installed.  If its  pathname  does  not  match
              tar's default, you can inform tar about the correct pathname using the --rmt-command option.

During the extraction process, a line is printed to the terminal for every file that’s extracted.
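
To see why the i option matters for concatenated parts, here is a small experiment with two tiny archives standing in for the Takeout parts. The file and directory names are made up for the demo, and it assumes GNU tar. Each .tgz ends with its own end-of-archive marker (the zeroed blocks from the man page excerpt), so without i tar stops after the first part.

```shell
# Demo with throwaway files; nothing here touches a real Takeout export.
set -e
work=$(mktemp -d)
cd "$work"
mkdir a b out_plain out_i
echo one > a/one.txt
echo two > b/two.txt
tar czf part-001.tgz a
tar czf part-002.tgz b

# Without i: tar stops at the end-of-archive marker of part-001.
cat part-*.tgz | tar xzf - -C out_plain/ || true

# With i: tar skips the zeroed blocks and continues into part-002.
cat part-*.tgz | tar xzif - -C out_i/

echo "without i:"; ls out_plain
echo "with i:";    ls out_i
```

With GNU tar, out_plain should contain only a, while out_i contains both a and b.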

Tracking progress

If you have Pipe Viewer (pv) installed, you can track the progress of the extraction as well. Replace cat with pv in the command above.

$ pv takeout-20251113T192843Z-*.tgz | tar xzivf - -C targetdir/
# Track progress of extraction
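
One thing to watch for: with v still in the option list, tar's file listing and pv's progress bar both end up on the same terminal and scroll over each other. A possible workaround, sketched below, relies on tar printing extracted names on stdout while pv draws its bar on stderr, so the listing can be redirected to a log file. extract_logged and the log file name are hypothetical names chosen for this sketch.

```shell
# extract_logged: hypothetical helper, written for this post only.
# Usage: extract_logged targetdir/ extracted.log takeout-20251113T192843Z-*.tgz
extract_logged() {
    target=$1
    log=$2
    shift 2
    # Fall back to cat (no progress bar) when pv is not installed.
    if command -v pv >/dev/null 2>&1; then reader=pv; else reader=cat; fi
    mkdir -p "$target"
    # The file list goes to the log; pv's progress bar stays on the terminal.
    "$reader" "$@" | tar xzivf - -C "$target" > "$log"
}
```

If you don't need the list of extracted files at all, simply dropping v from the options has the same effect on the terminal.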

Archive Browser

Once the wall of text stops moving, the last file you’ll see extracted is Takeout/archive_browser.html. This is a sort of one-page app that contains information about the export as well as a file browser that lets you check all the files that were exported.

$ xdg-open targetdir/Takeout/archive_browser.html