[llvm] r329648 - [CachePruning] Fix comment about ext4 per-directory file limit. NFC

Fāng-ruì Sòng via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 11 10:36:31 PDT 2018


truncate -s 100G 100G; mkfs.ext4 -N 30000000 100G
mkdir ext4; mount 100G ext4

For a directory using htree without turning on large_dir:

508   root node entries (root_limit)
510   internal node entries (node_limit)

For a 40-byte filename, sizeof(ext4_dir_entry_2) = 48, so a linear directory (a leaf node of the htree) can contain at most floor(4096/48) = 85 such entries.
The real per-directory entry limit should therefore be 508*510*85 = 22021800.
The limit varies with the average length of filenames.
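
The arithmetic above can be sketched in Python. This is only an illustration: the function names are hypothetical, the 4096-byte block size and the 508/510 index limits are taken from the figures above, and the entry size assumes the usual ext4_dir_entry_2 layout of an 8-byte header followed by the name, padded to a multiple of 4 bytes.

```python
BLOCK_SIZE = 4096   # assumed block size, matching the 4096 used above
ROOT_LIMIT = 508    # root node entries, from the figures above
NODE_LIMIT = 510    # internal node entries, from the figures above


def dirent_size(name_len):
    """Size of one ext4_dir_entry_2: 8-byte header + name, rounded up to 4."""
    return (8 + name_len + 3) & ~3


def htree_entry_limit(name_len):
    """Upper bound on entries, assuming every name has the same length."""
    per_leaf = BLOCK_SIZE // dirent_size(name_len)
    return ROOT_LIMIT * NODE_LIMIT * per_leaf


print(dirent_size(40))        # 48
print(htree_entry_limit(40))  # 22021800
```

With 32-byte names the same function gives 102 entries per leaf instead of 85, which is why the limit moves with the average filename length.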

However, the code does not try to rebalance the htree, so a file cannot be created if its name falls into a full leaf node. The following example demonstrates this: certain filenames cannot be used while others can:

% touch d/0000000000000000000000000000000000816a6f
touch: cannot touch 'd/0000000000000000000000000000000000816a6f': No space left on device
% touch d/0000000000000000000000000000000000816a70
# succeeded
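
A probe along these lines can show which names fail. This is a minimal sketch, not the procedure used above: `try_create` and `probe` are hypothetical helpers, and the 40-digit zero-padded hexadecimal names are an assumption modelled on the touch example.

```python
import errno
import os


def try_create(path):
    """Return True if an empty file can be created, False on ENOSPC."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return False  # "No space left on device"
        raise  # any other error is unexpected, so let it propagate
    os.close(fd)
    return True


def probe(directory, numbers):
    """Try to create one 40-digit hex name per number; report each result."""
    results = []
    for n in numbers:
        path = os.path.join(directory, "%040x" % n)
        results.append((path, try_create(path)))
    return results
```

For instance, `probe("d", [0x816a6f, 0x816a70])` would attempt the two names from the example and return a (path, created) pair for each.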

Another issue is that the size used by the htree cannot shrink. I believe we can exceed that limit if we use the latest mkfs.ext4 with -O large_dir; see http://e2fsprogs.sourceforge.net/e2fsprogs-release.html#1.44.0

On 2018-04-10, Peter Collingbourne wrote:
>Here is what I did.
>
>$ mkfs.ext4 -N 200000000 100G
>$ sudo mount /path/to/100G d3
>$ ./make
>[...]
>d3/d/000000000000000000000000900000
>d3/d/000000000000000000000000910000
>d3/d/000000000000000000000000920000
>d3/d/000000000000000000000000930000
>d3/d/000000000000000000000000940000
>d3/d/000000000000000000000000950000
>d3/d/000000000000000000000000960000
>d3/d/000000000000000000000000970000
>d3/d/000000000000000000000000980000
>d3/d/000000000000000000000000990000
>d3/d/0000000000000000000000009a0000
>d3/d/0000000000000000000000009b0000
>open: No space left on device
>$ vi make.c
>[ changed %030lx to %029lx ]
>$ make make
>$ ./make
>d3/d/00000000000000000000000000000
>open: No space left on device
>$ vi make.c
>[ changed d3/d to d3/d4 ]
>$ make make
>$ ./make 
>d3/d4/00000000000000000000000000000
>d3/d4/00000000000000000000000010000
>d3/d4/00000000000000000000000020000
>d3/d4/00000000000000000000000030000
>d3/d4/00000000000000000000000040000
>d3/d4/00000000000000000000000050000
>d3/d4/00000000000000000000000060000
>d3/d4/00000000000000000000000070000
>d3/d4/00000000000000000000000080000
>^C
>
>I found this in dmesg which seemed relevant:
>
>[3552607.241532] EXT4-fs warning (device loop0): ext4_dx_add_entry:2228: inode #79051521: comm make: Directory index full!
>
>Searching the kernel source code for that error message led me to this commit:
>https://github.com/torvalds/linux/commit/e08ac99fa2a25626f573cfa377ef3ddedf2cfe8f
>So there does indeed appear to be a default limit of 10M entries per directory.
>
>Peter
>
>On Tue, Apr 10, 2018 at 4:14 PM, Fāng-ruì Sòng <maskray at google.com> wrote:
>
>    You may create a sparse filesystem in tmpfs as I did in a previous reply.
>
>    truncate -s 100G 100G
>    mkfs.ext4 100G
>
>    I am pretty sure you have hit some other limit (and in your case likely the
>    disk space as you only got 4% left), rather than the hypothetical
>    "per-directory file limit". Does `dmesg` report anything suspicious?
>
>    On Tue, Apr 10, 2018 at 3:55 PM Peter Collingbourne <peter at pcc.me.uk>
>    wrote:
>
>        On Tue, Apr 10, 2018 at 3:23 PM, Fangrui Song <maskray at google.com>
>        wrote:
>
>            On 2018-04-10, Peter Collingbourne wrote:
>           
>                As I said, that wasn't the limit I was hitting. Here is the
>                program that I wrote:
>
>                #include <stdint.h>
>                #include <errno.h>
>                #include <stdio.h>
>                #include <sys/types.h>
>                #include <sys/stat.h>
>                #include <fcntl.h>
>                #include <unistd.h>
>                #include <string.h>
>
>                int main() {
>                  for (uint64_t i = 0;; ++i) {
>                    char buf[256];
>                    snprintf(buf, 256, "d/%032lx", i);
>                    if (i % 65536 == 0)
>                      puts(buf);
>                    int fd = open(buf, O_CREAT, 0644); /* O_CREAT requires a mode argument */
>                    if (fd == -1) {
>                      printf("open: %s\n", strerror(errno));
>                      return 1;
>                    }
>                    close(fd);
>                  }
>                }
>
>                The output ends in:
>
>                d/00000000000000000000000000980000
>                d/00000000000000000000000000990000
>                d/000000000000000000000000009a0000
>                d/000000000000000000000000009b0000
>                d/000000000000000000000000009c0000
>                d/000000000000000000000000009d0000
>                d/000000000000000000000000009e0000
>                d/000000000000000000000000009f0000
>                open: No space left on device
>
>                df:
>                Filesystem 1K-blocks      Used Available Use% Mounted on
>                [redacted] 856678580 780069796  33068920  96% [redacted]
>
>                df -i:
>                Filesystem   Inodes    IUsed    IFree IUse% Mounted on
>                [redacted] 54411264 18824333 35586931   35% [redacted]
>
>                Peter
>
>            I suspect your case triggered a hashed btree split that would consume
>            more disk space. Can you try again on a newly created ext4 filesystem
>            (ensuring it has sufficient space left)?
>
>            Your example works well on my machine: I cannot create files in
>            other directories either. dumpe2fs tells me the inodes are used up.
>
>        I don't have enough disk space on my machine right now. Maybe you can
>        try creating the file system with -N (some large number)?
>
>        Peter
>
>                On Tue, Apr 10, 2018 at 1:40 PM, Fangrui Song <maskray at google.com> wrote:
>
>                   su
>                   truncate -s 100G 100G
>                   mkfs.ext4 100G
>                   mkdir ext4
>                   mount 100G ext4
>                   cd ext4
>
>                   mkdir p
>                   cd p
>                   python3 -c $'for i in range(6600000):\n with open(str(i),"w"): pass'
>
>                   It runs out of inodes with some message like:
>
>                   OSError: [Errno 28] No space left on device: '6553587'
>
>                   umount ext4; dumpe2fs 100G # says the inodes are used up
>                   ...
>                   Free inodes:              0
>                   ...
>
>                   On 2018-04-10, Peter Collingbourne wrote:
>
>                       No, these were empty files. It wasn't an inode limit because I could
>                       still create files in other directories.
>
>                       Peter
>
>                       On Tue, Apr 10, 2018 at 10:35 AM, Fangrui Song <maskray at google.com> wrote:
>
>                          On 2018-04-09, Peter Collingbourne wrote:
>
>                              Are you sure about that? I'm pretty sure that before writing that comment I
>                              wrote a small program that created lots of files (not subdirectories) in a
>                              directory until it started getting error messages, which started happening at
>                              around 6000000 files.
>
>                              Peter
>
>                          I guess you created a file of 100GiB. The number of inodes is roughly
>                          6553600.
>
>                          100*1024*1024*1024 / 16384 = 6553600.0 where 16384 is the default
>                          bytes-per-inode (man mke2fs).
>
>                          % truncate -s 100G 100G
>                          % mkfs.ext4 100G
>                          % dumpe2fs 100G
>                          .....
>                          Inode count:              6553600
>                          .....
>
>                          Each file consumes one inode and the number of files in that directory
>                          is limited by this factor.
>
>                              On Mon, Apr 9, 2018 at 5:12 PM, Fangrui Song via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
>                                 Author: maskray
>                                 Date: Mon Apr  9 17:12:28 2018
>                                 New Revision: 329648
>
>                                 URL: http://llvm.org/viewvc/llvm-project?rev=329648&view=rev
>                                 Log:
>                                 [CachePruning] Fix comment about ext4 per-directory file limit. NFC
>
>                                 There is a limit on number of subdirectories if dir_nlinks is not
>                                 enabled (31998), but per-directory number of files is not limited.
>
>                                 Modified:
>                                     llvm/trunk/include/llvm/Support/CachePruning.h
>
>                                 Modified: llvm/trunk/include/llvm/Support/CachePruning.h
>                                 URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Support/CachePruning.h?rev=329648&r1=329647&r2=329648&view=diff
>                                 ==============================================================================
>                                 --- llvm/trunk/include/llvm/Support/CachePruning.h (original)
>                                 +++ llvm/trunk/include/llvm/Support/CachePruning.h Mon Apr  9 17:12:28 2018
>                                 @@ -52,9 +52,8 @@ struct CachePruningPolicy {
>                                    /// the number of files based pruning.
>                                    ///
>                                    /// This defaults to 1000000 because with that many files there are
>                                 -  /// diminishing returns on the effectiveness of the cache, and some file
>                                 -  /// systems have a limit on how many files can be contained in a directory
>                                 -  /// (notably ext4, which is limited to around 6000000 files).
>                                 +  /// diminishing returns on the effectiveness of the cache, and file
>                                 +  /// systems have a limit on total number of files.
>                                    uint64_t MaxSizeFiles = 1000000;
>                                  };
>
>                                 _______________________________________________
>                                 llvm-commits mailing list
>                                 llvm-commits at lists.llvm.org
>                                 http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>

-- 
宋方睿