Sysadmin.StallmanDiskSpaceIssues r1.9 - 22 Jun 2003 - 10:50 - JiBe

Every now and then, the /www/ partition on Stallman fills up.

<RANT author="jb"> We've been running out of space on this box for years. People have spent countless hours doing sysadmin work, coding things and generally losing time, because this box is important and serves an amazing number of pages to an amazing number of people.

So one can get very PISSED when other people just don't care about this and use stallman to store their useless files. These days, a bunch of IMCs have moved to other servers and left their rotten files here. They know there are disk space problems (they migrated because of them), but they won't do anything about it. They are wasting people's time, they are wasting my time, so I hope they'll have their 4sses kicked hard. Thank you for your attention :] </RANT>

For what follows, you need a sudo account on the box.

WARNING WARNING WARNING

ALERT! It's very important not to delete uploaded files (and so free their filenames) on Stallman and other servers running Active.

What's wrong: by default, Active uses the filename provided by the client browser that uploads a file, and adds random bytes to this filename only if there is already a file with that name. If a foobar.jpg file is mirrored and you remove it from stallman, this filename becomes free for reuse, and chances are good that another upload takes it. The next time pushtoeye.pl runs, the first foobar.jpg will be killed.

How to cope with this sh1t: just don't delete files; instead, do a:

   echo "i'm a placeholder, don't remove me" > foobar.jpg

pushtoeye.pl now ignores files smaller than 100 bytes, so this foobar.jpg won't be mirrored, won't cause any harm and won't take up disk space.
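
A minimal sketch of the placeholder trick as a reusable shell function. The names replace_with_placeholder and PLACEHOLDER_TEXT are made up for this example; they are not part of Active or pushtoeye.pl. It writes the placeholder to a temporary file and renames it over the original, on the assumption that an upload may share its data with other hard links that should not be truncated along with it:

```shell
#!/bin/sh
# Sketch only: keep the filename taken, but shrink the file below the
# 100-byte limit under which pushtoeye.pl ignores it.
# PLACEHOLDER_TEXT and replace_with_placeholder are illustrative names.

PLACEHOLDER_TEXT="i'm a placeholder, don't remove me"

replace_with_placeholder() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "not a regular file: $file" >&2
        return 1
    fi
    # Write next to the target, then rename over it. Unlike "> file",
    # this does not truncate data shared with other hard links.
    tmp="$file.placeholder.$$"
    printf '%s\n' "$PLACEHOLDER_TEXT" > "$tmp" || return 1
    mv -f "$tmp" "$file" || return 1
    # Succeed only if the result is really under 100 bytes.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -lt 100 ]
}
```

Usage would be something like `replace_with_placeholder foobar.jpg`, run with enough rights to write in the upload directory.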

Future

Next time an Active release is out, what will be used is an MD5 hash based filename scheme (like what is used now in the space saver scripts), resembling this:

   mv foobar.jpg md5-$(openssl md5 < foobar.jpg).$(mime-ext foobar.jpg)

which probably yields a horror like md5-30882989c58aab4531b8ff20609074d3.jpe ...

  • openssl does the md5 hashing;
  • mime-ext guesses the mime type from the "street" extension of the filename, and converts it back to an official extension. Here, jpg is an extension people often use, and jpe is the official extension associated with the image/jpeg mime type.
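
The naming scheme above can be sketched as follows. This is only an illustration: official_ext is a tiny hypothetical stand-in for mime-ext that knows a single mapping (the jpg to jpe one mentioned above), and md5_of and hashed_name are names invented here. Newer OpenSSL releases prefix the digest with a label, so the sketch keeps only the last field of the output:

```shell
#!/bin/sh
# Sketch only: build an "md5-<digest>.<official extension>" name.

# Hypothetical stand-in for mime-ext: lower-case the extension and map
# the "street" jpg/jpeg to jpe; anything else passes through unchanged.
# Assumes the filename has an extension.
official_ext() {
    base=${1##*/}
    ext=$(printf '%s' "${base##*.}" | tr 'A-Z' 'a-z')
    case $ext in
        jpg|jpeg|jpe) echo jpe ;;
        *)            echo "$ext" ;;
    esac
}

# Print only the hex digest of a file, using md5sum when present and
# falling back to openssl otherwise.
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | awk '{print $1}'
    else
        openssl md5 < "$1" | awk '{print $NF}'
    fi
}

# Print the hash-based name for a file; it does not rename anything.
hashed_name() {
    printf 'md5-%s.%s\n' "$(md5_of "$1")" "$(official_ext "$1")"
}
```

Something like `mv foobar.jpg "$(hashed_name foobar.jpg)"` would then do the rename, for a file in the current directory.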

Removing weird thumbnails

Active produces bizarre thumbnails (of mp3 and realmedia files, etc.). This will probably be patched in the next release, but for now you have to remove such things by hand:

   sudo sh /usr/local/sbin/remove-badthumbs.sh

This script gets rid of files that have names such as *-thumb.$ext, where $ext is something like mp3 or ram. If you can think of a new extension to add, edit the file and put it in. Of course, it sucks if someone uploaded a real file named that way, since it gets removed too.
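
To see what such a cleanup would hit before running it, a listing along these lines may help. It is a guess at the matching logic based on the description above, not the actual contents of remove-badthumbs.sh, and list_bad_thumbs is a name invented here:

```shell
#!/bin/sh
# Sketch only: list files named *-thumb.<ext> for the given extensions.
# usage: list_bad_thumbs DIR EXT...

list_bad_thumbs() {
    dir=$1
    shift
    for ext in "$@"; do
        # Only print the matches; nothing is deleted here.
        find "$dir" -type f -name "*-thumb.$ext"
    done
}
```

For example, `list_bad_thumbs /www/uploads mp3 ram` prints the candidates so that a legitimately uploaded file with such a name can be spotted first.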

Duplicate files

Removing duplicates

Some files are posted many times on several IMCs. There's a way to hard link them to a file repository in /www/Uploads/, so that a bunch of duplicate files only needs the disk space of a single one. Run this script:

   sudo sh /usr/local/sbin/remove-dups.sh

What's used here is a script that's in /usr/local/sbin/nodup.py.
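
For readers who wonder how hard-link deduplication works in principle, here is a rough sketch. It is not nodup.py and makes no claim about how that script is written. The assumptions: the repository holds one master file per MD5 digest, everything lives on the same filesystem (hard links cannot cross filesystems), and the names link_duplicates and md5_of are invented for this example:

```shell
#!/bin/sh
# Sketch only: make identical files share one copy of their data.
# usage: link_duplicates REPO FILE...

md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | awk '{print $1}'
    else
        openssl md5 < "$1" | awk '{print $NF}'
    fi
}

link_duplicates() {
    repo=$1
    shift
    mkdir -p "$repo" || return 1
    for f in "$@"; do
        [ -f "$f" ] || continue
        master="$repo/$(md5_of "$f")"
        if [ ! -e "$master" ]; then
            # First file with this content: register it in the repository.
            ln "$f" "$master"
        elif ! [ "$f" -ef "$master" ] && cmp -s "$f" "$master"; then
            # Same bytes, different inode: replace the file with a hard link.
            ln -f "$master" "$f"
        fi
    done
}
```

The byte-for-byte cmp is there so that two different files with the same digest are never merged.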

Removing non duplicates

Once duplicates have been collected, simply deleting files from /www/uploads no longer frees any space, because many of the files you remove are still hard-linked from the duplicates repository. Here's how you commit your cleanup:

   sudo sh /usr/local/sbin/remove-single.sh
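
The idea can be illustrated with find(1): a repository file whose link count has dropped to 1 is no longer referenced from anywhere else, so it only wastes space. That this is what remove-single.sh looks for is an assumption based on its name, and list_unreferenced is a name made up here:

```shell
#!/bin/sh
# Sketch only: list repository files that nothing else links to anymore.
# usage: list_unreferenced REPO

list_unreferenced() {
    # -links 1 matches files whose only remaining name is the one in REPO.
    find "$1" -type f -links 1
}
```

Listing first and deleting afterwards keeps the step reviewable.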

Mirroring

Many multimedia files are mirrored on images.indymedia.org and are supposed to be served from there. Therefore, they can be removed from the disks on stallman.

rm-if-mirrored

There is a script that removes files if they are mirrored. To remove mirrored files in /www/uploads that are more than 15 days old, you can do a:

   sudo rm-if-mirrored -v -f /www/uploads/ -mtime +15 -type f

Everything after the -f option is passed to find(1).

jb@stallman:~$ rm-if-mirrored -h
Options:
    -h              print this help message
    -f <find args>  use find to list files and pass it arguments
    -v              increment verbosity level
    -p              print files that should be removed, don't remove
    -B              not really remove files
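
Since the arguments after -f go to find(1), the same arguments can be handed to find directly to preview which files are even candidates. The sketch below does only that: it does not check whether anything is mirrored, and preview_candidates is a name invented for this example. The -p option listed in the help above is the script's own way of printing instead of removing.

```shell
#!/bin/sh
# Sketch only: show what a given set of find arguments selects.
# usage: preview_candidates DIR [find expression...]

preview_candidates() {
    # "$@" holds the same arguments that would follow -f.
    find "$@" -print
}
```

For instance, `preview_candidates /www/uploads/ -mtime +15 -type f` lists the regular files more than 15 days old.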

So far, gif, jpeg, mpeg, *.ra and *.rm (realmedia files) are removed.

I added *.wav to the list of removed files on 05 Feb 2003 (pietro).

Don't forget to run the remove-single.sh script when you're done.

Where the files are

So far, most of the files are mirrored on images.indymedia.org. Active is such a pain to configure that Apache mod_rewrite is used to route requests to the mirrored documents. Have a look at the Apache URL rewriting guide (http://httpd.apache.org/docs/misc/rewriteguide.html) for more information about how all this works. Here is a part of our httpd.conf file (more or less).


<Directory /www/vhosts/*/local/webcast/uploads/ >
   RewriteEngine On

   ... more config here ...

   # sweden is also se
   RewriteCond %{HTTP_HOST} se\.indymedia\.org [OR]
   RewriteCond %{HTTP_HOST} www\.se\.indymedia\.org
   RewriteRule ([^/]*)\.(gif|jpe?g?|png|mpe?g|mp3|mov)$ \
       http://images.indymedia.org/imc/sweden/$1.$2 [L]


   # www.something.indymedia.org
   #RewriteCond %{REQUEST_FILENAME} !vhosts/www\.italy
   RewriteRule vhosts/www\.([^/]*)\.indymedia\.org/.*/([^/]*)\.(gif|jpe?g?|png|mpe?g|mp3|mov)$ \
       http://images.indymedia.org/imc/$1/$2.$3 [L]


   # something.indymedia.org
   #RewriteCond %{REQUEST_FILENAME} !vhosts/italy
   RewriteRule vhosts/([^/]*)\.indymedia\.org/.*/([^/]*)\.(gif|jpe?g?|png|mpe?g|mp3|mov)$ \
       http://images.indymedia.org/imc/$1/$2.$3 [L]

   ... more config here ...

</Directory>

¿ What's next ?

Some file formats, like BMP, TIF, WAV, etc., are a real pain when it comes to disk space. I am in the process of converting all problematic file formats to something more usable. Currently, the transformation map is:

  • BMP files are converted to PNG;
  • TIF files are converted to JPG;
  • WAV files are to be converted to MP3 (or OGG, let's see). So far the problem is that WAV to MP3 conversion (by Lame) of an average file is a 2-minute CPU killer;
  • I have no clue about how to transform bad video formats into something decent, or whether decent video formats exist to start with.

The script that does this is currently /home/jb/Convert.sh, it may move to /usr/local/sbin/Convert.sh when it stabilizes.

Due to filename collisions (like: foobar.bmp is to be converted to foobar.png, but someone already uploaded a foobar.png, oops), Apache directives à la RewriteMap have to be used.
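
The transformation map and the collision problem can be sketched like this. It is not Convert.sh, whose contents are not shown here, and it converts nothing: it only reports what a conversion would be called and whether that name is already taken. The function names target_name and check_conversion are invented for this example, and MP3 is used for WAV although OGG is still an open option:

```shell
#!/bin/sh
# Sketch only: map a problematic file to its target name and detect
# filename collisions before any conversion happens.

# Print the target name, or fail if the format is not in the map.
# Assumes the filename has an extension.
target_name() {
    src=$1
    stem=${src%.*}
    ext=$(printf '%s' "${src##*.}" | tr 'A-Z' 'a-z')
    case $ext in
        bmp) echo "$stem.png" ;;
        tif) echo "$stem.jpg" ;;
        wav) echo "$stem.mp3" ;;
        *)   return 1 ;;
    esac
}

# Report what would happen to one file.
check_conversion() {
    target=$(target_name "$1") || { echo "skip $1"; return 0; }
    if [ -e "$target" ]; then
        # Someone already uploaded a file with the target name.
        echo "collision $1 -> $target"
    else
        echo "convert $1 -> $target"
    fi
}
```

Files reported as collisions are the ones that would need a different target name plus a rewrite rule pointing the old URL at it.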

-- JiBe - 13 May 2002

Added exts to be removed: doc avi mpg mpeg mov pdf mpa txt zip wmv wma asf mswmm htm html swf

StallmanDiskSpaceBigfilelist is a list of files above 1MB found in subdirectories of /www/upload/
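
A list like that can be rebuilt with find(1). How StallmanDiskSpaceBigfilelist was actually generated is not recorded here, so take this as one possible way; list_big_files is a name made up for this example:

```shell
#!/bin/sh
# Sketch only: list regular files larger than 1MB below a directory.
# usage: list_big_files DIR

list_big_files() {
    # -size +1024k selects files bigger than 1024 kilobytes.
    find "$1" -type f -size +1024k
}
```

For example, `list_big_files /www/upload/` prints one path per line.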

-- AndiE - 05 Oct 2002


Sysadmin.StallmanDiskSpaceIssues moved from Indydocs.StallmanDiskSpaceIssues on 02 Jun 2002 - 15:28 by ChristopherMitchell

Copyright © 1999-2008 by the contributing authors.
All material on this collaboration platform is the property of the contributing authors.