Every now and then, the /www/ partition on Stallman fills up.
<RANT author="jb">
We've been running out of space on this box for years. People have been
spending countless hours doing sysadmin stuff, coding things and generally
losing time, cuz that box is important and serves an amazing number of
pages to an amazing number of people.
So, one can get very PISSED when other people just don't care about this
and use stallman to store their useless files. These days, a bunch of IMCs
have moved to other servers and have left their rotten files here. They
know there are disk space problems (they migrated because of that), but
they won't do anything about it. They are wasting people's time, they are
wasting my time, so I hope they'll have their 4sses kicked hard.
Thank you for your attention :]
</RANT>
For what follows, you need a sudo account on the box.
WARNING WARNING WARNING
It's very important not to free up filenames on Stallman and
other servers running Active.
What's wrong: by default, Active uses the filename provided by the
client browser that uploads a file, and adds random bytes to this
filename only if there is already a file with this name. If a
foobar.jpg file is mirrored and you remove it from stallman, this
filename becomes free for reuse, and chances are good that another
upload takes it. Next time pushtoeye.pl is run, the first foobar.jpg
(the mirrored copy) will be killed.
How to cope with this sh1t: just don't delete filenames;
instead, do a:
echo "i'm a placeholder, don't remove me" > foobar.jpg
pushtoeye.pl now ignores files smaller than 100 bytes, so this
foobar.jpg won't be mirrored, won't cause any harm, and won't
waste disk space.
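The placeholder trick above can be wrapped in a small helper. This is only a sketch: make_placeholder is a made-up name, not a script installed on the box. It writes the placeholder to a temporary file and renames it over the original, on the assumption that the upload may be hard linked to other files (see the duplicates section), in which case a plain `>` redirect would clobber every linked copy at once.

```shell
# make_placeholder FILE (hypothetical helper, not an existing script)
# Replace FILE with a tiny text placeholder so that its name stays taken.
make_placeholder() {
    file=$1
    # Only existing uploads make sense here; never create new names.
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    # Write next to the target, then rename over it: other hard links
    # to the old content are left untouched.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    echo "i'm a placeholder, don't remove me" > "$tmp"
    mv -f "$tmp" "$file"
}
```

Usage would be something like `make_placeholder foobar.jpg`; the resulting file is 35 bytes, well under the 100-byte limit of pushtoeye.pl.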
Future
Next time an Active release is out, an MD5-hash-based filename
scheme will be used (like the one already used in the space saver
scripts), resembling this:
mv foobar.jpg md5-$(openssl md5 < foobar.jpg).$(mime-ext foobar.jpg)
which probably gives a horror like
md5-30882989c58aab4531b8ff20609074d3.jpe ...
- openssl does the md5 hashing;
- mime-ext guesses the mime type from the "street" extension of the filename, and converts it back to an official extension. Here, jpg is an extension people often use, and jpe is the official extension associated with the image/jpeg mime type.
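To see what such a name looks like without touching anything, here is a rough sketch. It assumes md5sum is available instead of openssl, and official_ext is a hypothetical stand-in for mime-ext that only knows the jpg/jpeg case mentioned above.

```shell
# official_ext EXT (hypothetical stand-in for mime-ext)
# Only the image/jpeg mapping from the text is known here.
official_ext() {
    case $1 in
        jpg|jpeg) echo jpe ;;
        *)        echo "$1" ;;   # unknown extension: keep it as is
    esac
}

# md5_name FILE: print the hash-based name, without renaming anything.
md5_name() {
    base=${1##*/}                 # strip the directory part
    ext=${base##*.}               # "street" extension
    sum=$(md5sum < "$1" | cut -d' ' -f1)
    echo "md5-$sum.$(official_ext "$ext")"
}
```

Since the name depends only on the content, two uploads of the same file end up with the same name, which is the whole point.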
Removing weird thumbnails
Active produces bizarre thumbnails (of mp3 and RealMedia files, etc.). This will probably be patched in the next release, but for now you have to
remove such things by hand:
sudo sh /usr/local/sbin/remove-badthumbs.sh
This script gets rid of files that have names such as *-thumb.$ext, where $ext is something like mp3 or ram.
If you can think of a new extension to add, edit the file and
put it in. Of course, it sucks if someone uploaded a legitimate file named that way.
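Before adding an extension to the script, it can help to see which files would match. The following is a guess at the kind of matching involved, not the contents of remove-badthumbs.sh; list_bad_thumbs is a hypothetical name and it only prints, it never deletes.

```shell
# list_bad_thumbs DIR EXT... (hypothetical helper, prints only)
# Show files named *-thumb.EXT under DIR for each given extension.
list_bad_thumbs() {
    dir=$1
    shift
    for ext in "$@"; do
        find "$dir" -type f -name "*-thumb.$ext"
    done
}
```

For instance, `list_bad_thumbs /www/uploads mp3 ram` would list the candidates for the two extensions named above.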
Duplicate files
Removing duplicates
Some files are posted many times on several IMCs. There's a way to hard link them to a file repository that's in /www/Uploads/, so that
a bunch of duplicate files only needs the disk space of a single one. Run this script:
sudo sh /usr/local/sbin/remove-dups.sh
What's used here is a script that's in /usr/local/sbin/nodup.py.
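For the curious, the hard link idea can be sketched in a few lines of shell. This is not nodup.py, just an illustration under assumptions: link_dup and inode are made-up names, the repository is keyed by MD5 sum, and the repository sits on the same filesystem as the uploads (hard links can't cross filesystems).

```shell
# inode FILE: print the inode number of FILE.
inode() { ls -di "$1" | awk '{print $1}'; }

# link_dup FILE REPO (hypothetical): keep one copy per content in REPO
# and turn FILE into a hard link to that copy.
link_dup() {
    file=$1
    repo=$2
    sum=$(md5sum < "$file" | cut -d' ' -f1)
    master=$repo/$sum
    if [ ! -e "$master" ]; then
        ln "$file" "$master"        # first time we see this content
    elif [ "$(inode "$file")" = "$(inode "$master")" ]; then
        :                           # already linked, nothing to do
    elif cmp -s "$file" "$master"; then
        ln -f "$master" "$file"     # same bytes: share a single inode
    fi                              # same hash, other bytes: leave alone
}
```

After running it over a set of identical files, they all share one inode and the disk only stores the content once.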
Removing non duplicates
After you've collected duplicates, the problem is that simply removing files from /www/uploads doesn't give you the space back, because plenty of the files
you're removing are still linked from the duplicates repository. Here's how you commit your cleanup:
sudo sh /usr/local/sbin/remove-single.sh
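What "single" presumably means here is a repository file whose link count has dropped to 1, i.e. no upload points to it anymore. That is an assumption about remove-single.sh, not something read from the script. A sketch of how to list such files (list_singles is a hypothetical name):

```shell
# list_singles REPO (hypothetical, prints only)
# Repository entries that nothing else links to anymore.
list_singles() {
    find "$1" -type f -links 1
}
```

Looking at the output of `list_singles /www/Uploads` before running the real script gives an idea of how much space the cleanup will actually free.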
Mirroring
Many multimedia files are mirrored on images.indymedia.org and
are supposed to be served from there. Therefore, they can be removed from the disks on stallman.
rm-if-mirrored
There is a script that removes files if they are mirrored. To
remove mirrored files in /www/uploads that are more than 15 days
old, you can do a:
sudo rm-if-mirrored -v -f /www/uploads/ -mtime +15 -type f
Everything after the -f option is passed to find(1).
jb@stallman:~$ rm-if-mirrored -h
Options:
-h print this help message
-f <find args> use find to list files and pass it arguments
-v increment verbosity level
-p print files that should be removed, don't remove
-B not really remove files
So far, gif, jpeg, mpeg, *.ra and *.rm (RealMedia) files are removed.
I added *.wav to the list of removed files today Feb/05/2003 (pietro).
Don't forget to run the remove-single.sh script when you're done.
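Since everything after -f goes to find(1), the same arguments can be tried with plain find first to see which files are old enough to be considered. A small sketch, where candidates is a made-up name; it does not check whether a file is mirrored, only its age and type.

```shell
# candidates DIR DAYS (hypothetical, prints only)
# Files under DIR older than DAYS days, using the same find arguments
# that rm-if-mirrored would receive after -f.
candidates() {
    find "$1" -mtime "+$2" -type f
}

# A dry run with the real script, using only options from its help
# output, would presumably look like this:
#   sudo rm-if-mirrored -p -f /www/uploads/ -mtime +15 -type f
```

Comparing the output of `candidates /www/uploads 15` with the -p listing shows which old files are not mirrored yet.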
Where the files are
So far, most of the files are mirrored on images.indymedia.org.
Active is such a pain to configure that Apache mod_rewrite is used
to help requests find their way to the mirrored documents. Have a look at
http://httpd.apache.org/docs/misc/rewriteguide.html for more information about how all this works. Here is a part of our httpd.conf file (more or less).
<Directory /www/vhosts/*/local/webcast/uploads/ >
RewriteEngine On
... more config here ...
# sweden is also se
RewriteCond %{HTTP_HOST} se\.indymedia\.org [OR]
RewriteCond %{HTTP_HOST} www\.se\.indymedia\.org
RewriteRule ([^/]*)\.(gif|jpe?g?|png|mpe?g|mp3|mov)$ \
http://images.indymedia.org/imc/sweden/$1.$2 [L]
# www.something.indymedia.org
#RewriteCond %{REQUEST_FILENAME} !vhosts/www\.italy
RewriteRule vhosts/www\.([^/]*)\.indymedia.org/.*/([^/]*)\.(gif|jpe?g?|png|mpe?g|mp3|mov)$ \
http://images.indymedia.org/imc/$1/$2\.$3 [L]
# something.indymedia.org
#RewriteCond %{REQUEST_FILENAME} !vhosts/italy
RewriteRule vhosts/([^/]*)\.indymedia.org/.*/([^/]*)\.(gif|jpe?g?|png|mpe?g|mp3|mov)$ \
http://images.indymedia.org/imc/$1/$2\.$3 [L]
... more config here ...
</Directory>
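To check where a given file ends up under the two generic rules above, the rewrite can be imitated with sed. This is only an approximation for testing paths by hand: mirror_url is a made-up name, sed is not Apache, and the special case for Sweden (se) is left out.

```shell
# mirror_url PATH (hypothetical): print the images.indymedia.org URL that
# the two generic rules would redirect to, or nothing if PATH doesn't match.
mirror_url() {
    printf '%s\n' "$1" | sed -nE \
        -e 's#^.*vhosts/www\.([^/]*)\.indymedia\.org/.*/([^/]*)\.(gif|jpe?g?|png|mpe?g|mp3|mov)$#http://images.indymedia.org/imc/\1/\2.\3#p' \
        -e 's#^.*vhosts/([^/]*)\.indymedia\.org/.*/([^/]*)\.(gif|jpe?g?|png|mpe?g|mp3|mov)$#http://images.indymedia.org/imc/\1/\2.\3#p'
}
```

For example, `mirror_url /www/vhosts/www.italy.indymedia.org/local/webcast/uploads/foo.jpg` prints `http://images.indymedia.org/imc/italy/foo.jpg`, while a .txt file prints nothing because its extension is not in the list.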
¿ What's next ?
Some file formats are a real pain when it comes to disk space, like BMP, TIF, WAV, etc. I am currently in the process of converting all problematic file formats to something more usable. Currently, the transformation map is:
- BMP files are converted to PNG;
- TIF files are converted to JPG;
- WAV files are to be converted to MP3 (or OGG, let's see); so far the problem is that the WAV to MP3 conversion (by LAME) of an average file is a 2-minute CPU killer;
- I have no clue how to transform bad video formats into something decent, or whether decent video formats exist to start with.
The script that does this is currently /home/jb/Convert.sh; it may move to /usr/local/sbin/Convert.sh when it stabilizes.
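The transformation map itself is easy to write down as a case statement. The sketch below is not taken from Convert.sh; target_name is a hypothetical helper that only computes the new name and converts nothing.

```shell
# target_name FILE (hypothetical): print the name FILE would get under
# the transformation map; files of other types keep their name.
target_name() {
    base=${1%.*}
    # Compare extensions case-insensitively (FOO.BMP, foo.bmp, ...).
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case $ext in
        bmp) echo "$base.png" ;;
        tif) echo "$base.jpg" ;;
        wav) echo "$base.mp3" ;;   # or ogg, still undecided
        *)   echo "$1" ;;
    esac
}
```

So `target_name foobar.bmp` prints `foobar.png`, and a file that is not in the map comes back unchanged.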
Due to filename collisions (like: foobar.bmp is to be converted to foobar.png, but someone already uploaded a foobar.png, oops), Apache directives à la
RewriteMap have to be used.
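To get an idea of how many such collisions exist before converting anything, something like the following could be run. It is a sketch with a made-up name (list_collisions), it prints only, and it assumes the converted file would land in the same directory as the source.

```shell
# list_collisions DIR SRC_EXT DST_EXT (hypothetical, prints only)
# Source files whose converted name is already taken by another upload.
list_collisions() {
    find "$1" -type f -name "*.$2" | while IFS= read -r src; do
        dst=${src%.*}.$3
        if [ -e "$dst" ]; then
            printf '%s\n' "$src"
        fi
    done
}
```

For example, `list_collisions /www/uploads bmp png` would list every foobar.bmp that has a foobar.png sitting next to it.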
--
JiBe - 13 May 2002
Added exts to be removed: doc avi mpg mpeg mov pdf mpa txt zip wmv wma asf mswmm htm html swf
StallmanDiskSpaceBigfilelist is a list of files above 1MB found in subdirectories of /www/upload/
--
AndiE - 05 Oct 2002