Topic: Linux bash help.

Posted under Off Topic

Please help. I'm trying to replace absolute ftp links with relative links in 13 thousand HTML files. I found this script, but unfortunately it wasn't written with branched directory trees in mind :/
It fails with cmp complaining that some files are directories. After removing the if-statement it fails the same way, but silently:

cd Kondor_html
for y in *
do
sed 's_ftp:////unicorn.wereanimal.net_..//..//.._' "$y" >temp
if cmp temp "$y" >/dev/null
then
rm temp
else
mv temp "$y"
fi
done
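
I can stop the cmp errors by skipping anything that isn't a regular file, something like:

for y in *
do
# -f is true only for regular files, so directories get skipped
[ -f "$y" ] || continue
# ... same sed/cmp dance as above ...
done

but that still ignores everything inside the subdirectories, which is the whole problem.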

Updated

After further googling, I found this:

grep -ilr 'old-word' * | xargs -i@ sed -i 's/old-word/new-word/g' @

Unfortunately, on DSL the xargs recognises neither the -i nor the -I option :(
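
A workaround that needs neither xargs -i nor sed -i, in case DSL's sed is equally stripped down, might be to pipe the grep output into a while loop (untested on DSL):

grep -ilr 'old-word' * | while read -r file
do
sed 's/old-word/new-word/g' "$file" >temp && mv temp "$file"
done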

Updated by anonymous

banana_milkshake said:
ask blaziken

I don't actually know much about Bash.

Updated by anonymous

If you're willing to hard-code the path to the script, you could have it call itself recursively in each subdirectory it comes across. Essentially something like:

#!/bin/bash
for file in *
do
if [ -d "$file" ]
then
# recurse into the subdirectory, then come back up
cd "$file"
/home/username/path/to/script.sh
cd ..
else
sed 's_ftp:////unicorn.wereanimal.net_..//..//.._' "$file" >temp
# only replace the original if sed actually changed something
if cmp temp "$file" >/dev/null
then
rm temp
else
mv temp "$file"
fi
fi
done
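
If hard-coding the path bothers you, a standard idiom (no more tested than the rest of this) is to have the script work out its own absolute location from $0 and call that instead:

script="$(cd "$(dirname "$0")" && pwd)/$(basename "$0")"
# ...then, inside the loop:
cd "$file"
"$script"
cd ..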

Best Bash scripting reference on the internet: http://tldp.org/LDP/abs/html/

And DText seriously needs more formatting options. At the very least, stop sanitizing  . We needs that!

Edit: Spotted an error in the script I wrote. Don't get me wrong, I never bothered to test the new script, either, and make no guarantee that it works as given. (See also: http://tldp.org/LDP/abs/html/)

Edit edit: One-letter variable names generally suck. Generally don't do it.

Updated by anonymous

Maybe using find would be a bit less complex than recursion?

for file in "find $* -type f"

Updated by anonymous

Good to see someone else has a full mirror of this site.

What was the final filesize it came to? Mine came to about 6GB.

Try running WinHTTrack / HTTrack over the directory; it supports file:// addresses (and will rewrite all the HTML for you as well as handle rules and so on). Really handy feature.
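
From memory it's roughly this (paths made up, and check the docs for the exact options):

httrack 'file:///home/username/Kondor_html/index.html' -O /home/username/Kondor_rebuilt

It crawls the local files and rewrites the links as it copies everything into the new tree.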

Updated by anonymous

Jazz said:
for file in "find $* -type f"

I have very little experience with bash scripts and was unsure whether it would work. I asked around on the first "linux help" site I found on the web, but it seems the site is dead :P
To avoid a wall of text, here is my question asked there:
http://www.linuxhelp.net/forums/index.php?showtopic=9446

In short, will this work?

for penis in grep -ilr 'ftp://server.com'
do
sed 's_ftp:////server.com_..//..//.._' "$penis" >temp
if cmp temp "$penis" >/dev/null
then
rm temp
else
mv temp "$penis"
fi
done

I am not sure if the sed statement is correct. I added a second slash to each because I recall that's how you escape them?

Varka said:
What was the final filesize it came to? Mine came to about 6GB.

I used WinHTTrack, and that's where my problems came from.
The site is protected against mirroring, so any files other than images are not downloaded. Basically, the whole /unicorn directory is omitted.

Can you explain how to force WinHTTrack to rebuild the files? I asked on their forum, but no luck. It seems to me the program has no way to check whether files already exist and to use those instead of downloading them again and again.

My mirror sizes are as follows:
Kandor - 556.4 MB
Ellgar - 5.3 GB
Kondor - 4.1 GB

Updated by anonymous

for penis in grep -ilr 'ftp://server.com'

Probably this:

for file in $(grep -ilr 'ftp://server.com' $*)

Can't tell you anything about sed though, I'm not good with it.

Updated by anonymous

Found the solution.

#!/bin/bash
cd directory
# grep -ilr lists, case-insensitively and recursively, every file containing the link
for file in $(grep -ilr 'ftp://foo.net' *)
do
sed 's_ftp://foo.net_../../.._g' "$file" >temp
# only overwrite the original if sed actually changed it
if cmp temp "$file" >/dev/null
then
rm temp
else
mv temp "$file"
fi
done
exit 0
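
For anyone copying this: with _ as the sed delimiter, the slashes in ftp:// need no escaping (that was my doubled-slash confusion earlier), and the trailing g replaces every occurrence on a line, not just the first. One caveat: $(grep ...) splits its output on whitespace, so filenames containing spaces would break it. A variant that handles those, assuming GNU grep and sed:

grep -ilrZ 'ftp://foo.net' * | while IFS= read -r -d '' file
do
sed -i 's_ftp://foo.net_../../.._g' "$file"
done

(This drops the cmp step; sed -i rewrites files even when nothing matches, so unchanged files get new timestamps.)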

Updated by anonymous
