If you’ve not been following, it’s a great time to catch up on recent developments. The plot remains the same: more Levitra spam, and the site is still not completely clean. To summarize the previous methods of removal:
1. Validate changes to files by date. Check changed files against a known-good copy. Again, the development GitHub proves invaluable as a validation resource.
2. Using the compared code, make sense of the attack, and search the entire server for instances of similar malicious code. This is always a great time to add a few IP deny statements based on your findings, such as blocking other countries’ IP space.
3. Armed with dates, dump the PHP error logs and IIS logs, and check application and other system-managed logs using automation such as grep on Linux and findstr on Windows (unless you are of the Cygwin persuasion, or use the Windows Subsystem for Linux). This step alone surfaces any lazy, hacked, or bad code errors, and often leads to the right file.
4. Throw the whole codebase at the site. Run a differential tool; they are quick and effective. Read logs. Read more logs, ad infinitum.
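The comparison in steps 1 and 4 can be sketched in a few lines of Python. This is only an illustration, not the differential tool used here: `compare_trees`, `file_digest`, and the two directory arguments are hypothetical names, and it assumes you have a known-good checkout of the same release on disk.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_trees(live_root: Path, reference_root: Path) -> dict:
    """Classify files in the live site against a known-good reference tree."""
    live = {p.relative_to(live_root): p
            for p in live_root.rglob("*") if p.is_file()}
    ref = {p.relative_to(reference_root): p
           for p in reference_root.rglob("*") if p.is_file()}
    # Files present in both trees whose contents differ.
    changed = sorted(rel.as_posix() for rel in live.keys() & ref.keys()
                     if file_digest(live[rel]) != file_digest(ref[rel]))
    return {
        "changed": changed,
        # Files only on the live site: candidates for dropped-in shells.
        "extra": sorted(rel.as_posix() for rel in live.keys() - ref.keys()),
        # Files the reference has but the live site lost.
        "missing": sorted(rel.as_posix() for rel in ref.keys() - live.keys()),
    }
```

Anything listed under "changed" or "extra" is worth opening by hand. Hashing contents rather than trusting timestamps matters here, since the remaining bad files can carry the same dates as the clean ones.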
At this point, no matter the search, I wasn’t finding the base64_decode I knew was in there somewhere. The spam simply had to be encoded in the files: I briefly rolled a test db over the environment, and it was still there. I searched the entire code base for the raw text contained in the site as Google saw it. That text did not appear anywhere in my source, so I knew more of the same was hiding. All the remaining file dates matched the day of distribution, so dates were no longer any help.
Re-examining how base64 works, I soon realized the very text the spammer was using could be used against him. Rather than search the site in increasingly complex ways, I decided to get stupid simple about it.
The encoding method has a feature that makes the encoded value of a character dependent upon its neighbors: base64 packs each group of three input bytes into four output characters, so the same word encodes differently depending on where it falls within a group. This is why you can’t simply base64_encode your text and expect to get the same value the malicious code has stored in the files. Or can you?
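A quick way to see the problem is to encode a word on its own and then look for it inside the encoding of the longer text. The snippet below is just a demonstration using Python's standard `base64` module; the helper name `b64` is mine.

```python
import base64


def b64(text: str) -> str:
    """Base64-encode an ASCII string and return the result as text."""
    return base64.b64encode(text.encode("ascii")).decode("ascii")


full = b64("(Vardenafil)")
alone = b64("dena")

print(full)                       # KFZhcmRlbmFmaWwp
print(alone)                      # ZGVuYQ==
print(alone.rstrip("=") in full)  # False: the naive search misses it

# The same four letters encode differently depending on how many
# characters precede them inside the three-byte group.
for prefix in ("", "r", "ar"):
    print(repr(prefix + "dena"), b64(prefix + "dena"))
# 'dena' ZGVuYQ==
# 'rdena' cmRlbmE=
# 'ardena' YXJkZW5h
```

Because groups are three bytes wide, there are only three possible alignments, which is what makes a brute-force approach practical.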
In this instance I could validate that in my web client, I got spam-free bits. If only I could view their site as Google views it. I can, but that requires effort and spoofing IP addresses – which we WILL go over, but not today, please. The easy way is to log in to the Google webmaster tools, and ask it to show you the site.

After you log in and select your site, click on site health at the left, and then Fetch as Google. There I could still see large enough portions of text that were probably encoded contiguously. Using that assumption, I went back to the drawing board. My first draft was horribly complex. I figured I could encode a portion of the text with every possible character on either side. Then I realized it would take an impractical amount of time to calculate all the permutations for (Vardenafil), the longest contiguous chunk of text I could reasonably expect to be encoded as one string.
If you break it down, the fewest number of characters you can expect this to work with is 7, and at that length the likelihood of false positives is very high. If you increase your odds by searching for a longer string, you’ll likely get only true positives. I chose 12 characters, allowing 4 characters of precision. You’ll see why:
"(Vardenafil)"  KFZhcmRlbmFmaWwp
"denafil)"      ZGVuYWZpbCk=
"rdenafil"      cmRlbmFmaWw=
"ardenafi"      YXJkZW5hZmk=
"Vardenaf"      VmFyZGVuYWY=
"(Vardena"      KFZhcmRlbmE=
"denafil) "     ZGVuYWZpbCkg
"rdenafil)"     cmRlbmFmaWwp
"ardenafil"     YXJkZW5hZmls
"Vardenafi"     VmFyZGVuYWZp
"(Vardenaf"     KFZhcmRlbmFm
" (Vardena"     IChWYXJkZW5h
Encoded and shifted, that is the idea as I initially saw it in my head. Put in a more succinct order:
"(Vardenafil)"  KFZhcmRlbmFmaWwp
"(Vardenaf"     KFZhcmRlbmFm
"(Vardena"      KFZhcmRlbmE=
"rdenafil)"     cmRlbmFmaWwp
"rdenafil"      cmRlbmFmaWw=
" (Vardena"     IChWYXJkZW5h
"ardenafil"     YXJkZW5hZmls
"ardenafi"      YXJkZW5hZmk=
"Vardenaf"      VmFyZGVuYWY=
"Vardenafi"     VmFyZGVuYWZp
"denafil)"      ZGVuYWZpbCk=
"denafil) "     ZGVuYWZpbCkg
If you look very carefully, depending upon what’s next to dena (the 4 characters of precision I speak of), you see the beginning and ending characters change. Of the characters that stay put, though, there are only 3 possible output combinations for dena, based on which characters are on either side, or whether it’s at the very beginning. Note that certain inputs were padded with a space, as I could be reasonably certain a space would sit on either side of the text. The table is there to give you a visual; the theory works with fewer pieces. Here’s the actual script I decided to run, and the results:
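The "characters that stay put" can also be computed directly instead of read off a table. The sketch below is my own reconstruction of the idea, not the script used here: `stable_fragments` is a hypothetical helper that encodes a keyword at each of the three possible alignments and trims the output characters that mix in bits from neighboring bytes.

```python
import base64


def stable_fragments(keyword: bytes) -> list[str]:
    """Return the base64 substrings of keyword that do not depend on context.

    One fragment is produced per alignment (0, 1 or 2 bytes preceding the
    keyword inside its three-byte group).
    """
    fragments = []
    for offset in range(3):
        # Placeholder bytes stand in for whatever precedes the keyword.
        padded = b"\x00" * offset + keyword
        encoded = base64.b64encode(padded).decode("ascii").rstrip("=")
        # Leading output characters that carry bits of the preceding bytes.
        lead = (0, 2, 3)[offset]
        # If the keyword does not end on a group boundary, the last output
        # character also carries bits of the byte that follows it.
        trail = 1 if len(padded) % 3 else 0
        fragments.append(encoded[lead:len(encoded) - trail])
    return fragments


print(stable_fragments(b"denafil"))
# ['ZGVuYWZpb', 'RlbmFmaW', 'kZW5hZmls']
```

Whatever text surrounds the keyword, exactly one of the three fragments appears in its encoding, provided the keyword was encoded as one contiguous string. Very short keywords yield very short fragments, which is where the false positives come from.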
findstr /S VuYWZpbCk *.*
findstr /S RlbmFmaWw *.*
findstr /S JkZW5hZmk *.*
findstr /S FyZGVuYWY *.*
findstr /S ZhcmRlbmE *.*
findstr /S hcmRlbmFm *.*
findstr /S ZGVuYWZpbCkg *.*
findstr /S cmRlbmFmaWwp *.*
findstr /S YXJkZW5hZmls *.*
findstr /S VmFyZGVuYWZp *.*
findstr /S KFZhcmRlbmFm *.*
findstr /S IChWYXJkZW5h *.*
Run this, and start looking hard at any results. I just compared the found file to the reference version again, and away the issue went.
findstr /S Vardenafil *.*
findstr /S VuYWZpbCk *.*
findstr /S RlbmFmaWw *.*
libraries\joomla\utilities\utility.php:bnRlbnQgTWFuYWdlbWVudCIgLz4KICA8dGl0bGU+QnV5IExldml0cmEgKFZhcmRlbmFmaWwpIE9u
libraries\joomla\utilities\utility.php:IFF1b3RlczwvYT4gZm9yIHNpbGRlbmFmaWwgY2l0cmF0ZSwgd2hpY2ggRWxlY3Ryb25pYyBjaWdh
findstr /S JkZW5hZmk *.*
findstr /S FyZGVuYWY *.*
findstr /S ZhcmRlbmE *.*
findstr /S hcmRlbmFm *.*
libraries\joomla\utilities\utility.php:bnRlbnQgTWFuYWdlbWVudCIgLz4KICA8dGl0bGU+QnV5IExldml0cmEgKFZhcmRlbmFmaWwpIE9u
libraries\joomla\utilities\utility.php:ZXZpdHJhfHZhcmRlbmFmaWx+aScsICRrZXl3b3JkKSkgKSB7CgkJCQkJaGVhZGVyKCdMb2NhdGlv
findstr /S ZGVuYWZpbCkg *.*
findstr /S cmRlbmFmaWwp *.*
libraries\joomla\utilities\utility.php:bnRlbnQgTWFuYWdlbWVudCIgLz4KICA8dGl0bGU+QnV5IExldml0cmEgKFZhcmRlbmFmaWwpIE9u
findstr /S YXJkZW5hZmls *.*
findstr /S VmFyZGVuYWZp *.*
findstr /S KFZhcmRlbmFm *.*
libraries\joomla\utilities\utility.php:bnRlbnQgTWFuYWdlbWVudCIgLz4KICA8dGl0bGU+QnV5IExldml0cmEgKFZhcmRlbmFmaWwpIE9u
findstr /S IChWYXJkZW5h *.*
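To confirm a hit is the spam and not a coincidence, decode it. The first matching line from utility.php is 76 characters once the console wrap is rejoined, a whole number of four-character groups, so it decodes cleanly on its own. The variable names below are mine; the string is copied from the output above.

```python
import base64

# First findstr hit from utility.php, with the console line wrap rejoined.
hit = ("bnRlbnQgTWFuYWdlbWVudCIgLz4KICA8dGl0bGU+Q"
       "nV5IExldml0cmEgKFZhcmRlbmFmaWwpIE9u")

decoded = base64.b64decode(hit).decode("ascii")
print(repr(decoded))
# 'ntent Management" />\n  <title>Buy Levitra (Vardenafil) On'
```

That is a slice of an HTML head with the spam title in it, exactly the kind of page Google was being served.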
You’ll notice it hit on every 3rd search, all three hits being subtle variants on a similar theme. If you line the encodings up as I did in the second table, you can pick out the three subsets that look similar and search for them. My inventive method paid off: once the file was cleaned, the site returned clear results when I tested via Google Webmaster Tools.
—- Begin Update —-
This has been my most popular blog post, and in honor of its 12-year anniversary, below is an implementation of the sliding window encoder, to allow you to experiment with base64 encoded search on your own.
Sliding Window Encoder
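A minimal Python sketch of such an encoder follows. It is not necessarily how the original tool works: the function name `sliding_windows` and its parameters are my own, and it simply slides a fixed-width window across a phrase and base64-encodes each slice, reproducing rows of the tables above.

```python
import base64


def sliding_windows(phrase: str, width: int):
    """Yield (window, base64) pairs for each width-character slice of phrase."""
    for start in range(len(phrase) - width + 1):
        window = phrase[start:start + width]
        encoded = base64.b64encode(window.encode("utf-8")).decode("ascii")
        yield window, encoded


for window, encoded in sliding_windows("(Vardenafil)", 8):
    print(f"{window!r:12} {encoded}")
# '(Vardena'   KFZhcmRlbmE=
# 'Vardenaf'   VmFyZGVuYWY=
# 'ardenafi'   YXJkZW5hZmk=
# 'rdenafil'   cmRlbmFmaWw=
# 'denafil)'   ZGVuYWZpbCk=
```

Try different widths and phrases, then drop the first and last couple of characters of each encoding to get search strings that survive whatever text sits on either side.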
—- End Update —-