Wiki Slowly Coming To Life

Note: This is a merging of two news posts. Anubis' news post is at the end of this one.

As all of you are undoubtedly aware, the Wiki Database as it was earlier this month is no more. To our knowledge there were no recent backups and after a number of attempts by various DC members and myself for a week and a half, we determined that we should operate under the assumption that we weren't getting the data back and resort to the laborious task of pulling a rabbit out of our...bummies? :D

After a lot of work, I am happy to report that the wiki fiasco resolution is now moving along with measurable results.

I'm extremely optimistic about the results, as we could be WAY worse off than we actually are! I've managed to recover a metric crap-load of data!!!!

I've been using Google Cache, writing scripts to convert the cached documents back into wiki formatting, and then shoving them into the archived database from December 28, 2009.

Don't get me wrong, it isn't all sunshine and roses. We will lose some data. That is unavoidable due to the situation we're in, but I've recovered a lot more than I originally expected!

Finding Content

I've managed to recover a boatload of articles thanks to Google's Cached Pages feature and a few tools:

  • - Vidalia & Tor: used to proxy and spoof my traffic so I could fool Google's bot sniffers that block automated downloads.
  • - Torbutton: Firefox plugin to enable Tor support in the browser.
  • - DownThemAll!: Firefox plugin that lets me download the contents of links that match a specified filter.
  • - Google Site Searching: I searched for a boatload of terms and downloaded every cached page I could get my hands on.

I downloaded more than 6,000 pages, and once all duplicates were removed I could safely say that we recovered 2,167 unique pages (a mixture of articles, talk pages, categories, and templates). Good deal. That was the easy, brain-numbing part.
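For the curious, the de-duplication step could look roughly like the sketch below. It is written in Python rather than the PHP used for the real work, and it is an illustration under assumptions, not the actual script: the `canonical_key` and `dedupe` names, the example URLs, and the "keep the longest copy" rule are all hypothetical. It assumes MediaWiki-style titles, where spaces and underscores are interchangeable.

```python
from urllib.parse import urlsplit, unquote


def canonical_key(url: str) -> str:
    """Reduce a wiki URL to a comparison key (hypothetical normalization)."""
    parts = urlsplit(url.strip())
    # Decode %20 and friends, and ignore a trailing slash.
    path = unquote(parts.path).rstrip("/")
    # Assumption: spaces and underscores name the same page.
    return (parts.netloc.lower() + path).replace(" ", "_")


def dedupe(pages):
    """Keep one (url, html) pair per page, preferring the longest body."""
    best = {}
    for url, html in pages:
        key = canonical_key(url)
        if key not in best or len(html) > len(best[key][1]):
            best[key] = (url, html)
    return list(best.values())
```

Preferring the longest copy is only a guess at a sensible tie-breaker; keeping the most recently cached copy would be just as defensible.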

Dumbing Things Down

Next I set to work on writing a PHP script on my local box to loop over all 6,000+ pages, scrape out the name and URL, and convert the bastards to wiki syntax text files.
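As a rough idea of the "scrape out the name and URL" part, here is a minimal Python sketch (again, the real script was PHP and used different tooling). It assumes each saved file is a Google cached page whose banner reads something like "cache of <a href=...>" and whose `<title>` carries the article name. The banner wording, the `scrape_name_and_url` helper, and the `title_suffix` parameter are assumptions for illustration.

```python
import re
from html import unescape

# Assumed banner wording; adjust to whatever the saved files really contain.
CACHE_BANNER = re.compile(r'cache of\s*<a[^>]*href="([^"]+)"', re.IGNORECASE)
TITLE_TAG = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def scrape_name_and_url(html: str, title_suffix: str = ""):
    """Return (page name, original URL); either may be None if not found."""
    title_match = TITLE_TAG.search(html)
    url_match = CACHE_BANNER.search(html)

    name = unescape(title_match.group(1)).strip() if title_match else None
    # Optionally drop a site-name suffix such as " - Example Wiki".
    if name and title_suffix and name.endswith(title_suffix):
        name = name[: -len(title_suffix)].rstrip()

    url = unescape(url_match.group(1)) if url_match else None
    return name, url
```

Regular expressions are enough for two fields; anything more involved is better served by a proper HTML parser.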

I have most of the basic wiki formatting down (bold, italic, headings, paragraphs, tables, etc.) and I'm working on DJB-specific templates. I've just completed the Character template reverse-engineering and re-imported all salvaged character pages into the wiki. There's still a lot of reverse-engineering to do, so if you find yourself over on the wiki, don't be stunned by some ugly formatting.
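To give a flavor of the basic formatting conversion, the sketch below maps a handful of HTML tags back to MediaWiki-style markup (''italic'', '''bold''', == headings ==, blank-line paragraphs). It is a Python illustration using only the standard library, not the PHP/PHPQuery code described here. Tables and templates are deliberately left out, and the `WikiTextConverter` and `html_to_wikitext` names are hypothetical.

```python
import re
from html.parser import HTMLParser


class WikiTextConverter(HTMLParser):
    """Turn a small subset of HTML into MediaWiki-style markup."""

    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}
    HEADINGS = {"h1": "=", "h2": "==", "h3": "===", "h4": "===="}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.parts.append("\n" + self.HEADINGS[tag] + " ")
        elif tag == "p":
            self.parts.append("\n\n")  # paragraphs are separated by a blank line

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.parts.append(" " + self.HEADINGS[tag] + "\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_wikitext(html: str) -> str:
    converter = WikiTextConverter()
    converter.feed(html)
    converter.close()
    text = "".join(converter.parts)
    # Collapse runs of blank lines left behind by adjacent block tags.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Unknown tags are simply dropped and their text kept, which is why un-handled structures such as tables come out looking ugly until they get their own rules.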

For my fellow geeks that are interested, I'm using the following tools to make this happen:

  • - PHP (of course)
  • - PHPQuery: A port of jQuery (circa 2009) to PHP which makes page scraping a lot sexier.
  • - Peachy: A PHP API to interface with our wiki's APIs which allows me to automate the manipulation, creation, and deletion of pages on the wiki.
  • - Also some magic. And sunflower seeds.

What About Images?

The good news here is that the images aren't stored in the database, so we have all of the image files still on hand. I do, however, have to write some sort of script to re-import the files so the database knows the files are on the filesystem. Peachy has some image handling features that I'll look into soon, though I'll leave images until after the data is clean.
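A first step toward that re-import might be a plain inventory of what is on disk. The Python sketch below only lists candidate files; it does not touch the database or use Peachy. It assumes a MediaWiki-style upload directory in which derived or non-current files live under folders named `thumb`, `archive`, `temp`, and `deleted`. Those folder names, the extension list, and the `find_original_images` helper are assumptions to check against the real server.

```python
import os

# Assumed names of folders holding thumbnails, old versions, and so on.
SKIP_DIRS = {"thumb", "archive", "temp", "deleted"}
# Assumed set of upload types worth re-importing.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}


def find_original_images(root: str):
    """Return sorted paths of image files under root, skipping derived folders."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped folders.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

The resulting list could then be fed to whatever upload or import mechanism ends up being used.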

What's the Butcher's Bill?

I can't answer that. I know it'll be higher than we would like and people will have lost work. My goal is to deliver code that minimizes the overall impact on people's time; however, there will inevitably be problems with my cleanup scripts...things I missed or misinterpreted. A lot of people will need to spend a lot of hours fixing and replacing things and for all of you who will bear that task, the DJB thanks you in advance.

Also gone are the user accounts that were created between December 28, 2009 and today. If you are one of those people, you'll need to re-create your account. :(

In Closing

I'm sorry about the DJB being in this situation with the wiki and I'm largely at fault for not having a way to make backups. Once this is over, I'll be detailing how I aim to avoid this situation in the future. The database WILL disappear, become corrupted, etc at some point in the future (it is inevitable). Next time I want us to be ready for it.

In terms of an ETA: I estimate about 4 more days and late nights before I feel comfortable enough to unleash the wiki staff (and anyone who wants to help with cleanup) on the wiki.

Also: Feel free to check over the wiki. If you find an article or page that is missing, please let me know and I'll see if I can find a cached copy to convert and upload.

Anubis' Followup Article - Merged

Oh boy...

If you read Orv's news post below, you'll see just what state our wiki is going to be in. Before I move on to the next matter (before I get into the grimy details), I want to say - in advance - that I'm sorry. This weekend (approximately the same time that the DJBWiki comes back online) I will be completely absent. I am going out of town right after college on Friday, and won't be back until after college Monday. Expect to see me around Monday evening grinding the crap out of my fingers.

User Accounts

First and foremost, we need to sort out the user accounts. They are our top priority. I would like to ask everyone to hold off on making any edits for at least a week after the Wiki Staff gets its hands on it so we can get as much stuff fixed as possible before everyone starts going "OMG! my stuff's gone!" and making a million edits each - but sadly, I can't make that request. Trust me - if I could, I would, because it becomes a million times harder to sort through everything with everyone editing stuff.

I would like everyone with an account on the wiki to email [Log in to view e-mail addresses] exactly what you registered your username with, and what it was when the wiki went down. Over the next couple weeks, the Wiki Staff will be going through these emails and getting everything back to what it's supposed to be. If you created your account after December 2009, simply put in a request for a new one. We'll get them processed as we come to them. Be patient.

Wiki Staff

This is going to be a massive undertaking by the DJBWiki Staff and DJBWikipedian community. I'd like to ask that anyone who has decent experience dealing with Wiki code/templates and feels up to the task email me directly ([Log in to view e-mail addresses]). I need to be able to trust you. This is my baby we're talkin' about. If I don't think I can trust you, or don't think that you're up to the task, you won't be helping as a temporary staff member. Cruel, but it's that simple.

That said: For the current wiki staff, it's all hands on deck. I expect to see everyone pitching a hand to help out with this. While I'm away this weekend, I'll be able to check my email via my cell. How much? Not sure. Going into the bush, might lose cell service every once in a while, might lose it the entire time we're out there. Gotta love Canada's woodland.

Cleanup's going to take a while, and I hope that you'll all bear with us while we sort through everything and get it back up to snuff. A lot of stuff (including a metric ass-load of templates) is going to need to be rewritten. Orv's restoring everything he can, to make the impact as minute as possible. Don't be too hard on our little souls, now!

Praise the great Orv!

If your stuff is missing, let him know (HERE) specifically which articles are gone. Google only displays the top 100 search results at a time, so he will have a much easier time if he knows exactly what isn't there.

DOOOO EEEEET!

<3 (except Anubis) Ji

Thanks for keeping at it, Orv. You're, as always, kicking ass.

Ji let me know that the New Tython article was gone. With that info, I was able to do a search and salvage three more articles:

New Tython, Menat Ombo, & Ji

In looking around I've noticed some HSP Planets/Moons are missing: Barbatos, Vassago, Judas, Gressil, Marchosias.

Judas has been recovered. Barbatos, Vassago, Gressil, and Marchosias are, sadly, not recoverable :(

Oh, Adien Kolar was also recovered when I searched for New Tython.

I, too, would like my news post merged with this one. And my babies merged with Orv's genes. Thank you.

Thank you a million times over for all your hard work, Orv. This can't have been easy for you, and I am sure that without your work, and that of others as well, we might not have a wiki at all at this point!

To all those who will be working more on this in the weeks to come, good luck!

Here's to hoping... Were Kel Rasha and a recent version of the Aeotheran entries recovered?

Can we still pull rabbits out of our bummies?

Not sure if this matters, but I recently re-searched for certain pages and got more recent results -- the OFH page had no result at all before, but now it displays the most current (May 11) cache. Could it be possible that Google will continue updating with the May caches? Or was this just a fluke? (Same result with the House Dinaari page.)

OFH Cache and House Dinaari Cache

Yeah, going off what Shadow said, I found a cached page of my character's bio from April 22nd, 2011, as you can see here:

http://tinyurl.com/4ygrd3m
