Without any translation or adaptation, here, raw and unpolished, are the PM exchanges I had back in November, when I got this WU (work unit; I haven't had another since), with a very friendly German who explained his technique for backing up / checking / restoring an RNA World WU.
I haven't had the chance to try any of this, and I don't know whether I'll have the courage to do it the day I get one...
------------------------------------------
I am hardcore. So... the following steps are going to be hardcore. They are totally not recommended unless you are an advanced user who knows exactly what you're doing and is familiar with every instruction. They're also a fun way to get more familiar with what goes on behind the scenes, both in how BOINC manages the VMs and inside the VMs themselves.
What I typically do is the following. It's a bit dangerous, but if you follow the steps exactly, it shouldn't make your task fail. It is risky, though, so use lots of caution if you decide to attempt it!
----- Gracefully suspend and exit BOINC
- Suspend BOINC
- Watch Task Manager until all CPU activity stops
- Exit BOINC
- Watch Task Manager until all VBoxHeadless* and VBoxWrapper* processes have gracefully exited
- Watch Task Manager until boinc.exe has gracefully exited
- In Task Manager, find VBoxSVC.exe, and use "End Task" on it
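If you'd rather not stare at Task Manager, the waiting steps above could be scripted. This is a minimal Python sketch, not part of the original procedure. It assumes Windows' built-in `tasklist` command and uses my own guess at the process-name prefixes (`vboxheadless`, `vboxwrapper`, `boinc.exe`, matched case-insensitively). It only waits: suspending and exiting BOINC and ending VBoxSVC.exe are still done by hand, and the 600 s timeout and 5 s poll interval are arbitrary.

```python
import subprocess
import time
from typing import Callable, Iterable, List

# Case-insensitive name prefixes for the processes the steps above wait on.
# Assumption: "vboxwrapper" covers the versioned wrapper binaries (VBoxWrapper*).
WATCHED_PREFIXES = ("vboxheadless", "vboxwrapper", "boinc.exe")


def list_windows_processes() -> List[str]:
    """Image names of running processes, read from Windows' built-in `tasklist`."""
    out = subprocess.run(
        ["tasklist", "/fo", "csv", "/nh"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Each CSV row starts with the quoted image name: "boinc.exe","1234",...
    return [row.split('","')[0].strip('"') for row in out.splitlines() if row.strip()]


def still_running(names: Iterable[str], prefixes=WATCHED_PREFIXES) -> List[str]:
    """Return the names that start with one of the watched prefixes."""
    wanted = tuple(p.lower() for p in prefixes)
    return [n for n in names if n.lower().startswith(wanted)]


def wait_until_exited(list_processes: Callable[[], List[str]] = list_windows_processes,
                      prefixes=WATCHED_PREFIXES, timeout_s: float = 600.0,
                      poll_s: float = 5.0, sleep=time.sleep) -> bool:
    """Poll until no watched process remains; return False if timeout_s elapses first."""
    waited = 0.0
    while True:
        if not still_running(list_processes(), prefixes):
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

Typical use, after choosing Exit in BOINC Manager: `wait_until_exited()` returns True once the listed processes are gone. `list_processes` and `sleep` are parameters so the polling logic can be tried without a Windows machine.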
----- Optionally back up your Data folder
- Find your BOINC data folder
- Save a copy of it somewhere else, just in case. I do this every time I'm testing an OS upgrade or a VirtualBox upgrade.
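The optional backup could look like this. It is a minimal Python sketch under stated assumptions. The folder naming and timestamp suffix are my own convention. The real data folder location (often C:\ProgramData\BOINC on Windows) should be checked on your own machine. Since the slots hold the VM disk images, the copy may be large.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_data_folder(data_dir: Path, backup_root: Path,
                       now: Optional[datetime] = None) -> Path:
    """Copy the entire BOINC data folder to backup_root/<name>-backup-<timestamp>.

    Run it only while BOINC is closed (and VBoxSVC.exe has exited), so that
    no file, VM disk images included, is being written during the copy.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{data_dir.name}-backup-{stamp}"
    # copytree creates missing parent folders and refuses to overwrite an
    # existing target, so an older backup can't be clobbered by accident.
    shutil.copytree(data_dir, target)
    return target


if __name__ == "__main__":
    # Demo on a throwaway folder; point data_dir at your real data folder
    # only once BOINC is fully closed.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "BOINC"
        demo.mkdir()
        print(backup_data_folder(demo, Path(tmp) / "backups"))
```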
----- Clone the VM
- Launch "Oracle VM VirtualBox". Note: I NEVER launch this manually while BOINC is running. I make sure BOINC is completely closed before running it.
- If BOINC is NOT installed as a service, you should already see the machine in the list
- If BOINC is installed as a service, use "Machine -> Add" to add the machine using the slot's .vbox file.
- Right-click the machine, and use Clone. Make sure to use "Full Clone" and "Current machine state".
- I typically have 3-4 weekly backups, so I append a number. My clones usually end with a name like "Clone 52".
- If BOINC is installed as a service, right-click the machine you added, and choose "Remove", and then "Remove Only". It's VERY important that you DO NOT choose "Delete all files"!
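For those who prefer the command line, the same clone can probably be made with `VBoxManage`, VirtualBox's CLI: `registervm`, then `clonevm` (a full clone of the current state is the default when `--options link` is not given), then `unregistervm` without `--delete`. This Python sketch only builds the command lists. The VM name in the demo is hypothetical (look up the real one with `VBoxManage list vms`), and it assumes `VBoxManage` is on your PATH, which on Windows it often isn't until you add the VirtualBox install folder. The same rules apply: BOINC closed and VBoxSVC.exe gone first.

```python
import subprocess
from typing import List, Optional


def clone_commands(vm: str, clone_name: str,
                   vbox_file: Optional[str] = None) -> List[List[str]]:
    """VBoxManage calls mirroring the GUI steps above.

    Pass vbox_file (the full path to the slot's .vbox) only when BOINC runs
    as a service, i.e. when the VM doesn't already show up in your own list.
    """
    cmds: List[List[str]] = []
    if vbox_file:
        cmds.append(["VBoxManage", "registervm", vbox_file])  # GUI: Machine -> Add
    # "Full Clone" + "Current machine state": --mode machine, and no "--options link".
    cmds.append(["VBoxManage", "clonevm", vm, "--name", clone_name,
                 "--mode", "machine", "--register"])
    if vbox_file:
        # GUI: Remove -> "Remove Only". Never add --delete: that deletes BOINC's files.
        cmds.append(["VBoxManage", "unregistervm", vm])
    return cmds


def run_all(cmds: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical VM name; prints the commands instead of running them.
    for cmd in clone_commands("boinc_0123456789abcdef", "boinc_0123456789abcdef Clone 52"):
        print(cmd)
```

Building the lists separately from `run_all` makes it easy to print and double-check every command before anything touches the VM.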
----- Inspect the clone
- Select the clone
- On the clone's Snapshots tab, right-click the top node and choose "Restore snapshot", making sure to uncheck "Create a snapshot of the current state" before clicking Restore.
- Now right-click the clone, and choose Start. If you get a submenu next to Start, choose "Normal Start".
- In the new VM window, if it's all black, hit CTRL to "wake" the VM's display.
- Inspect the display for correctness. Nothing should look abnormal or look like an error.
- Hit CTRL+C, which will leave the cmsearch task running, but give you a command prompt.
- Type "top" in all lower-case, and hit enter. This launches the linux task manager.
- "cmsearch" should be near the top of the list. Make a note of its "TIME+" column. This is the CPU time it has logged, shown as minutes:seconds.hundredths.
- Close the clone's window, and choose "Power off the machine", making sure to uncheck "Restore current snapshot".
- Close "Oracle VM VirtualBox", and watch Task Manager until VBoxSVC.exe has gracefully exited.
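To turn the TIME+ note into something you can compare between clones, a small parser helps. This Python sketch assumes the procps `top` layout, where TIME+ reads minutes:seconds.hundredths (for example `7215:42.18`, a little over 120 hours). Any other layout is rejected with `ValueError` rather than guessed.

```python
import re

# minutes:seconds, optionally followed by .hundredths (procps top's TIME+ layout)
_TIME_PLUS = re.compile(r"^(\d+):(\d{2})(?:\.(\d{2}))?$")


def time_plus_to_seconds(value: str) -> float:
    """Convert a TIME+ cell such as '7215:42.18' to seconds.

    top can drop the hundredths when the column gets too narrow, so '7215:42'
    is accepted too. Anything else raises ValueError.
    """
    m = _TIME_PLUS.match(value.strip())
    if not m:
        raise ValueError(f"unrecognised TIME+ value: {value!r}")
    minutes, seconds, hundredths = m.groups()
    return int(minutes) * 60 + int(seconds) + int(hundredths or 0) / 100


def time_plus_to_hours(value: str) -> float:
    """Same value expressed in hours."""
    return time_plus_to_seconds(value) / 3600
```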
----- Determine if the VM is making progress
- Usually you can look at the progress.txt file in the slot folder, but if it already shows 98.765%, you can't rely on it.
- If you're worried that you may have screwed something up, make sure to go OFFLINE (by disabling all network adapters) before launching BOINC. This is so that the projects can't be notified of any task failures!
- Launch BOINC
- Resume BOINC, and let the tasks run for a couple of hours.
- Repeat all of these steps from the beginning (starting with Suspend BOINC) to get a 2nd clone to compare against.
- If all goes well, "top" in your second clone should show that cmsearch has logged more time than in the first clone.
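The progress checks above can be sketched in Python too. Both thresholds are arbitrary assumptions, not values from the original: a progress.txt touched within the last 2 hours, and at least 60 s of extra cmsearch CPU time between clones. The TIME+ readings are expected to be already converted to seconds.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def progress_file_is_fresh(progress_txt: Path, max_age_s: float = 2 * 3600,
                           now: Optional[float] = None) -> bool:
    """True if progress.txt was modified within the last max_age_s seconds.

    A healthy task is said to keep updating this file's timestamp even when
    the percentage is frozen at 98.765%, so only the timestamp is checked.
    """
    current = time.time() if now is None else now
    return current - os.path.getmtime(progress_txt) <= max_age_s


def clones_show_progress(first_cpu_s: float, second_cpu_s: float,
                         min_gain_s: float = 60.0) -> bool:
    """Compare cmsearch's TIME+ (in seconds) noted in two successive clones."""
    return second_cpu_s - first_cpu_s >= min_gain_s


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "progress.txt"
        demo.write_text("demo")
        print(progress_file_is_fresh(demo))  # a file written just now is fresh
```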
----- BOINC Data Recovery (if needed)
- If you had a problem while you were offline, and have a backup data folder, you can use it to recover.
- While BOINC is closed, move your current data folder somewhere else
- Copy your backup data folder to the location you normally use for your data folder
- Test BOINC again, while offline.
- Keep testing until things look great.
- Then, while BOINC is closed, you can re-enable all your network adapters, then restart BOINC.
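The recovery steps could be scripted along these lines. This is a minimal Python sketch, assuming BOINC is closed and the network is still off. It renames the current folder next to itself (the `-broken-<timestamp>` suffix is my own convention) and copies, rather than moves, the backup into place, so the same backup is still available if the "keep testing" loop needs another restore.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def restore_data_folder(data_dir: Path, backup_dir: Path,
                        now: Optional[datetime] = None) -> Path:
    """Swap a backup in for the current BOINC data folder.

    The current folder is moved aside (not deleted) so it can still be
    inspected. Returns the path it was moved to.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    aside = data_dir.with_name(f"{data_dir.name}-broken-{stamp}")
    shutil.move(str(data_dir), str(aside))
    # Copy, don't move: the backup must survive for another round of testing.
    shutil.copytree(backup_dir, data_dir)
    return aside


if __name__ == "__main__":
    # Throwaway demo; never point this at real folders while BOINC is running.
    with tempfile.TemporaryDirectory() as tmp:
        data, backup = Path(tmp) / "BOINC", Path(tmp) / "BOINC-backup"
        data.mkdir()
        backup.mkdir()
        print(restore_data_folder(data, backup))
```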
-----------------
JeromeC wrote:
- cloning the VM: excellent, I didn't know this could be done that way
Yes. Again, just be absolutely sure that BOINC is closed, that the BOINC processes have gracefully exited, and that you have killed VBoxSvc.exe before you launch the VirtualBox VM Manager to play with or clone VMs manually.
JeromeC wrote:
- looking at the console status and the slot text file: interesting, but this can be done on the actual running task; you don't have to look at this in the clone, right?
In BOINC, I only use "Show Console" to get the VM's forecast estimate or to check that it still looks okay, without errors. I only ever hit "Ctrl" to wake the VM, and I never type anything else. And that's what I recommend: never type anything into that "Show Console" window!
To monitor status, you can look at the BOINC Progress %, which in most cases matches the progress.txt file. But Christian coded things a bit poorly: it can reach 100%, then fall back to 98.765% and stop getting new values into progress.txt. When that happens, the task is usually still healthy and still updating the timestamp of progress.txt, but you have to "investigate using top within a clone" if you want to see whether the VM is still logging useful time in the cmsearch application.
JeromeC wrote:
"- If you're worried that you may have screwed something up, make sure to go OFFLINE (by disabling all network adapters), before launching BOINC. This is so that the projects can't be notified of any task failures!"
What do you mean? That the task hasn't crashed yet but you suspect it could, because some things in the console look like errors? Anyway, if the task has already crashed, you agree it's too late? (It has most probably already been reported to the project.)
I do some pretty crazy things while running RNA World tasks, like ... testing newer versions of Windows 10 Insider Preview builds (I'm on the "Fast Ring", but VirtualBox has problems quite often there), and testing newer versions of Oracle VirtualBox itself.
So, for instance, when I get a new Windows 10 build, my standard practice is: suspend BOINC, shut down BOINC, install the new Windows 10 build (BOINC will start up suspended), close BOINC, disconnect the network, make a backup copy of the BOINC data folder, start BOINC, and verify that tasks run successfully. If tasks fail, I can close BOINC, restore my data folder, keep the RNA World tasks suspended, reconnect the network, and start BOINC.
In fact, when VirtualBox has problems with new Windows 10 Insider builds (like 14971, released 2 days ago), I have a separate "Windows 10 Release" partition I can use, and the data folder is on a partition that both installations share. So tonight, for instance, I'll uninstall BOINC from the Insider partition, reboot into the Release partition, install BOINC there, and start crunching RNA World tasks again from that partition.
JeromeC wrote:
- restore: your process would restore every project's backup in the data directory. Since I run several projects all the time, I couldn't do that... I'd need to be able to restore only the RNA task... (meaning not only the RNA data directory, but also being able to edit various BOINC XML state files... I did that in the past, but it was very complicated and risky)
The backup process applies to the ENTIRE data folder. You are basically going offline, taking a snapshot of that entire folder, manipulating it any way you think you need to, testing it, restoring it if you screw it up, retesting it, and then, once you're confident it's correct, going online.
I am a BOINC Alpha tester, and I too am attached to many projects: in fact, I'm attached to ~60 and routinely do work for ~15. What you are not understanding is this: once you go offline and take a backup of the data folder, anything you then do to that folder (any work, any file modifications, any task failures, anything) is NOT communicated to the projects. So, if I had gone offline and made a copy of my data folder, I could, for instance, start BOINC and abort all the tasks for my ~15 projects. Then, realizing "oh, that's not what I wanted", I could close BOINC, restore the data folder, restart BOINC, verify the tasks are working again, close BOINC, enable the network adapters, and start BOINC. Basically, as soon as BOINC runs with the network enabled, you "invalidate" your offline backups, so you had better be very sure things are exactly how you want them before you allow BOINC to talk to the projects.
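The rule in the last paragraph can be stated as a tiny state machine. This toy Python model is my own illustration: the class and method names are hypothetical, and nothing here touches BOINC or any files. A backup stays a safe rollback point while BOINC only ever runs offline, and, per the rule above, the first run with the network enabled invalidates it.

```python
from dataclasses import dataclass


@dataclass
class DataFolderSnapshot:
    """Toy model of the offline-backup rule; it does not touch BOINC or any files."""
    network_enabled: bool = True
    has_backup: bool = False

    def set_network(self, enabled: bool) -> None:
        """Enable or disable all network adapters (done with BOINC closed)."""
        self.network_enabled = enabled

    def take_backup(self) -> None:
        """Copy the whole data folder while BOINC is closed."""
        self.has_backup = True

    def run_boinc(self) -> None:
        """Start BOINC, do anything (work, aborts, failures), then close it."""
        if self.network_enabled:
            # Per the rule above: once BOINC has run online, the projects may
            # know about the changes, so the old backup is "invalidated".
            self.has_backup = False

    def can_restore(self) -> bool:
        return self.has_backup


if __name__ == "__main__":
    s = DataFolderSnapshot()
    s.set_network(False)    # go offline first
    s.take_backup()
    s.run_boinc()           # e.g. abort every task by mistake...
    print(s.can_restore())  # True: still safe to roll back
    s.set_network(True)
    s.run_boinc()
    print(s.can_restore())  # False: the backup is now invalidated
```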
