Recently I had the privilege of being tasked with bringing Athabasca University's Learning Management System to AWS. This was prompted by Jennifer Schaeffer joining the AU team and presenting her vision for the future of Athabasca University, which is an exciting leap forward for the University and a beneficial push into DevOps culture and modern infrastructure concepts.
The prior Moodle architecture was a monolithic two-tier system: a single physical host and a Postgres database server. This had many drawbacks but fit the budget and knowledge of the University at the time.
The current version of the AWS-based AU Moodle environment looks like the following. Moodle is hosted on an AutoScaling group of EC2 instances behind an ELB. This connects to an Aurora RDS Postgres cluster containing the databases for each Moodle department. Moodle has been configured to store its session and application cache in a pair of ElastiCache Memcached instances, and a Gluster file system has been set up to store the course content and files for the Moodle classes. We have three separate environments configured for dev, test and prod. Communication with our internal systems is done via a locked-down VPN connection.
Infrastructure as code
I used CloudFormation templates extensively to create the environments. The Infrastructure as Code methodology allows me to keep everything in a commented git repository, giving us inline documentation and the ability to audit the changes applied to the infrastructure. In addition, using CloudFormation to create the environment allows me to spin up new environments very quickly. We can have a fully functional Moodle environment (not including data and VPN) running in under 10 minutes.
The base Moodle AMIs are created using DevOps methodologies. The image is baked with all the needed tools in an unconfigured state. Using Ansible I am able to bake a new AMI in a fully automated fashion. The image is created and the AMI ID is placed in an SSM parameter that is referenced in the CloudFormation templates. This way I can apply an upgrade to Moodle using the AutoScaling group's Rolling Update functionality. Upgrades are applied first to the dev and test environments and then pushed into production, simply by changing the SSM parameter. Upgrades can happen invisibly to the end user and, in the case of a failure, can be rolled back very quickly. This allows us to start performing very agile deployments.
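To make the SSM parameter and Rolling Update idea concrete, here is a minimal sketch of what such a template fragment could look like, built as a Python dictionary and printed as JSON. It is not the actual AU template: the parameter path `/moodle/base-ami-id`, the resource names, the instance type and the group sizes are all assumptions chosen for illustration. The parameter type and the `AutoScalingRollingUpdate` update policy are standard CloudFormation features.

```python
import json


def build_template(ssm_parameter_name="/moodle/base-ami-id"):
    """Return a CloudFormation template (as a dict) whose AMI comes from SSM.

    All names and sizes here are hypothetical placeholders.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "MoodleAmiId": {
                # CloudFormation looks up the SSM parameter named in Default
                # and passes its value (the AMI ID) to the stack.
                "Type": "AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>",
                "Default": ssm_parameter_name,
            }
        },
        "Resources": {
            "MoodleLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {"Ref": "MoodleAmiId"},
                    "InstanceType": "t3.medium",  # assumed size
                },
            },
            "MoodleGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "MinSize": "2",  # assumed sizes
                    "MaxSize": "6",
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    "LaunchConfigurationName": {"Ref": "MoodleLaunchConfig"},
                },
                # Replace instances one at a time when the launch
                # configuration changes, keeping at least one in service.
                "UpdatePolicy": {
                    "AutoScalingRollingUpdate": {
                        "MinInstancesInService": "1",
                        "MaxBatchSize": "1",
                        "PauseTime": "PT5M",
                    }
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

With a layout like this, a new AMI ID written to the SSM parameter is picked up the next time the stack is updated, and the update policy then replaces the instances in batches rather than all at once.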
Monitoring and automation
After the initial deployment I learned some important things about our Moodle deployment. A memory leak was identified and worked around by having a scheduled action scale the ASG up and back down. I also identified an issue with the FUSE file system used to connect the Moodle instances to the Gluster file system. I was able to work around many of these issues by setting up CloudWatch log metrics and using them to trigger scaling events that kill off broken Moodle instances and bring up new instances as needed. This increased visibility lets us react faster and in a more repeatable manner.
Centralized logging is a requirement for modern deployments like this; without it they would be impossible to operate. For troubleshooting I created an ELK stack on AWS and used Logstash to stream the ELB logs and the CloudWatch logs into it. This gave us the ability to tie the CloudWatch logs to the ELB logs. This is more important than it seems, as currently ELB logs cannot be monitored in CloudWatch and instead need to be processed by Athena. I love Athena's ability to perform SQL queries against these logs, but it is much easier to aggregate them in ELK.
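The log-metric workaround boils down to counting a known error pattern per instance and replacing any instance that crosses a threshold. The sketch below shows only that decision logic in plain Python. In the real environment the counting is done by CloudWatch log metrics, and the error string, the threshold and the function name used here are assumptions for illustration, not the actual configuration.

```python
from collections import Counter

# Assumed pattern: a message commonly seen when a FUSE mount drops.
# The pattern actually matched in the AU environment is not stated.
FUSE_ERROR = "Transport endpoint is not connected"


def instances_to_replace(log_events, pattern=FUSE_ERROR, threshold=3):
    """Return the sorted instance IDs whose logs matched `pattern`
    at least `threshold` times.

    `log_events` is an iterable of (instance_id, message) pairs.
    """
    hits = Counter(
        instance_id for instance_id, message in log_events if pattern in message
    )
    return sorted(i for i, count in hits.items() if count >= threshold)


if __name__ == "__main__":
    events = [
        ("i-aaa", "moodledata: Transport endpoint is not connected"),
        ("i-aaa", "moodledata: Transport endpoint is not connected"),
        ("i-bbb", "cron finished"),
        ("i-aaa", "moodledata: Transport endpoint is not connected"),
    ]
    print(instances_to_replace(events))
```

An instance flagged this way could then be marked unhealthy so that the AutoScaling group terminates it and launches a replacement.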
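As an illustration of the kind of aggregation that becomes easy once ELB logs are parsed into fields, here is a minimal Python sketch that splits access log lines and counts 5xx responses per backend address. It assumes the Classic Load Balancer access log layout (15 space-separated fields, with the request and user agent in double quotes). The field names are my own labels, and the Logstash pipeline described above is not shown.

```python
import shlex
from collections import Counter

# Field order of a Classic Load Balancer access log entry (assumed format).
FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]


def parse_elb_line(line):
    """Split one log line into a dict keyed by FIELDS.

    shlex handles the double-quoted request and user-agent fields; it may
    fail on user agents containing stray quote characters.
    """
    return dict(zip(FIELDS, shlex.split(line)))


def count_5xx_by_backend(lines):
    """Count responses with a 5xx ELB status code per backend IP address."""
    counts = Counter()
    for line in lines:
        entry = parse_elb_line(line)
        if entry["elb_status_code"].startswith("5"):
            backend_ip = entry["backend"].split(":")[0]
            counts[backend_ip] += 1
    return counts


if __name__ == "__main__":
    sample = (
        '2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
        '10.0.0.1:80 0.000073 0.001048 0.000057 502 502 0 29 '
        '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -'
    )
    print(count_5xx_by_backend([sample]))
```

Grouping by backend address is what makes it possible to line up a spike in errors at the load balancer with the application logs of a specific instance.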
Backup and replication
Data redundancy is achieved by having RDS and EBS snapshots replicated to a different AWS region. The cost of this replication is relatively low, as it uses delta-based transfers to the other region. The base architecture is already configured in this other region, so we can bring the entire LMS back online very quickly (about 10 minutes). This removes the need to back up anything but the files and the database.
The future, it's bright!
Moodle needs some changes to more completely support the cloud environment that it now lives in. We need to move all of the files in GlusterFS to S3, saving us money and complexity. This would remove our need to replicate data across regions and allow us to set up a CDN with CloudFront, placing our course files within milliseconds of our worldwide learners. We would also be well served by re-architecting our Moodle LMS to be even less monolithic than it is, to allow for better scaling and to move it into containers. These improvements can now be staged in and do not require long downtime to achieve.
I am very happy with what we have done and look forward to continuing to bring the AU systems to the cloud. I hope this was at least interesting, if not educational. I am proud to have been a part of this, and the students I have talked to are extremely happy with the performance and functionality of the new environment.