The term big data is making a splash in every sector and is without a doubt high on the telematics agenda. Our previous blogs and eBooks have looked at the analytical techniques applied and the way that third party telematics specialists, like MyDrive, harness the data. However, a key part of harnessing the data is managing it.
No longer do data analysts rely on huge physical servers; the rise of the Cloud has created virtually unlimited storage, available regardless of location. This proliferation of technology brings its own challenges, and we have asked our Director of Software Engineering for his view on the challenges faced today:
Historically, IT departments minimised the risk of critical systems failing by spending more money on ‘enterprise’ hardware. However, with today’s data volumes and an increased number of servers being operated by fewer engineers, this is no longer cost-effective. The longer mean time between failures of a more expensive system no longer outweighs the cumulative likelihood of failure when running hundreds or even thousands of servers.
Instead, businesses architect systems to be robust in the face of failure. With growing maturity, businesses now have the confidence to treat virtualised cloud servers as transient systems that can be created and destroyed dynamically in response to load, customer demand, or problems. For startups in particular, this is a highly flexible model that is both cost- and resource-effective.
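The idea of sizing a transient fleet to demand can be sketched as a simple scaling policy. This is a minimal illustration only; the thresholds, bounds, and function names are assumptions, not a description of any real provider API:

```python
def desired_worker_count(queue_depth: int, jobs_per_worker: int = 100,
                         min_workers: int = 2, max_workers: int = 50) -> int:
    """Size a transient worker fleet to the current backlog, within bounds.

    Servers above the result can be destroyed; servers below it created.
    All numbers here are illustrative assumptions.
    """
    needed = -(-queue_depth // jobs_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))
```

In practice a policy like this would be evaluated periodically against a monitoring metric, with the actual create/destroy calls made through the cloud provider's own SDK.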
A DevOps culture, with cross-functional development and operations teams, is becoming increasingly common, particularly in lean startups: the benefits of developers being involved all the way through to deployment, and of operations staff being able to work on new features, are compelling.
Those businesses that embrace automated server configuration management, using tools such as Puppet or Chef, ensure that servers and systems are reproducible with identical configuration time after time. This reduces manual configuration errors, and every change is audited and its history logged. Furthermore, provisioning an additional or replacement server takes a single button press, and the new server can be processing data within the cluster in minutes.
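The property that makes tools like Puppet and Chef reproducible is idempotency: describing the desired state and only acting when reality differs. A minimal sketch of that idea in plain Python (the file contents here are purely illustrative):

```python
from pathlib import Path

def ensure_file(path: Path, content: str) -> bool:
    """Idempotently ensure `path` holds exactly `content`.

    Returns True if a change was made (which a real tool would audit
    and log), False if the system was already in the desired state.
    """
    if path.exists() and path.read_text() == content:
        return False          # already converged: running again is a no-op
    path.write_text(content)  # converge to the desired state
    return True
```

Run it twice with the same input and the second run does nothing; that is what lets a whole server definition be re-applied safely, time after time.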
In a heterogeneous virtualised environment, not all servers are created equal. Some have more computing power or memory and some have higher network latencies. By using a queue-driven asynchronous architecture, each distinct data processing stage can be triggered by a virtual ticket on a job queue. Automated monitoring and alerting detects situations such as there being more jobs than workers available, or any unexpected failures, and triggers a suitable response.
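The queue-driven pattern can be sketched with Python's standard-library `queue` and `threading` modules. The "processing" done here is a placeholder; real stages would be distributed across servers via a proper message broker:

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()     # virtual tickets awaiting work
results: "queue.Queue" = queue.Queue()  # output of the processing stage

def worker() -> None:
    """Pull tickets off the queue until a None sentinel says to stop."""
    while True:
        ticket = jobs.get()              # block until a ticket arrives
        if ticket is None:               # sentinel: shut this worker down
            jobs.task_done()
            return
        results.put(ticket.upper())      # placeholder processing stage
        jobs.task_done()
```

Because faster servers simply pull tickets more often, the queue naturally balances load across unequal machines, and `jobs.qsize()` gives monitoring a direct signal of whether there are more jobs than workers.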
With traditional relational databases, scaling typically means vertical scaling first: bigger servers with more disk and memory. When that stops being feasible, sharding – partitioning data between multiple servers, then either querying only part of the data or combining queries client-side – becomes the most common route taken by data analysts to get the maximum output from their database. The alternative is for businesses to architect the system so that it doesn’t rely on the characteristics outlined above, and instead to use a distributed, horizontally-scalable data store to dynamically increase capacity and performance.
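The core of sharding is a deterministic mapping from key to server, so every client agrees on where a piece of data lives. A minimal hash-based sketch (the shard names are invented for illustration, and real systems often use consistent hashing so that adding a shard doesn't remap most keys):

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical shard servers

def shard_for(key: str) -> str:
    """Map a key deterministically to one shard via a stable hash."""
    digest = hashlib.md5(key.encode()).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]
```

A query spanning many keys then becomes scatter-gather: send each shard only the keys it owns, and combine the partial results client-side.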
These technologies and techniques allow businesses, irrespective of size, to achieve zero-downtime deployments. A single command distributes the latest code to all the servers for that application and updates them in a rolling deployment, so there is never any loss of service. Businesses can deploy code several times a day, secure in the knowledge that customers will get the latest tested, quality-assured software processing their data, without even a momentary loss in service.
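The rolling part of such a deployment can be sketched as a loop: update one server at a time, verify it is healthy, and halt the rollout the moment a check fails so the remaining servers keep serving the old version. The `deploy` and `health_check` callables here are stand-ins for real tooling:

```python
def rolling_deploy(servers, deploy, health_check) -> None:
    """Update servers one at a time, aborting on the first failed check.

    `deploy(server)` pushes the new code to a single server;
    `health_check(server)` returns True if it is serving correctly.
    Both are assumed hooks into real deployment tooling.
    """
    for server in servers:
        deploy(server)                   # only this server is out of date
        if not health_check(server):
            raise RuntimeError(f"{server} unhealthy; halting rollout")
```

Because at most one server is mid-update at any moment, the rest of the fleet continues handling traffic and customers never see an interruption.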