Curiosity is a key developer skill
Good developers are curious, inventive people. They see a problem, look in their toolbox, and solve it. When they discover a new tool, they want to try it out. With Hadoop, however, a developer in a large enterprise is apt to hit a brick wall. Why? Generally speaking, large corporations have standard desktop configurations and strict policies that make it difficult to explore and experiment with the Hadoop ecosystem.
A large enterprise often has the following challenges:
- A developer's machine is locked down. The developer does not have admin rights, and installing software usually goes through an internal system: the developer requests an application from a catalog, an admin approves it, and the software is installed automatically. Using a Sandbox VM may therefore not be an option.
- A developer's machine has limited specs. Often a developer is given a desktop with the same specifications as non-developers. It is a bit ridiculous to think that a developer (who might need to spin up services and multiple clients when developing code) would have the same machine as an accountant who might just need to run Excel and a browser. One way developers play with Hadoop is by using a VM Sandbox (all the major distributions offer one), but developers often do not have enough RAM to support it, and they are often restricted from downloading the VM image in the first place.
- Access to an existing Hadoop cluster within the organization is not possible. Many large organizations have adopted Hadoop, but often in a siloed manner. For example, the IT Infrastructure team has a cluster for analytics on streaming hardware/equipment logs, but the ETL teams or the Marketing department have no access to it.
Automated Provisioning of User Volumes
A large managed health care services provider has partially solved this catch-22 by providing MapR user volumes that can be provisioned automatically so that an individual can work on a Hadoop cluster. There's still bureaucratic red tape involved: the developer must request access, and the department gets billed. Nonetheless, this is the best solution I've seen to date, as it allows developers to leverage the power of a large cluster and experiment with real datasets. Big Data's real mojo lies in being a parallel compute platform (see Big Data, Big Misnomer), so the larger the cluster, the larger the opportunity to exploit parallelism.
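As a rough sketch of what such automated provisioning might look like, the script below builds a `maprcli volume create` command for a per-developer volume. The user name, mount path, and 50G quota are illustrative assumptions, not the provider's actual setup; the real command would be run by an admin on a cluster node, so the sketch only echoes it.

```shell
# Hypothetical per-developer volume provisioning on a MapR cluster.
# All names and the quota are assumptions for illustration only.
DEV_USER="${1:-jdoe}"                      # developer requesting a volume
VOLUME_NAME="dev_${DEV_USER}"              # e.g. dev_jdoe
MOUNT_PATH="/user/${DEV_USER}/sandbox"     # mount point within the cluster

# Build the maprcli invocation: create a named volume at the mount path,
# capped by a quota and accounted to the requesting user (-ae).
CMD="maprcli volume create -name ${VOLUME_NAME} -path ${MOUNT_PATH} -quota 50G -ae ${DEV_USER}"

# Echo rather than execute, since this sketch has no live cluster;
# an actual provisioning system would run the command and bill the department.
echo "$CMD"
```

A self-service portal could invoke a script like this after approval, removing the manual admin step while keeping the chargeback intact.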
Allowing employees to use an external cloud (e.g. AWS) is often not an option due to security concerns. And even if a private or public cloud option is available, it may not suffice: Hadoop runs best on physical machines networked closely across racks, and the developer may still be required to set up the Hadoop services themselves.
More RAM
This is probably the easiest solution: with more RAM, a developer can use a pre-built Hadoop Sandbox. A Sandbox VM requires 6+ GB of free RAM, so if developers can get more RAM they can at least play, learn, and start to sharpen their Big Data skills. The major distributions (Cloudera, Hortonworks, and MapR) each offer their sandbox as a free download.
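Before requesting a RAM upgrade or downloading a sandbox, it's worth checking what the machine actually has free. A minimal check, assuming a Linux box with `/proc/meminfo` (the `MemAvailable` field requires kernel 3.14+); the 6 GB threshold matches the sandbox requirement above:

```shell
# Check whether this machine has enough free RAM for a Hadoop sandbox VM.
# Assumes Linux; MemAvailable is reported in kB, so convert to GB.
REQUIRED_GB=6
AVAILABLE_GB=$(awk '/^MemAvailable/ {printf "%d", $2/1024/1024}' /proc/meminfo)

if [ "$AVAILABLE_GB" -ge "$REQUIRED_GB" ]; then
  echo "OK: ${AVAILABLE_GB} GB available for a sandbox VM"
else
  echo "Need ${REQUIRED_GB} GB free; only ${AVAILABLE_GB} GB available"
fi
```

On a typical locked-down corporate desktop with 8 GB total, this check usually fails once the OS and standard apps are running, which is exactly the problem described above.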
The truly curious developer will run the Sandbox VM on their home computer to learn. Unfortunately, this means they can't perform true experiments with the business and technical problems they face at work.