Nick Pasko

Programmer goes entrepreneurship.

Friday, April 24, 2015

Clone existing (stopped) Hive cluster

1. Find the stopped cluster you want to clone and click Clone
1.1. Start with the smallest number of machines you think will do: it's easy to add machines later, harder to remove them.

2. Install s3cmd (RHEL-compatible, yum-based machine):
cd /etc/yum.repos.d
sudo wget http://s3tools.org/repo/RHEL_6/s3tools.repo
sudo yum install s3cmd
2.1. Create the ~/.s3cfg file; copy its content from any EC2 machine where you already have S3 set up
(or generate it from scratch with s3cmd --configure)
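
If there is no machine to copy from, a minimal config can also be written by hand. This is only a sketch: it assumes that access_key and secret_key under [default] are enough for basic use, the values are placeholders, and the S3CFG variable is just a convenience introduced here. An existing file is left untouched.

```shell
# Target file; s3cmd reads ~/.s3cfg by default.
S3CFG="${S3CFG:-$HOME/.s3cfg}"

if [ -e "$S3CFG" ]; then
    echo "$S3CFG already exists, leaving it alone"
else
    # Placeholder values: replace with your own keys.
    cat > "$S3CFG" <<'EOF'
[default]
access_key = YOUR_ACCESS_KEY_ID
secret_key = YOUR_SECRET_ACCESS_KEY
EOF
fi

# The file holds secrets, so keep it readable by the owner only.
chmod 600 "$S3CFG"
```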

3. Install aws-cli
sudo yum install aws-cli
3.1. Create the AWS credentials file
mkdir -p ~/.aws
cd ~/.aws
nano credentials
[default]
aws_access_key_id=Your Access Key ID
aws_secret_access_key=Your Secret Access Key
region=Optional, the default region to use for this profile
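
The same file can be created without opening an editor. A rough sketch, with placeholder key values and a CRED_FILE variable that is only introduced for this example; the optional region line is left out, and an existing file is not overwritten.

```shell
# Default location of the AWS CLI credentials file.
CRED_FILE="${CRED_FILE:-$HOME/.aws/credentials}"
mkdir -p "$(dirname "$CRED_FILE")"

if [ -e "$CRED_FILE" ]; then
    echo "$CRED_FILE already exists, leaving it alone"
else
    # Placeholder values: replace with your own keys.
    cat > "$CRED_FILE" <<'EOF'
[default]
aws_access_key_id=YOUR_ACCESS_KEY_ID
aws_secret_access_key=YOUR_SECRET_ACCESS_KEY
EOF
fi

# Secrets inside, so restrict access to the owner.
chmod 600 "$CRED_FILE"
```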

4. Use "s3cmd get" to download the table creation scripts, run them, then start hive and you're good.
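
The last step could look roughly like this. It is a sketch under assumptions: the scripts are taken to be .hql files sitting under one S3 prefix, and the function names are made up for this example.

```shell
# Download every object under an S3 prefix into a local directory.
fetch_scripts() {
    local prefix="$1" dest="$2"
    mkdir -p "$dest"
    s3cmd get --recursive "$prefix" "$dest/"
}

# Run each .hql file in a directory through hive, in name order.
# Stops at the first script that fails.
run_scripts() {
    local dir="$1" script
    for script in "$dir"/*.hql; do
        [ -e "$script" ] || continue   # nothing matched the glob
        echo "Running $script"
        hive -f "$script" || return 1
    done
}
```

Usage, with a hypothetical bucket name: fetch_scripts s3://my-bucket/hive-ddl/ ~/hive-ddl && run_scripts ~/hive-ddl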

Thursday, April 16, 2015

Hive, S3 and AWS credentials

1. Open the Hive cluster's page in EMR and copy the SSH connection address

2. Log into the main Hive cluster machine using PuTTY

3. Install s3cmd (http://s3tools.org/repositories):
  3.1. cd /etc/yum.repos.d
  3.2. sudo wget http://s3tools.org/repo/RHEL_6/s3tools.repo
  3.3. sudo yum install s3cmd
4. Configure s3cmd: nano ~/.s3cfg, copy the content from any EC2 machine
5. Run aws configure and provide your AWS credentials

6. s3cmd now works