Installing a Cassandra 2.1 cluster on Ubuntu Trusty

This assumes you’ve already gone through /devops/initial-server-setup-trusty/

Prerequisites

Java (Oracle 1.8)

For Cassandra, AFAIK we need Oracle’s JRE, as many things wouldn’t work if we used OpenJDK.

Luckily, the process is pretty straightforward. Refer to: /devops/install-java8-on-trusty/

JNA

Cassandra also requires JNA. Install it with the following command:

sudo apt-get install libjna-java -y

Installation

To figure out how to install Cassandra 2.1 I followed the DataStax docs, which are very thorough: http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installDeb_t.html

Installation is pretty easy:

echo "deb http://debian.datastax.com/community stable main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list # add the DataStax Community repository
curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add - # add the DataStax repository key to your aptitude trusted keys
sudo apt-get update

Then:

sudo apt-get install dsc21 -y

Configuration

Configuration is more time-consuming.

I’m using the instructions on this page as a starting point: http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configTOC.html.

Reset config

The package starts Cassandra automatically with default settings, so stop it and clear the data it created:

sudo service cassandra stop
sudo rm -rfv /var/lib/cassandra/*

Timezone

Before using Cassandra, double-check that you have the correct time set on all nodes.

Cassandra uses timestamps to write columns, so it wouldn’t be good if the clocks on different nodes disagree or are set to different timezones:

date

If needed, run this, pick a timezone, and set the same one on all nodes (I personally use UTC):

sudo dpkg-reconfigure tzdata

To keep it synchronized, you can use NTP—that should be enough:

sudo apt-get install ntp ntpdate -y

Make sure the service is running. If you’re in a VM, you might have to do this on the host OS.
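
A quick way to compare clocks is to collect the epoch time from every node and look at the spread. This is only a rough sketch: the helper name max_skew and the host names in the usage comment are made up for illustration, and since the SSH calls run one after another, a second or two of difference is just latency.

```shell
# max_skew: print the difference (in seconds) between the largest and
# smallest of the epoch timestamps passed as arguments.
max_skew() {
    local min max t
    min=$1
    max=$1
    for t in "$@"; do
        if [ "$t" -lt "$min" ]; then min=$t; fi
        if [ "$t" -gt "$max" ]; then max=$t; fi
    done
    echo $(( max - min ))
}

# Example usage (hypothetical host names, assumes SSH access to each node):
#   max_skew $(for h in cass1 cass2 cass3; do ssh "$h" date -u +%s; done)
```

If the number it prints is large, fix NTP before going any further.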

Main settings (cassandra.yaml)

This is the main configuration file.

I’m using the instructions on this page as a starting point: http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html.

To get started:

sudo nano /etc/cassandra/cassandra.yaml

Secure installation

Out of the box, Cassandra is set to disable authentication (i.e. it lets everyone in) and to give users full access to the system:

authenticator: AllowAllAuthenticator

Let’s change it to:

authenticator: PasswordAuthenticator

You also have this, which you might want to set to CassandraAuthorizer (see http://docs.datastax.com/en/cassandra/2.1/cassandra/security/secure_config_native_authorize_t.html):

authorizer: AllowAllAuthorizer

Basic settings

Here are some basic settings, as explained in the DataStax docs.

cluster_name

This setting prevents nodes in one logical cluster from joining another. All nodes in a cluster must have the same value in all datacenters.

cluster_name: 'CASS-1'

listen_address

This is the IP address of the machine, which Cassandra uses to listen for connections from other Cassandra nodes.

For a single-node configuration, you can put localhost or leave it empty.

For a cluster config, if your node is configured properly (host name, name resolution, etc.) you can leave this empty, and Cassandra will use Java’s InetAddress.getLocalHost() to get the local IP address automatically (you can probably check whether it would work using ifconfig: if you see your public IP address there, it should).

There are cases where you can’t leave this empty. For instance, Java might be unable to figure out the address if you’re on a virtual machine.
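
One case I’m sure about: on Ubuntu the host name is often mapped to 127.0.1.1 in /etc/hosts, and a loopback address is useless as a listen address in a cluster. Here is a small sketch to check what the host name resolves to. The function names are mine, the check only handles IPv4, and it is just an approximation of what Java will see.

```shell
# is_loopback: succeed if the given IPv4 address is in 127.0.0.0/8.
is_loopback() {
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# resolved_address: print the first address the local host name resolves to.
resolved_address() {
    getent hosts "$(hostname)" | awk '{ print $1; exit }'
}

# Example usage:
#   addr=$(resolved_address)
#   if is_loopback "$addr"; then
#       echo "host name resolves to $addr: set listen_address explicitly"
#   fi
```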

To hardcode an address:

listen_address: 123.45.67.890

seed_provider

I specified which machines are seeds (in my case I have a 3-node cluster so I only have one seed, which is the first server I set up):

seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring.  You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "123.45.67.890"

So, in the block above we have to change the following line:

- seeds: "123.45.67.890"

Snitch

For multi-node production deployment, the recommended snitch seems to be GossipingPropertyFileSnitch:

endpoint_snitch: GossipingPropertyFileSnitch

rpc_address

For rpc_address, put the machine’s IP address if you want software on other servers to be able to talk to Cassandra (recommended I guess?):

rpc_address: 123.45.67.890
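
If you have several nodes to set up, editing cassandra.yaml by hand in nano gets old quickly. The edits above can be scripted with sed. This is a minimal sketch, not a real YAML editor: the function names set_yaml_key and set_seeds are mine, it relies on GNU sed (which Trusty has), and values must not contain |, & or backslashes.

```shell
# set_yaml_key FILE KEY VALUE
# Replace the value of a top-level "KEY: ..." line in FILE, uncommenting it
# if it was commented out.
set_yaml_key() {
    sed -i -E "s|^#? ?${2}:.*|${2}: ${3}|" "$1"
}

# set_seeds FILE SEEDS
# Rewrite the '- seeds: "..."' line in FILE, keeping its indentation.
# SEEDS is a comma-delimited list of addresses.
set_seeds() {
    sed -i -E "s|^([[:space:]]*- seeds:).*|\1 \"${2}\"|" "$1"
}

# Example usage (run as root, on a backed-up copy of the file first):
#   conf=/etc/cassandra/cassandra.yaml
#   set_yaml_key "$conf" cluster_name "'CASS-1'"
#   set_yaml_key "$conf" authenticator PasswordAuthenticator
#   set_yaml_key "$conf" endpoint_snitch GossipingPropertyFileSnitch
#   set_yaml_key "$conf" listen_address 123.45.67.890
#   set_yaml_key "$conf" rpc_address 123.45.67.890
#   set_seeds    "$conf" "123.45.67.890"
```

Because the pattern is anchored at the start of the line, setting rpc_address does not touch broadcast_rpc_address.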

Datacenter/rack settings

Now, we need to set the datacenter/rack settings (see http://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeSingleDS.html for more info):

sudo nano /etc/cassandra/cassandra-rackdc.properties

And you’ll see something like:

dc=DC1
rack=RAC1

These settings are up to you, i.e. it doesn’t have to be an actual datacenter. You can pick whatever you want, but you cannot change it later, so give it a good name :-)

Reboot

IMPORTANT: after making any changes in the cassandra.yaml file, you must restart the node for the changes to take effect:

sudo reboot

Start Cassandra

After rebooting, start Cassandra with:

sudo service cassandra start

Then, to check if it’s working run:

nodetool status

If it shows you a list of nodes (only 1 for now), everything is OK. Otherwise, something is wrong; see Troubleshooting below.
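
In the nodetool status output, healthy nodes are the lines starting with UN (Up/Normal). If you’d rather have a number than eyeball the table, something like this should do. The helper name count_up_nodes is mine.

```shell
# count_up_nodes: read `nodetool status` output on stdin and print how many
# nodes are in state UN (Up/Normal).
count_up_nodes() {
    # grep -c exits non-zero when the count is 0, so swallow that.
    grep -c '^UN[[:space:]]' || true
}

# Example usage:
#   nodetool status | count_up_nodes
```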

IMPORTANT: create superuser

You will be able to connect to the database using username ‘cassandra’, password ‘cassandra’, but obviously we’ll want to change this.

Let’s log in to Cassandra:

cqlsh -u cassandra -p cassandra

You should see something like this:

Connected to CASS-1 at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.5 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cassandra@cqlsh>

Create your admin user:

CREATE USER myusername WITH PASSWORD 'mypassword' SUPERUSER;

Log out using the exit command, and log back in with your newly created superuser:

cqlsh -u myusername

Then, change the cassandra user’s password to something hard to guess, and remove its admin privileges:

ALTER USER cassandra WITH PASSWORD 'jo<3@90d9ewdjdfsdfjh' NOSUPERUSER;
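
For the “hard to guess” part, you can let the machine come up with the password. A small sketch (the function name random_password is mine): it sticks to letters and digits, so there are no quotes to escape inside the CQL string.

```shell
# random_password [LENGTH]
# Print LENGTH (default 32) random characters from [A-Za-z0-9].
random_password() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-32}"
    echo
}

# Example usage:
#   random_password 40
```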

Troubleshooting

Failed to connect to ‘127.0.0.1:7199’

If you try things out and get a “Failed to connect to ‘127.0.0.1:7199’: Connection refused” error, you might have to change -Djava.rmi.server.hostname in cassandra-env.sh:

sudo nano /etc/cassandra/cassandra-env.sh

Just uncomment the line and put your IP address/domain name where it says <public name> (towards the end of the file):

# add this if you're having trouble connecting:
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"

This error can also be caused by Cassandra shutting down because of a configuration error (see below).
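
If you need to make the hostname change on several machines, it can be done with sed instead of nano. A rough sketch, assuming the line in your cassandra-env.sh looks like the one above (commented out or not); the function name set_rmi_hostname is mine and it needs GNU sed.

```shell
# set_rmi_hostname FILE HOST
# Uncomment the -Djava.rmi.server.hostname line in FILE (if needed) and
# set its value to HOST.
set_rmi_hostname() {
    sed -i -E 's|^#? *(JVM_OPTS="\$JVM_OPTS -Djava\.rmi\.server\.hostname=).*|\1'"$2"'"|' "$1"
}

# Example usage (run as root):
#   set_rmi_hostname /etc/cassandra/cassandra-env.sh 123.45.67.890
```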

Configuration errors/other problems

If you run into problems, make sure Cassandra is running:

sudo service cassandra status

Cassandra can also shut down without any helpful error. Running it in the foreground shows you what it is complaining about:

cassandra -f -p /var/run/cassandra/cassandra.pid

“could not access pidfile for Cassandra”

I got this error a couple of times. The solution for me was to delete /run/cassandra/ and start it again.

Everything else

To see what might be wrong in your particular case, run this:

cassandra -f -p /var/run/cassandra/cassandra.pid

Rinse and repeat

Number of nodes

The cluster can be any number of machines. Some of them will be seeds, meaning that other machines will check with them to get info about the cluster. There should be more than one seed, so that if one goes down it’s not a big deal.

Ideally, I believe you want to have at least 6 nodes spread across 2 datacenters: 2 regular nodes and 1 seed per datacenter.

Configuration

You have to do all of the above for all the machines you want to use.

Double-check the timezone (see above)!

The seed is the same for all nodes. Since I’m using VMs, I took a snapshot of what I had so far and restored the other 2 blank machines from it. Then I ran the following to reset the node-specific system data (including tokens), changed the IP-specific settings, and rebooted:

sudo rm -rfv /var/lib/cassandra/data/system/*

IP-specific settings are:

  • listen_address and rpc_address in cassandra.yaml
  • hostname in cassandra-env.sh (and if you’re setting up a seed node, you also have to add its address to “seeds” on all nodes)

Also, when working in different datacenters, we should double-check datacenter/rack settings:

sudo nano /etc/cassandra/cassandra-rackdc.properties

Possible problems

Cannot achieve consistency level LOCAL_ONE

Although I created a cluster, the system_auth keyspace was created with the SimpleStrategy replication strategy and a replication factor of 1.

This won’t work if you have multiple datacenters, as not all machines will have a replica. To fix this, we have to modify the keyspace.

Connect to a node via cqlsh, then run the following query:

ALTER KEYSPACE system_auth WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': '3',
  'dc2': '3'
};

Of course, replace “dc1” and “dc2” with your datacenter names (as they appear when running nodetool status) and “3” with the desired number of replicas (or the number of machines you have).

Then, in order for all nodes to get a copy of the system_auth keyspace, exit cqlsh and run:

nodetool repair system_auth
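
To avoid typos in the datacenter names, the statement can be generated from a list of name:replicas pairs. A small sketch; the function name system_auth_cql is mine, and the datacenter names in the example are only placeholders for whatever nodetool status shows you.

```shell
# system_auth_cql DC:RF [DC:RF ...]
# Print an ALTER KEYSPACE statement for system_auth using
# NetworkTopologyStrategy with the given per-datacenter replication factors.
system_auth_cql() {
    local pair opts
    opts="'class': 'NetworkTopologyStrategy'"
    for pair in "$@"; do
        opts="$opts, '${pair%%:*}': '${pair##*:}'"
    done
    echo "ALTER KEYSPACE system_auth WITH replication = { $opts };"
}

# Example usage:
#   system_auth_cql DC1:3 DC2:3
```

Paste the output into cqlsh, then run the repair as described above.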

Rock & roll

Getting the cluster up

Start the seed(s), then the other machines, and run:

nodetool status

You should see all machines in the cluster.

It works!

To test, try to create records as explained at the bottom of this page: https://wiki.apache.org/cassandra/GettingStarted.

It worked for me: I added and removed records and they were immediately synchronized across all machines, so I could operate on the DB as if it were a single one.

Pretty cool.

Further reading

http://www.tomas.cat/blog/en/cassandra-frequent-mistakes/
