I've spent the past little while playing more with Cassandra, the database (or key store), and I've learned a bunch of new things.
I've got Cassandra set up on two dedicated servers, with a third server acting as the web server. After a few initial configuration issues, I've got both Cassandra servers up and running in their own cluster.
Want the simplest piece of advice for getting it up and running? Read the README. It's the best guide there is.
Interestingly, from what I've seen so far, there are no user accounts to create for accessing the database. At first this made me wonder how to keep other people from accessing the database and maliciously manipulating the data, but I believe I've figured out how.
The nice thing is that you can set the interface Cassandra binds to. That leaves two main ways to secure Cassandra:
1. Set up the web server to run on the same box as Cassandra. That way, you can tell Cassandra to bind to localhost (127.0.0.1), and it will ignore all requests coming from other machines.
2. Bind Cassandra to its actual network address, but put the box Cassandra (and the rest of the cluster) runs on in a part of the network that is not reachable from the outside. That way, the web server on the inside can reach Cassandra, but nothing from the outside can.
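As a rough illustration of the two options, here's a small Python sketch (standard library only) that classifies a bind address by who could reach a service listening on it. The labels and the `exposure` function are my own, it only handles IP literals (not hostnames like "localhost"), and it knows nothing about firewalls or routing, which is what actually makes option 2 safe.

```python
import ipaddress


def exposure(bind_address: str) -> str:
    """Roughly classify who could reach a service bound to bind_address.

    Only the address itself is inspected; firewalls and routing are ignored.
    """
    ip = ipaddress.ip_address(bind_address)  # raises ValueError for hostnames
    if ip.is_loopback:
        # Option 1: only processes on the same machine can connect.
        return "same machine only"
    if ip.is_unspecified:
        # 0.0.0.0 (or ::) listens on every interface, public ones included.
        return "every interface"
    if ip.is_private:
        # Option 2: reachable inside the network; keeping outsiders away
        # depends on the network not routing outside traffic to this box.
        return "internal network"
    return "public"


for addr in ["127.0.0.1", "192.168.1.20", "0.0.0.0"]:
    print(addr, "->", exposure(addr))
```

The check order matters: 127.0.0.1 and 0.0.0.0 also count as "private" to Python's `ipaddress` module, so the loopback and unspecified cases have to be tested first.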
More to come, but so far I'm interested to see where this goes.
Wednesday, March 31, 2010
Thursday, March 18, 2010
Cassandra Continued
So, I've delved further into using Cassandra as a database, and learning how it works is still interesting.
Unfortunately, there really isn't much of a user community to help out when issues come up. There also isn't much documentation on setting up Cassandra, using Thrift, or anything like that.
Right now, Cassandra is set up in a VM with 1 GB of RAM and one processor dedicated to it. I've tested inserting data into the datastore: we ran a loop that inserted 10,000 records, and it took about 1 second to insert them all (roughly 10,000 inserts per second).
Each record included a key, a username, a timestamp, an age, and a second timestamp. All records were inserted with a while loop in PHP. The problem now seems to be reading the data back out. We've been trying to build a PHP page that reads a specified number of records and prints them to the web page. At the moment, the read fails for anything above 2,000 records.
I've talked it over with another person working on this project, and we believe the problem might be the "hardware" given to the virtual machine. It only has 1 GB of RAM available, and from what I've read, 1 GB seems to be the minimum recommended amount of RAM for Cassandra.
Unfortunately, I'm rebuilding my RAID array at the moment, so I can't test running the VM on my personal computer yet. Once the array is rebuilt, I can start testing with more memory and other resources and see whether that gets the read working.
Looks like it might be a project for the weekend. If anyone else knows about Cassandra or has tried setting it up, definitely hit me up with how you have your rig configured. I'd be interested to hear it.
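Whether or not RAM turns out to be the culprit, one common way to avoid asking for thousands of rows in a single request is to read them in pages: ask for a fixed number of rows, then start the next request at the last key you got back. Here's a Python sketch of that loop. `fetch_page(start_key, count)` is a hypothetical placeholder for whatever range-read call your client exposes, the in-memory `fake_fetch` stands in for the database, and the sketch assumes the start key is inclusive, so each later page repeats the previous page's last row and that duplicate is skipped.

```python
import bisect
from typing import Callable, Iterator, List, Tuple

Row = Tuple[str, dict]


def read_in_pages(fetch_page: Callable[[str, int], List[Row]],
                  page_size: int = 500) -> Iterator[Row]:
    """Yield every row, never asking fetch_page for more than page_size + 1."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    start = ""          # assumed convention: "" means "from the first key"
    skip_first = False  # later pages start with the previous page's last key
    while True:
        count = page_size + 1 if skip_first else page_size
        rows = fetch_page(start, count)
        fetched = len(rows)
        if skip_first and rows and rows[0][0] == start:
            rows = rows[1:]  # drop the row we already yielded
        yield from rows
        if fetched < count or not rows:
            return  # a short page means we reached the end
        start = rows[-1][0]
        skip_first = True


# In-memory stand-in for the database: 10,000 rows kept in key order.
keys = [f"user{i:05d}" for i in range(10_000)]
store = {k: {"username": k, "age": "30"} for k in keys}


def fake_fetch(start: str, count: int) -> List[Row]:
    i = bisect.bisect_left(keys, start)  # first key >= start (inclusive)
    return [(k, store[k]) for k in keys[i:i + count]]


rows = list(read_in_pages(fake_fetch, page_size=500))
print(len(rows))  # 10000
```

Each request stays small no matter how many records the page wants to show, which makes it easier to tell whether the failure comes from the size of one request or from something else.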
Saturday, March 13, 2010
Cassandra...
Well, it was an interesting day yesterday.
I spent nearly all day trying to set up an environment in a virtual machine to experiment with the Cassandra database. It's an interesting concept, and I'm having a bit of trouble getting my head around how to use it.
The Cassandra developers don't really even want it called a database; they describe it as more of a structured key-value store, and it is definitely not relational. It was a pain to set up yesterday.
The documentation is not the best. To use Cassandra, you basically need to install Thrift, which acts as the API clients use to talk to the database. I worked with a developer to install Thrift first, and then we went about installing the database.
After bringing everything over and starting the database, it ran fine in the background. However, I could not get the CLI to connect to the database over Thrift so I could experiment with it. It kept failing with Java "connection refused" errors.
Googling the issue turned up advice to change the address the database listens on from "localhost" to the server's IP address or fully qualified domain name. Unfortunately, making that change did not fix the connection issue.
We ended up finding a guide for making PHP talk to the database. After following it, we were finally able to insert some dummy data and get a PHP page to pull and display information from the database.
One interesting thing of note was that every column has a name and a value, but you also have to give it a timestamp. That was one of the most interesting things I noticed when I started learning how it works.
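When chasing a "connection refused" error like this, it can help to check from the outside which address the server is actually listening on. Here's a small Python sketch using only the standard library. The `check_port` name is my own, and 9160 (the default Thrift port in the Cassandra configs I've seen) and the example IP below are assumptions, so substitute your own values.

```python
import socket


def check_port(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a plain TCP connection to host:port and report what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # something accepted the connection
    except ConnectionRefusedError:
        # Nothing is listening on that address/port (or something rejected it).
        return "refused"
    except OSError:
        # Timeouts, unreachable hosts, DNS failures, and so on.
        return "unreachable"
```

For example, run `check_port("127.0.0.1", 9160)` and then `check_port("192.168.1.20", 9160)` (replacing that made-up IP with the server's real address). If loopback says "open" but the real address says "refused", the server is still bound to localhost only; if both say "refused", the listener probably isn't running on that port at all.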
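Why the timestamp? As I understand it, Cassandra uses the timestamp supplied with each write to decide which version wins when two writes to the same column disagree: the one with the higher timestamp is kept. Here's a small Python model of that idea. The `Column` class, `make_row`, and `reconcile` are my own illustration, not Cassandra's API. Using microseconds since the epoch is a common client convention, not something this sketch can guarantee the database enforces, and on a timestamp tie the sketch simply keeps the first argument rather than modeling Cassandra's own tie-break rule.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # supplied by the client; microseconds since the epoch here


def now_micros() -> int:
    return int(time.time() * 1_000_000)


def make_row(fields: dict) -> list:
    """Turn a plain dict into columns that all share one write timestamp."""
    ts = now_micros()
    return [Column(name, str(value), ts) for name, value in fields.items()]


def reconcile(a: Column, b: Column) -> Column:
    """Keep the surviving version of one column: the higher timestamp wins."""
    if a.name != b.name:
        raise ValueError("can only reconcile two versions of the same column")
    return a if a.timestamp >= b.timestamp else b


old = Column("username", "alice", 1_000)
new = Column("username", "alice_2010", 2_000)
print(reconcile(old, new).value)  # alice_2010
print(reconcile(new, old).value)  # alice_2010 (arrival order doesn't matter)
```

The point is that the newer write wins because of its timestamp, not because it happened to arrive last, which is why every column has to carry one.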
Anyways, more coming soon.
Wednesday, March 10, 2010
URL Short
To take a quick break from tracker talk: for the past few days I've been working on developing a web application to shorten URLs.
Yes, I know TinyURL already exists, and I've also heard of the other URL-shortening app that's been released.
I took this project on because I want to learn PHP and sharpen my development and logical-thinking skills. It looks like this will just be a couple of PHP pages with a database in the back storing everything.
It's been interesting going from ASP to PHP. To be honest, things just seem more intuitive in PHP; the language seems easier to use and more logical. For example, database connection strings in ASP can be long, and most people have to study one for a while to understand how it is put together. PHP? It's a simple function call with the database host, username, and password. That's it.
So far, I'm liking this.
Any good designers want to come up with a sweet UI for the Admin Interface?
Thursday, March 4, 2010
Choosing the OS
In most cases, I think you'll find yourself running a tracker on a Linux box rather than a Windows box. It is most likely cheaper, and it will probably be the option you end up picking.
The only catch is that there can be a learning curve. We had the option of picking either a Windows box or one of various Linux distros. In the end, we chose Ubuntu: I had some experience with it, and it is known as a very friendly Linux OS.
Ubuntu also has tons of documentation, guides, tutorials, and forums you can read up on. Going into this, my only experience was messing around with the desktop version of Ubuntu; I had never used the server version.
The biggest "shock" to me was: "Wow. It really is only a command line." At first, this was the hardest part for me. I felt more comfortable doing the finer tweaking of the server through a GUI, and I didn't have that option. However, it really was a blessing in disguise.
All I have to say is just give it a shot. When you want to learn how to install an application, look it up. How to configure a service? Look it up. In the end, it helped me understand the operating system more fully and made me much more comfortable doing everything from the command line. When I had to do something I didn't know how to do (which was nearly everything), I just looked up a guide and followed it.
While it might be a slow start, you'll gain some very valuable knowledge doing it this way. Unless you are pretty knowledgeable about another distro, I'd recommend using Ubuntu as your main OS.
Tuesday, March 2, 2010
Hosting
One of the most important things you will need when starting your own tracker is the right host. If you pick the wrong one, you're going to have some major headaches later.
At the very least, you are probably going to want a VPS (virtual private server). It's not the same as having a dedicated server, but it really is the bare minimum. I would strongly recommend against using plain web hosting. Any good web host will soon see what you are actually running, and you can expect them not to be happy about it. Also, some plain web hosts run automated scripts that throttle processes (or otherwise limit resources) your account is using, which will directly hurt your tracker's performance.
Trust me, we know; we tried. In the end, I would recommend a VPS as a starting point.
I would also recommend finding a good backup host. Do your research and find another place where your tracker could be hosted, so that if your current host ever pulls the plug, you can minimize your downtime.
Some other basic things to think about: how will you pay the host? Personally, I try to give out as little personally identifiable information as possible. I'll buy a prepaid credit card and use that to pay the hosting company or person. Another worthwhile question: if you receive a takedown notice, is the host going to take you offline immediately, or will they let you respond? You have to feel out these sorts of things and make the best decision for yourself.
Wednesday, February 24, 2010
Experiences with a Tracker
After a little thinking about what might be a good idea, I've decided to share some information on what it takes to run a BitTorrent tracker.
This will probably be spread across several posts, possibly mixed in with posts on completely different topics, but these posts will come. Maybe other people who are interested in this will find some useful information in my thoughts and can do something better.
I had found a similar guide from another tracker sysop, and that person's thoughts gave me some insight into what was a good idea and what was a bad one when running a tracker.
Hopefully, you'll be able to pick up a thing or two. Enjoy the posts as they come.