Sunday, July 21, 2013

Removing Diacritical Marks

I recently found myself in a situation where I needed to remove the diacritical marks from a string. The poor man's way would have been to substitute the individual letters with their unadorned equivalents (é, è, ê => e; à, á, â => a; etc.), but I was looking for a more elegant solution.

A little googling found me this little gem in C#. The article is in Dutch, but the code speaks for itself:

string q = "Wûnseradiel";
char[] normalised = q.Normalize(NormalizationForm.FormD).ToCharArray(); 
q = new string(normalised.Where(c => (int) c <= 127).ToArray()); 
// q == "Wunseradiel" 

The issue is that in Unicode the same visible character can have more than one binary representation (look here for the full details). For example, é can be stored as a single code point or as a plain e followed by a combining accent. The process of giving all equivalent characters the same representation is called normalization. In .NET the String class has a built-in method to do just that. The form chosen here is Normalization Form D (NFD), which separates each 'accented' character into the unadorned character followed by the diacritical mark. This makes it easy to iterate over the resulting character array and keep only the ASCII characters, which drops the diacritical marks (along with any other non-ASCII character) and leaves us with just the unadorned characters. Elegant and concise. But what if I wanted to apply the same solution in Go?
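
As an aside, the point about differing representations can be made concrete with a few lines of Go. This is only a sketch using the standard library; the helper name describe is my own invention for the illustration. It prints the code points of é written as one precomposed character and as an e followed by a combining acute accent, and shows that the two strings do not compare equal:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe returns the code points of s in U+XXXX notation, separated by spaces.
// (Hypothetical helper, only for this illustration.)
func describe(s string) string {
	out := ""
	for i, r := range s {
		if i > 0 {
			out += " "
		}
		out += fmt.Sprintf("%U", r)
	}
	return out
}

func main() {
	precomposed := "\u00e9" // é as a single code point
	decomposed := "e\u0301" // e followed by a combining acute accent

	fmt.Println(describe(precomposed), utf8.RuneCountInString(precomposed)) // U+00E9 1
	fmt.Println(describe(decomposed), utf8.RuneCountInString(decomposed))   // U+0065 U+0301 2
	fmt.Println(precomposed == decomposed)                                  // false
}
```

The second form is what NFD produces, which is why the accent can be filtered out as a separate character.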

The heart of the above solution is the ability to normalize a Unicode string to NFD. So I checked the Go standard library for Unicode normalization support. At the time of writing such support is not included in the standard Go libraries, but it is available as a third-party package at: https://code.google.com/p/go.text. The documentation is available here.

To make use of this package, simply cd into the ./src of your GOPATH directory and execute:

hg clone https://code.google.com/p/go.text/

Now the unicode/norm package is available to be used in your own project. The following code illustrates how to achieve the same effect as with the C# snippet above, but fleshed out into an executable project:

package main

import (
 "code.google.com/p/go.text/unicode/norm"
 "fmt"
)

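// StripDiacritics decomposes value to NFD and keeps only the ASCII runes.
// Note that this drops every non-ASCII rune, not only the combining marks.
func StripDiacritics(value string) string {
 normalized := norm.NFD.String(value)
 var buffer []rune
 for _, r := range normalized {
  if r < 128 {
   buffer = append(buffer, r)
  }
 }
 return string(buffer)
}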

func main() {
 message := "Buén día, mundo!"
 // Println already puts a space between its arguments.
 fmt.Println("message:", message)
 fmt.Println("stripped:", StripDiacritics(message))
}

The output:
message: Buén día, mundo!
stripped: Buen dia, mundo!
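
One limitation of filtering on the ASCII range is that it also throws away letters that are not ASCII but carry no diacritic, such as ß. A possible refinement, sketched here as my own assumption and not as part of the original solution, is to drop only the runes in the Unicode category Mn (nonspacing marks). To keep the sketch free of third-party imports, the hypothetical stripMarks function below expects input that has already been decomposed, for instance by norm.NFD.String as in the program above:

```go
package main

import (
	"fmt"
	"unicode"
)

// stripMarks removes every nonspacing combining mark (category Mn) from s
// and keeps all other runes. It assumes s is already in NFD form.
// (Hypothetical helper, only for this illustration.)
func stripMarks(s string) string {
	out := make([]rune, 0, len(s))
	for _, r := range s {
		if !unicode.Is(unicode.Mn, r) {
			out = append(out, r)
		}
	}
	return string(out)
}

func main() {
	// "Buén día, mundo!" written out in decomposed (NFD) form.
	decomposed := "Bue\u0301n di\u0301a, mundo!"
	fmt.Println(stripMarks(decomposed)) // Buen dia, mundo!

	// ß has no combining mark, so it survives; an ASCII filter would drop it.
	fmt.Println(stripMarks("Stra\u00dfe")) // Straße
}
```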

Sunday, July 14, 2013

Go Proxy

For a personal project I'm currently doing I needed to write a proxy server for some audio I had hosted on my server.

Authentication is needed to access the audio on the server. But for a variety of reasons I wanted to grant access to some of the audio without exposing the credentials I use to authenticate to the server. So I figured that if I wrote a little Internet-facing server to process the requests for audio, it could relay them to my audio server with the appropriate credentials added.

As I am exploring Go (golang.org), I decided I would try my hand at writing what I needed using that. It took some trial, error and exploring the documentation. (If I had done it the other way around, there would have been way less error in the trials, but oh well...)

I wanted to make the audio available for streaming, so I was already dreading having to implement some sort of buffering and whatnot. But I was pleasantly surprised. It turns out that the part that does the heavy lifting is really quite straightforward, thanks to Go's excellent standard library of packages. All it takes is the code below:
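func streamUrl(w http.ResponseWriter, url string) {
 response, err := http.Get(url)
 if err != nil {
  // On failure response is nil, so bail out before touching response.Body.
  log.Println("Error:", err.Error())
  return
 }
 defer response.Body.Close()
 // Every read from the tee is also written to w; ReadAll drives the reads.
 reader := io.TeeReader(response.Body, w)
 _, err = ioutil.ReadAll(reader)
 if err != nil {
  log.Println("Error:", err.Error())
 }
}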
This function is passed an http.ResponseWriter (an interface that, among other things, satisfies io.Writer) and a url. It invokes http.Get(), which returns a pointer to an http.Response struct whose Body implements the io.Reader interface. The magic happens thanks to the io.TeeReader function, which takes an io.Reader and an io.Writer, and returns an io.Reader. The documentation states:
TeeReader returns a Reader that writes to w what it reads from r. All reads from r performed through it are matched with corresponding writes to w. There is no internal buffering - the write must complete before the read completes. Any error encountered while writing is reported as a read error. (http://golang.org/pkg/io/#TeeReader)
Which is exactly what I was looking for!
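
To see the documented behaviour in isolation, here is a minimal sketch that needs no network: a strings.Reader stands in for the response body and a bytes.Buffer stands in for the http.ResponseWriter. The function name teeCopy is made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"
)

// teeCopy reads all of src through an io.TeeReader. It returns what the
// caller read and what the tee mirrored into the side buffer.
// (Hypothetical helper, only for this illustration.)
func teeCopy(src string) (string, string, error) {
	var side bytes.Buffer // stands in for the http.ResponseWriter
	tee := io.TeeReader(strings.NewReader(src), &side)
	data, err := io.ReadAll(tee) // every read is also written to side
	return string(data), side.String(), err
}

func main() {
	read, mirrored, err := teeCopy("some audio bytes")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read=%q mirrored=%q\n", read, mirrored)
	// prints: read="some audio bytes" mirrored="some audio bytes"
}
```

Note that reading everything with ReadAll keeps a full copy of the data in memory, which matters for large audio files.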

Edit: Thanks to +Michael Gebetsroither for pointing out that io.Copy does the job just as well, while providing cleaner code too!
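func streamUrl(w http.ResponseWriter, url string) {
 response, err := http.Get(url)
 if err != nil {
  // On failure response is nil, so bail out before touching response.Body.
  log.Println("Error:", err.Error())
  return
 }
 defer response.Body.Close()
 // io.Copy streams the body to w in chunks instead of holding it all in memory.
 _, err = io.Copy(w, response.Body)
 if err != nil {
  log.Println("Error:", err.Error())
 }
}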

Thursday, July 11, 2013

Mundo, Con Ta!

This is the place where I'll share my experiences and observations on languages, coding and computing. I might digress now and then into other areas that capture my interest from time to time.

The opinions expressed here are my own of course, and where they align with the opinions of others, published or unpublished, it just shows that great minds think alike...

By the way, the title of this post is "Hello, World" in Papiamento, my native language.