zigzigzag · on June 8, 2017

Are you sure? I thought Algolia literally was Solr (which is ElasticSearch which is Lucene).

BjoernKW · on June 8, 2017

Yes, I'm sure. I talked to one of the founders about 2 years ago. Algolia is written in C / C++ from the ground up.

pcsanwald · on June 8, 2017

Your comment is a bit confusing: both Solr and ElasticSearch are built on top of Lucene.

Algolia is _not_ built on top of Lucene at all.

zigzigzag · on June 8, 2017

That seems like a really unfortunate design decision. I used to think that Java's use of UTF-16 for strings was just a problematic legacy thing, but compared to this it seems quite good. Strings are pretty high performance and there are no complex calculations to do indexing or bounds checks. And in Java 9 the JVM can switch between UTF-16 or Latin1 encodings on the fly, which both uses less RAM and speeds things up simultaneously. There are no memory safety issues caused by character encodings.

burntsushi · on June 8, 2017

I think there might be a few misconceptions in your comment, but it's hard to say for sure. This article isn't written in a style that I particularly enjoy, but it's not too long and contains some useful facts that might help dispel those misconceptions: http://utf8everywhere.org

I do have my unrelated niche complaints about Rust's string story (and I have vague plans to resolve them), but Rust's string implementation is my favorite of any language I've used.

kibwen · on June 8, 2017

What seems like an unfortunate design decision? I'm unclear what concerns you're seeing that would suggest UTF-16 as preferable to UTF-8, either in terms of performance or memory safety.

zigzigzag · on June 8, 2017

To not only use UTF-8 as the internal string encoding but practically mandate it, if you want to remain safe.

UTF-8 is a fine transport format, but for raw runtime performance it's obviously going to be an issue if you ever need to iterate over characters, do substring matches, things like that because you can't do constant time "next char" or indexing.

UTF-16 doesn't let you do that either in the presence of combining characters, but they're pretty rare and for many operations it doesn't really matter.

burntsushi · on June 8, 2017

I feel like your comment contains a lot of misunderstanding about UTF-8. For example, UTF-8 is self-synchronizing, which means you can indeed find the "next char" in constant time.
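A minimal sketch of what self-synchronization buys you, assuming the input is valid UTF-8 (which Rust's `&str` guarantees). The helper names `next_boundary` and `next_char` are hypothetical and not part of the standard library. Every continuation byte has the bit pattern `10xxxxxx`, and a codepoint has at most three of them, so the loop below does a bounded amount of work no matter how long the string is.

```rust
/// Returns the byte offset of the first char boundary at or after `i`.
/// The loop skips at most three continuation bytes in valid UTF-8.
fn next_boundary(s: &str, mut i: usize) -> usize {
    let bytes = s.as_bytes();
    // Continuation bytes always look like 10xxxxxx.
    while i < bytes.len() && (bytes[i] & 0b1100_0000) == 0b1000_0000 {
        i += 1;
    }
    i
}

/// Returns the byte offset of the char after the one starting at `i`.
fn next_char(s: &str, i: usize) -> usize {
    next_boundary(s, i + 1)
}

fn main() {
    // Encoded lengths: 'a' = 1 byte, 'é' = 2, '€' = 3, '😀' = 4, 'z' = 1.
    let s = "aé€😀z";
    let mut i = 0;
    while i < s.len() {
        let j = next_char(s, i);
        println!("char {:?} occupies bytes {}..{}", &s[i..j], i, j);
        i = j;
    }
}
```

The same check works from the middle of a codepoint, which is why a reader that lands on an arbitrary byte can resynchronize without scanning from the start.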

UTF-8 is certainly not a problem for runtime performance. Substring search, for example, is as straightforward as you might imagine. You have a needle in UTF-8 and a haystack in UTF-8, and a straightforward application of `memmem` will work just fine (for example). In fact, UTF-8 works out great for performance, because it's very simple to apply fast routines like `memchr`. For example, if you `memchr` for `a`, then because of UTF-8's self-synchronizing property, any and all matches for `a` actually correspond to the codepoint U+0061.
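A rough sketch of the byte-level search idea. The functions `find_byte` and `find_bytes` are hypothetical, naive stand-ins for `memchr` and `memmem` (real implementations are far faster); the point is only that they operate on raw bytes and still return offsets that are valid UTF-8 boundaries.

```rust
/// Naive stand-in for `memchr`: offset of the first byte equal to `needle`.
fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Naive stand-in for `memmem`: offset of the first occurrence of `needle`.
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0); // `windows(0)` would panic, so handle this up front
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

fn main() {
    let hay = "héllo wörld, à la carte";

    // The byte 0x61 never appears inside a multi-byte sequence, so any hit
    // is a real U+0061 sitting on a char boundary.
    if let Some(i) = find_byte(hay.as_bytes(), b'a') {
        assert!(hay.is_char_boundary(i));
        println!("first 'a' at byte {}: {:?}", i, &hay[i..]);
    }

    // Searching for a non-ASCII needle as plain bytes agrees with `str::find`.
    let needle = "wörld";
    assert_eq!(find_bytes(hay.as_bytes(), needle.as_bytes()), hay.find(needle));
}
```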

Indexing works fine so long as your indices are byte offsets at valid UTF-8 boundaries. Byte offset indexing tends to be useful for mechanical transformations on a string. For example, if you know your `substring` starts at position `i` in `mystr`, then `&mystr[i + substring.len()..]` gives you the slice of `mystr` immediately following your substring in constant time. When all your APIs deal in byte offsets, this turns out to be a perfectly natural thing to do.
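The slicing expression above, wrapped in a small runnable sketch. The function name `after` is made up for illustration; `str::find` and `str::len` are standard and both deal in byte offsets, so the slice always lands on a valid boundary.

```rust
/// Returns the part of `mystr` that follows the first occurrence of
/// `substring`, or `None` if it does not occur.
fn after<'a>(mystr: &'a str, substring: &str) -> Option<&'a str> {
    // `find` returns a byte offset that is guaranteed to be a char boundary.
    let i = mystr.find(substring)?;
    // `len` is in bytes too, so this slice is constant time and never panics.
    Some(&mystr[i + substring.len()..])
}

fn main() {
    println!("{:?}", after("key=välue", "="));
    println!("{:?}", after("naïve→café", "→"));
    println!("{:?}", after("abc", "x"));
}
```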

Generally speaking, indexing by Unicode codepoint isn't an operation you want to do, because it tends to betray the problem you're trying to solve. For example, if you wanted to display a trimmed string to an end user by "selecting the first 9 characters," then selecting the first 9 codepoints would result in bad things in some circumstances, and it's not just limited to the presence of combining characters. For example, UTF-16 encodes codepoints outside the basic multilingual plane using surrogate pairs, where a surrogate pair consists of two surrogate codepoints that combine to form a single Unicode scalar value (i.e., a non-surrogate codepoint). So if you do the "obvious" thing with UTF-16, you'll wind up with bad results in not-exactly-corner cases.
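To make the failure modes concrete, here is a small sketch with two hypothetical helpers. `truncate_utf16` does the "obvious" thing of keeping the first `n` UTF-16 code units, and `first_n_codepoints` keeps the first `n` Unicode scalar values. Neither is a recommended way to trim text for display; they only show where each approach goes wrong.

```rust
use std::string::FromUtf16Error;

/// Keeps the first `n` UTF-16 code units and tries to decode the result.
fn truncate_utf16(s: &str, n: usize) -> Result<String, FromUtf16Error> {
    let units: Vec<u16> = s.encode_utf16().collect();
    String::from_utf16(&units[..n.min(units.len())])
}

/// Keeps the first `n` Unicode scalar values.
fn first_n_codepoints(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

fn main() {
    // U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
    // Cutting after two code units leaves a lone high surrogate.
    println!("{:?}", truncate_utf16("a😀", 2).is_err());

    // 'e' followed by U+0301 (combining acute accent) displays as one
    // character, but cutting after one codepoint drops the accent.
    println!("{:?}", first_n_codepoints("e\u{301}x", 1));
}
```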

It's worth noting that Rust isn't alone in this. Go represents strings similarly and it also works remarkably well. (The only difference between Go and Rust is that Rust's string type is guaranteed to contain valid UTF-8 whereas Go's string type is conventionally UTF-8.) Notably, you won't find "character indexing" anywhere in Go's standard library or various Unicode support libraries. :-)

I would very strongly urge you to read my link in my previous comment to you. I think it would help clarify a lot of misconceptions.

kccqzy · on June 8, 2017

This has been repeated so many times but UTF-16 does not allow constant time indexing either. Combining characters are one case and they are not rare at all. What about surrogates? What about grapheme clusters that are a complicated sequence of emoji and emoji modifiers with ZWJ?
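As a quick illustration of the ZWJ case, this sketch counts the units that make up a single "family" emoji, which most renderers display as one glyph. The helper `unit_counts` is hypothetical; grapheme cluster segmentation itself is not in Rust's standard library, so only bytes, UTF-16 code units, and scalar values are counted here.

```rust
/// Returns (UTF-8 bytes, UTF-16 code units, Unicode scalar values).
fn unit_counts(s: &str) -> (usize, usize, usize) {
    (s.len(), s.encode_utf16().count(), s.chars().count())
}

fn main() {
    // MAN, ZWJ, WOMAN, ZWJ, GIRL: one user-perceived character.
    let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";
    let (bytes, utf16, scalars) = unit_counts(family);
    println!("bytes={} utf16_units={} scalars={}", bytes, utf16, scalars);
}
```

None of the three counts is 1, so no fixed-width code unit gives constant time indexing by what a user would call a character.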

A better suggestion is to rethink why you need those operations in the first place.

zigzigzag · on June 8, 2017

The direction of the nation is decided by less than a majority in every vote. So why would this one in particular require "special measures"?

I think what you're getting at here is you support the EU, so would prefer if attempts to leave it or defy it were harder to implement than normal decisions.

pjbster · on June 9, 2017

My level of support for the EU is neither here nor there but I do get really irate when my country's PM presumes to act according to "the will of the people". We simply don't really know whether that's true or not and I think it'd be nice to find out (somehow).

gambiting · on June 8, 2017

Even if remain won, I would still like to see at least a 2/3 majority requirement for such an important vote. I don't mind a simple majority for periodically repeating votes (parliamentary elections) but decisions which are almost irreversible or which have deep and long-lasting consequences (leaving/joining the EU, going to war, breaking up the United Kingdom, changing the constitution) should have a 2/3 majority requirement.

zigzigzag · on June 8, 2017

And how about if the vote had been phrased as "do you wish to remain in the EU", with a 2/3rd requirement to meet the bar, meaning 1/3rd was sufficient to trigger Brexit? Would that have seemed fair?

Democracy evolved as a shortcut to avoid fights. Just count and you got a rough idea of which side has the most people and thus, is more likely to win if it came down to it. Once you start tipping the threshold in order to bias things towards your preferred decisions, you increase the risk of the losing side thinking ... wait a minute. We could win this. That's far worse than any other outcome.

The EU and its supporters constantly warp the system to try and make it hard for people to leave, hard for the people to reject their policies. It is fundamentally undemocratic.

gambiting · on June 9, 2017

Well, if you look at all the things I mentioned in my post, they have one thing in common - they are votes to change something, not to keep it the same. To continue with my example, why should the vote be "do you want to go to war" and not "do you want to not go to war"? Because the current state is not being at war, so I'd argue you need an overwhelming support to actually go to war, since it's not a lightweight decision.

>>The EU and its supporters constantly warp the system to try and make it hard for people to leave, hard for the people to reject their policies. It is fundamentally undemocratic.

Well......I guess it's because I want to see a federal Europe with all nations united into one, so of course I don't want people to leave the union. But as we are seeing now, they can. What's more, I truly believe that every EU country can reject any policy it wants. What are the consequences of doing that, really? Look at Poland and Hungary: their governments are powering ahead with populist nationalist policies which are firmly against EU laws and policies, and what does the EU do? It sends a strongly worded warning, saying there may be sanctions. I don't think the Polish government cares. They will serve their entire term without any repercussions other than making a lot of enemies in neighbouring countries.

As for fundamentally undemocratic.....is there any country in the world which is "fundamentally democratic"? Literally no country has democracy as it was first implemented in Greece, and in most countries around the world most votes != win. Trump won despite not receiving the majority of votes, and in the UK receiving the majority of votes means nothing, since all that counts is seats. We implement "democracy" in various forms all the time. So why would a seat-based parliament be a democracy, but a 2/3 majority requirement not?

zigzigzag · on June 7, 2017

IPv6 is less common at work, more common at home.

zigzigzag · on June 7, 2017

But are you roaming in those places? For reasons I don't fully understand, but which are presumably billing related, your IP traffic gets tunnelled back to your home ISP when roaming, or at least it used to.

zigzigzag · on June 6, 2017

Isn't it "people an NSA analyst believes are Russian hackers"? I read the Intercept story but didn't see where it showed convincing evidence of that. It just says they showed no doubt.

zigzigzag · on June 6, 2017

The problem with insisting on roundness, which has been a focus of the education system for years, is that it generates tons of generic shapeless people who specialise in nothing and find themselves unable to obtain the best, high paying jobs.

In my family, my brother and I have been successful by focusing on one or two skills and honing them. That was made much harder by the education system, which fought us the whole way, because it sees specialisation as some sort of problem when it is in fact the solution. In my brother's case the school tried to insist he go to university. He didn't, as he knew full well what he wanted to do and reckoned, correctly, he would do better without being a student. In my case the university insisted that I take non-CS classes even though I was paying them for a CS course. The classes were interesting, but marked arbitrarily (i.e. one essay at the end and who knows how it's evaluated?). I nearly got kicked out of CS because of a single essay written on archaeology!

As I go through life, I constantly encounter people who thought they were "learning how to learn" or "learning how to think" when they went to university, only to discover after graduation that they had no particular skills and were seen as essentially worthless by the job market. It's tremendously depressing for them and creates constant, lifelong insecurity.

Critical thinking abilities are something you want on top but are not a substitute for actual, hard skills. And they are certainly not something a university can teach - please. All the stats and studies show that universities are incredibly ideologically homogenous and rapidly stamp out any political thought that deviates from their left wing consensus. Universities teach people that thinking and disagreement are dangerous, that opinions are "triggering", and speaking out loud leads to exclusion. They're the last place on earth I'd expect critical thinking skills to emerge unscathed.

adrusi · on June 6, 2017

I'm a humanities student and not on the left. My experience with leftist professors is that even if they try to actively push their politics on students, they will still give As to papers with well-reasoned dissenting views on highly political topics, immigration for instance. The only unfairness is that students who just repeat everything the professor says in their paper will get an easy grade without much thinking but I don't know what can be done about that, unless professors are to penalize unoriginality. The groupthink isn't an obstacle to critical thinking, it's just an excuse to avoid it.

Critical thinking skills aren't something to have on top of domain-specific skills, they're something to have as a foundation for them. If you focus on critical thinking skills in lieu of anything domain specific, and expect to get a job without further learning, you're foolish, but you'll have an easier time learning the specific skills you need for practical work anyway.

ticviking · on June 6, 2017

My degree was in Social Studies education, and I suspect you underestimate the degree to which groupthink is pushed in social sciences, and $area Studies. Especially when compared to the humanities.

The humanities have a long tradition of debate and disagreement as a path to seeking the truth, that is sometimes lacking in sociology or political science.

zigzigzag · on June 4, 2017

Eh?

UK engaged in mass telephone/fax taps during the Troubles with wild abandon.

http://www.lamont.me.uk/capenhurst/original.html

The inability to monitor suspects' telecommunications is a very recent phenomenon. Governments had that ability more or less from the moment Bell started building his system.

coldtea · on June 4, 2017

>The inability to monitor suspects' telecommunications is a very recent phenomenon.

First, for actual suspects there never was any inability to monitor their telecommunications. (and any legal inability never meant much to the agencies doing it).

This is not about this (targeted taps), it's about tapping everyone and at all times. Which they also do, but now they want to make it official and sanctioned.

zigzigzag · on June 4, 2017

There is also the opposite problem. The public aren't asking for it not to be done either. Internet surveillance isn't a hot button political topic either way. However, terrorism is. Hence the problem.

zigzigzag · on June 4, 2017

Thought crime has existed for a long time already. Planning to commit murder (thinking about it) is punished nearly as harshly as succeeding. Is it really so controversial?