How fast is bit packing?
One of the most anticipated changes in Lucene/Solr 4.0 is its improved memory efficiency. Indeed, according to several benchmarks, you could expect a 2/3 reduction in memory use for a Lucene-based...
What is the theory behind Apache Lucene?
There is a recurring request from users to have more insight into Lucene internals. For example, see: Lucene user mailing-list - lucene algorithm?, StackOverflow - How does Lucene index documents?, Quora...
Wow, LZ4 is fast!
I’ve been doing some experiments with LZ4 recently and I must admit that I am truly impressed. For those not familiar with LZ4, it is a compression format from the LZ77 family. Compared to other...
Efficient compressed stored fields with Lucene
Whatever you are storing on disk, everything usually goes perfectly well until your data becomes too large for your I/O cache. Until then, most disk accesses actually never touch disk and are almost as...
Stored fields compression in Lucene 4.1
Last time, I tried to explain how efficient stored fields compression can help when your index grows larger than your I/O cache. Indeed, magnetic disks are so slow that it is usually worth spending a...
lz4-java 1.0.0 released
I am happy to announce that I released the first version of lz4-java, version 1.0.0. lz4-java is a Java port of the lz4 compression library and the xxhash hashing library, which are both known for being...
Putting term vectors on a diet
What are term vectors? Term vectors are an interesting Lucene feature that allows retrieving a single-document inverted index for any document ID of your index. This means that given any document...
lz4-java 1.1.0 is out
I’m happy to announce the release of lz4-java 1.1.0. Artifacts can be downloaded from Maven Central and javadocs can be found at jpountz.github.com/lz4-java/1.1.0/docs/. Release highlights: lz4 has been...
Versatile sorting
Sorted data sets are very useful since they make a lot of things easier: checking for duplicates, computing frequencies (the number of times each unique element appears), compression (thanks to delta...
lz4-java 1.3.0 is out
A new release of lz4-java is out and already available on Maven Central. As usual, documentation and benchmarks (compression, decompression and hashing) are published at...