Infinispan technical FAQs
What is Infinispan's primary API?
Infinispan's primary API - org.infinispan.Cache - extends java.util.concurrent.ConcurrentMap and closely resembles javax.cache.Cache from JSR 107.

What would I use Infinispan for?
Infinispan's org.infinispan.Cache is a simple, flat data structure that can optionally include characteristics such as distribution, eviction and JTA compatibility.

What other APIs are available?
org.infinispan.tree.TreeCache is a tree-structured API that looks a lot like JBoss Cache's API. Note that the similarities end at the interface, though, since the internal implementation and representation of the tree are completely different, using a much more efficient flat structure.

Coming soon: an API for fine-grained replication is planned. This will provide the same benefits as JBoss Cache's POJOCache variant, but will be far simpler and more robust. It will not rely on bytecode weaving or AOP, and will present users with a much more familiar JPA-style session interface. For more details, see this design document.

So which API should I use?
Use the most performant API: org.infinispan.Cache. TreeCache should be considered a compatibility API, for when you are migrating from JBoss Cache and cannot invest the time in rewriting your application, or when your application specifically relies on a tree structure. When released, the fine-grained API will sacrifice performance but give you cache data organisation and fine-grained replication. This organisation inevitably involves heavy use of reflection, proxies and comparisons, and isn't nearly as efficient as more explicit use of the Cache API.

Which JVMs/JDKs does Infinispan work with?
Infinispan is developed and primarily tested against Sun's Java SE 6. It should work with most Java SE 6 implementations, including those from IBM, HP, Apple, Oracle (BEA), and IcedTea. We expect to test on Java SE 7 once it is finalized as well.

Is Infinispan's configuration compatible with JBoss Cache?
No.
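Since org.infinispan.Cache extends java.util.concurrent.ConcurrentMap, its conditional operations follow the standard JDK contract. The sketch below demonstrates that contract using the JDK's own ConcurrentHashMap as a stand-in; the same semantics apply to a Cache instance, but this is an illustration, not Infinispan code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConditionalOps {
    public static void main(String[] args) {
        // Any ConcurrentMap shows the contract that org.infinispan.Cache inherits.
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<String, String>();

        cache.put("k", "v1");                      // unconditional write
        cache.putIfAbsent("k", "v2");              // no-op: "k" is already present
        System.out.println(cache.get("k"));        // v1

        cache.replace("k", "v1", "v2");            // succeeds: current value matches "v1"
        System.out.println(cache.get("k"));        // v2

        boolean removed = cache.remove("k", "v1"); // fails: current value is "v2", not "v1"
        System.out.println(removed);               // false
    }
}
```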
But we intend to provide transformation scripts. Keep in mind, though, that if you use custom components - custom interceptors, cache loaders, eviction policies - we will not be able to translate these, and that would have to be done manually.

What are these methods?
The four methods put(), putIfAbsent(), remove() and replace(), and their overloaded forms.

Cache Loaders and Cache Stores

Cache loaders and stores - what's the difference?
JBoss Cache shipped with a CacheLoader interface and a number of implementations. Infinispan has broken this up into two separate interfaces: a CacheLoader simply loads state from elsewhere, while a CacheStore - which extends CacheLoader - exposes methods to store state as well, making it easy to define read-only sources. Infinispan too ships with several high-performance implementations of these interfaces.

In JBoss Cache, the JDBC and File CacheLoaders had restrictions such as only being able to use Strings in Fqns. Is this still the case in Infinispan?
No. We have completely rewritten these implementations with a much better design which allows us to use arbitrary keys (or Fqn elements, if using the TreeCache API), provided they are Serializable. For details, see the BucketBasedCacheStore.

Are modifications to asynchronous cache stores coalesced or aggregated?
Before 4.0.0.Beta1, cache store modifications were queued in such a way that a modification processor thread would empty the modification queue and apply each modification individually. This implementation was not able to detect multiple changes for the same key within the queue, which meant that if the queue contained 10 modifications for the same key, it would apply all 10 modifications individually. Since 4.0.0.Beta1 (ISPN-116), modifications are coalesced, or aggregated, over the interval that the modification processor thread is currently applying.
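This coalescing behaviour can be modelled with a queue keyed on the cache key, so that a later modification replaces the one already queued for that key. The class below is a simplified illustration with invented names, not Infinispan's actual implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of modification coalescing in an asynchronous store queue.
// A LinkedHashMap keyed by cache key means a newer modification for a key
// overwrites the queued one, while insertion order is preserved.
public class CoalescingQueue {
    final Map<String, String> queued = new LinkedHashMap<String, String>();
    int storeWrites;

    void enqueue(String key, String value) {
        queued.put(key, value); // a later change to the same key replaces the earlier one
    }

    // The processor thread drains the queue: one store write per distinct key.
    void flush() {
        for (String key : queued.keySet()) {
            storeWrites++; // stand-in for writing the key's last state to the cache store
        }
        queued.clear();
    }

    public static void main(String[] args) {
        CoalescingQueue q = new CoalescingQueue();
        for (int i = 1; i <= 10; i++) {
            q.enqueue("k", "v" + i); // 10 modifications to the same key
        }
        q.flush();
        System.out.println(q.storeWrites); // 1: only the key's last state is written
    }
}
```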
This means that while changes are being queued, if multiple modifications are made to the same key, only the key's last state will be applied, reducing the number of calls to the cache store.

What does the passivation flag do?
Passivation is a mode of storing entries in the cache store only when they are evicted from memory. The benefit of this approach is to prevent a lot of expensive writes to the cache store if an entry is hot (frequently used) and hence not evicted from memory. The reverse process, known as activation, occurs when a thread attempts to access an entry which is not in memory but is in the store (i.e., a passivated entry). Activation involves loading the entry into memory and then removing it from the cache store. With passivation enabled, the cache uses the cache store as an overflow tank, akin to swapping memory pages to disk in the virtual memory implementations of operating systems. If passivation is disabled, the cache store behaves as a write-through (or write-behind, if asynchronous) cache, where all entries in memory are also maintained in the cache store. The effect of this is that the cache store will always contain a superset of what is in memory.

Locking

JBoss Cache exposed several different locking schemes - pessimistic, optimistic and MVCC. I don't see a way to specify the locking scheme in Infinispan. Why is this?
Because Infinispan only supports MVCC. MVCC is far more performant, thread-safe and consistent than the other locking schemes.

What isolation levels does Infinispan support?
Infinispan only supports the READ_COMMITTED and REPEATABLE_READ schemes.

What is the default isolation level in Infinispan?
READ_COMMITTED - unlike JBoss Cache, which used REPEATABLE_READ by default. We consider READ_COMMITTED to be good enough for most applications, hence its use as the default.

Does Infinispan support distributed eager locking?
Yes, it does. By default, Infinispan acquires remote locks lazily.
Locks are acquired locally on the node that runs a transaction, while other cluster nodes only attempt to lock the cache keys involved in the transaction during the two-phase prepare/commit. However, if desired, Infinispan can eagerly lock cache keys, either explicitly or implicitly.

How does Infinispan support explicit eager locking?
The Infinispan cache interface exposes a lock API that allows cache users to explicitly lock a set of cache keys eagerly during a transaction. The lock call attempts to lock the specified cache keys across all cluster nodes, and it either succeeds or fails. All locks are released during the commit or rollback phase.

Consider a transaction running on one of the cache nodes:

tx.begin()
cache.lock(K)   // acquire cluster wide lock on K
cache.put(K,V5) // guaranteed to succeed
tx.commit()     // releases locks

How does Infinispan support implicit eager locking?
Implicit locking goes one step further and locks cache keys behind the scenes as the keys are accessed by modification operations.

Consider a transaction running on one of the cache nodes:

tx.begin()
cache.put(K,V)   // acquire cluster wide lock on K
cache.put(K2,V2) // acquire cluster wide lock on K2
cache.put(K,V5)  // no-op, we already own cluster wide lock for K
tx.commit()      // releases locks

Implicit eager locking locks cache keys across cluster nodes only if it is necessary to do so. In a nutshell, if implicit eager locking is turned on, then for each modification Infinispan checks whether the cache key is locked locally. If it is, a global cluster-wide lock has already been obtained; otherwise, a cluster-wide lock request is sent and the lock is acquired. Implicit eager locking is enabled via the cache's transaction configuration.

Transactions and JTA implementations

When using the Atomikos transaction manager, distributed caches are not distributing data. What is the problem?
For efficiency reasons, the Atomikos transaction manager commits transactions in a separate thread from the thread making the cache operations, and until 4.2.1.CR1, Infinispan had problems with this type of scenario, which resulted in distributed caches not sending data to other nodes (see ISPN-927 for more details). Please note that replicated, invalidated or local caches work fine; it is only distributed caches that suffer from this problem.

There are two ways to get around this issue; either:
- Upgrade to Infinispan 4.2.1.CR2 or higher, where the issue has been fixed.
- If using Infinispan 4.2.1.CR1 or earlier, configure Atomikos so that com.atomikos.icatch.threaded_2pc is set to false. This results in commits happening in the same thread that made the cache operations.

Eviction and Expiration

Expiration does not work. What is the problem?
Multiple cache operations such as put() can take a lifespan as a parameter, which defines when the entry should expire. If you have no eviction configured and you let this time pass, it can look as if Infinispan has not removed the entry. For example, JMX stats such as the number of entries might not be updated, or the persistent store associated with Infinispan might still contain the entry. To understand what's happening, it's important to note that Infinispan has marked the entry as expired but has not actually removed it. Removal of expired entries happens in one of two ways:

1. You do a get() or containsKey() for that entry. The entry is then detected as expired and is removed.
2. You have enabled eviction, and an eviction thread wakes up periodically and purges expired entries.

If you have not enabled (2), or your eviction thread's wakeup interval is large and you probe jconsole before the eviction thread kicks in, you will still see the expired entry. You can be assured, though, that if you tried to retrieve the entry via a get() or containsKey(), you would not see it (and the entry would be removed).
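The lazy removal of expired entries described above can be modelled as follows. This is a simplified sketch of the behaviour, not Infinispan's internals; all names are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of lazy expiration: entries carry an expiry timestamp and
// are only physically removed when a read detects that they have expired.
public class LazyExpiry {
    static class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long lifespanMillis) {
            this.value = value;
            this.expiresAt = System.currentTimeMillis() + lifespanMillis;
        }
        boolean expired() { return System.currentTimeMillis() > expiresAt; }
    }

    final Map<String, Entry> data = new HashMap<String, Entry>();

    void put(String key, String value, long lifespanMillis) {
        data.put(key, new Entry(value, lifespanMillis));
    }

    String get(String key) {
        Entry e = data.get(key);
        if (e == null) return null;
        if (e.expired()) {      // detected as expired on access...
            data.remove(key);   // ...and only removed now
            return null;
        }
        return e.value;
    }

    int rawSize() { return data.size(); } // may still count expired entries

    public static void main(String[] args) throws InterruptedException {
        LazyExpiry cache = new LazyExpiry();
        cache.put("k", "v", 50);             // 50 ms lifespan
        Thread.sleep(100);                   // let the entry expire
        System.out.println(cache.rawSize()); // 1: expired but not yet removed
        System.out.println(cache.get("k"));  // null: detected and removed
        System.out.println(cache.rawSize()); // 0
    }
}
```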
I don't see a notification for cache entry expiry. Will it be implemented?
Although it is possible to implement, it would never be very accurate, since entries are only tested for expiry in one of several ways:

1. A user thread asks for the entry and it has expired. It will then be removed.
2. The entry is passivated/overflowed to disk and it has expired. It will again be removed.
3. An eviction maintenance thread kicks in and detects that it has expired. It will again be removed.

So even if a notification were generated, it would not be generated at the time the entry expired, but rather at the time Infinispan realises the entry has expired (which may be later). On top of that, adding this type of notification would add quite a bit of overhead.

Why is the cache size sometimes even higher than the maxEntries specified in the eviction configuration element?
Although you can specify a maxEntries value that is not a power of two, the underlying algorithm will size the container to a value V: the smallest power of two that is no smaller than the specified maxEntries. Eviction algorithms guarantee that the size of the cache container will never be greater than V.

Cache Manager

Infinispan allows me to create several Caches from a single CacheManager. Are there any reasons to create separate CacheManagers?
As far as possible, internal components are shared between Cache instances; notably, RPC and networking components are shared. If you need caches that have different network characteristics - such as one cache using TCP while another uses UDP - we recommend you create these using different cache managers.

Can I create caches using different cache modes using the same cache manager?
Yes. You can create caches using different cache modes, both synchronous and asynchronous, using the same cache manager.

Can transactions span different Cache instances from the same cache manager?
Yes. Each cache behaves as a separate, standalone JTA resource.
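The maxEntries rounding described earlier in this section (the cache container being sized to the next power of two) can be sketched as a small computation. The helper below is illustrative only, not Infinispan's code:

```java
public class EvictionCapacity {
    // V: the smallest power of two no smaller than maxEntries. The eviction
    // container sizes itself to V, which is why the observed cache size can
    // exceed the configured maxEntries. (Sketch of the rounding, assuming
    // the next-power-of-two behaviour described in the FAQ.)
    static int roundUpToPowerOfTwo(int maxEntries) {
        int v = Integer.highestOneBit(maxEntries);
        return v == maxEntries ? v : v << 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(1000)); // 1024
        System.out.println(roundUpToPowerOfTwo(1024)); // 1024
    }
}
```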
Internally, though, components may be shared as an optimization, but this in no way affects how the caches interact with a JTA manager.

Cache Modes (Distribution, Replication, Invalidation)

What is the difference between a replicated cache and a distributed one?
DIST is a new cache mode in Infinispan, in addition to REPL and INVALIDATION. The key difference is that in a replicated cache (REPL), the state of all caches in the cluster is identical: if a key exists on one instance, it will also exist on *all* other instances. In a distributed cache, however, a sufficient number of copies is maintained for redundancy and fault tolerance, but this is typically far fewer than the number of instances in the cluster, providing a far greater degree of scalability than the simpler replicated cache. This cache mode is also able to transparently locate keys across a cluster, and even performs L1 caching for fast read access to state that resides remotely. For detailed designs, refer to this document.

Does DIST support both synchronous and asynchronous communications?
Officially, no; unofficially, yes. Here's the logic. For certain public API methods to have meaningful return values (i.e., to stick to the interface contracts), synchronous communications are necessary if you are using DIST. For example, say you have 3 caches in a cluster: A, B and C. Key K maps to A and B. On C, you perform an operation that requires a return value, e.g. Cache.remove(K). For this to work, the call needs to be forwarded to A and B synchronously, and it would have to wait for the result from either A or B to be returned to the caller. If communications were asynchronous, the return values could not be guaranteed to be useful - even though the operation would behave as expected.
Now, unofficially, we will add a configuration option to allow you to set your cache mode to DIST and use asynchronous communications, but this will be an additional configuration option (perhaps something like break_api_contracts) so that users are aware of what they are getting into.

Why bother with this? Why not just document the return type as unreliable if used with DIST and asynchronous communications?
Because this would cause problems if people wrote their application using one cache mode (e.g., replication, or dist + synchronous) and then attempted to switch to dist + asynchronous at a later date. The code would still compile; tests might even pass. And then things could start to get ugly. :-)

What about buddy replication then?
Buddy Replication is not available in Infinispan. The new distributed cache mode solves the same problems in a far more elegant and scalable manner. Read this blog article for a more detailed discussion of the subject.

I notice that when using DIST, the cache does a remote get before a write command. Why is this?
Certain methods, such as Cache.put(), are supposed to return the previous value associated with the specified key, according to the java.util.Map contract. If such a method is invoked on an instance that does not own the key in question, and the key is not in the L1 cache, the only way to reliably provide this return value is to do a remote GET before the put. This GET is always synchronous (regardless of whether the cache is configured to be sync or async), since we need to wait for that return value.

Isn't that expensive? How can I optimize this away?
It isn't as expensive as it sounds. A remote GET, although synchronous, does not wait for all responses; it accepts the first valid response and moves on, so its performance bears no relation to cluster size.
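This "first valid response wins" behaviour is analogous to java.util.concurrent's ExecutorService.invokeAny, which returns as soon as one task completes successfully rather than waiting for every task. The example below is a plain-JDK analogy, not Infinispan's RPC layer; the "owner" tasks stand in for the cluster nodes that own a key:

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Analogy for a remote GET in DIST mode: the request goes to the owners of a
// key, and the first valid response is used without waiting for the rest.
public class FirstResponseWins {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<String> fastOwner = () -> "value-from-A"; // replies quickly
        Callable<String> slowOwner = () -> {               // replies slowly
            Thread.sleep(2000);
            return "value-from-B";
        };
        // invokeAny returns the first successful result; the slow reply
        // is cancelled rather than waited for.
        String value = pool.invokeAny(Arrays.asList(fastOwner, slowOwner));
        System.out.println(value); // value-from-A
        pool.shutdownNow();
    }
}
```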
If you feel your code has no need for these return values, they can be disabled completely (by specifying the configuration element for a cache-wide setting, or Flag.SKIP_REMOTE_LOOKUP for a per-invocation setting). Note that while this will not impair cache operations, and the accurate functioning of all public methods is still maintained, it does break the java.util.Map interface contract by providing unreliable and inaccurate return values from certain methods, so you need to be certain that your code does not use these return values for anything useful.

So what is this L1 cache?
An L1 cache (enabled by default, though it can be disabled) only exists if you set your cache mode to distribution. An L1 cache prevents unnecessary remote fetching of entries mapped to remote caches by storing them locally for a short time after the first time they are accessed. By default, entries in L1 have a lifespan of 60,000 milliseconds (though you can configure how long L1 entries are cached for). L1 entries are also invalidated when the entry is changed elsewhere in the cluster, so you can be sure you don't have stale entries cached in L1. Caches with L1 enabled will consult the L1 cache before fetching an entry from a remote cache. This is also known as a near cache in competing distributed cache products.

I use a clustered cache. I want the guarantees of synchronous replication with the parallelism of asynchronous replication. What can I do?
Infinispan offers a new async API to provide just this. These async methods return Futures, which can be queried, causing the thread to block until you get confirmation that any network calls succeeded. Read more about it here: http://infinispan.blogspot.com/2009/05/whats-so-cool-about-asynchronous-api.html

I have caches configured with asynchronous replication or distribution, but these caches appear to be behaving synchronously (waiting for responses). What is going on?
If you have state transfer configured along with asynchronous mode, caches will behave in a synchronous way. This is done so that state transfer can work as expected, but the current solution extends the synchronous calls to cache operations as well, which results in this unexpected behaviour. A better solution that will resolve this confusion is already in the making; in the meantime, see the corresponding issue for further information on possible workarounds.

Cache and CacheManager Listeners

In a @CacheEntryModified annotated method, can the modified value be retrieved via Cache.get() when isPre=false?
No, it cannot. Please use CacheEntryModifiedEvent.getValue() to retrieve the value of the entry that was modified.

Running Infinispan in Cloud

How do I make Infinispan send replication traffic over an internal network when I don't know the IP address?
Some cloud providers charge less for traffic over internal IP addresses than over public IP addresses; in fact, some cloud providers do not charge anything for traffic over the internal network (e.g. GoGrid). In these circumstances, it's really advantageous to configure Infinispan in such a way that replication traffic is sent via the internal network. The problem, though, is that quite often you don't know which internal IP address you'll be assigned (unless you use elastic IPs and dyndns.org), so how do you configure Infinispan to cope with those situations?

JGroups, the underlying group communication library that interconnects Infinispan instances, has come up with a way to let users bind to a type of address rather than to a specific IP address. You can now set the bind_addr property in the JGroups configuration file, or the -Djgroups.bind_addr system property, to a keyword rather than a dotted-decimal or symbolic IP address:

GLOBAL: picks a public IP address. You want to avoid this for replication traffic.
SITE_LOCAL: uses a private IP address, e.g. 192.168.x.x.
This avoids charges for bandwidth from GoGrid, for example.
LINK_LOCAL: uses a 169.254.x.x (link-local) address. I've never used this, but it would be for traffic within a single box.
NON_LOOPBACK: uses the first address found on an interface that is up and is not a 127.x.x.x (loopback) address.

Infinispan GUI Demo

When using the GUI Demo, I've just put an entry in the cache with a lifespan of -1. Why do I see it as having a lifespan of 60,000?
This is probably an L1 caching event. When you put an entry in the cache, the entry is mapped to specific nodes in the cluster using a consistent hashing algorithm. This means that key K could map on to caches