Probabilistic Digital System for the Ultra-fast Mining of Huge Databases

This patent presents a new system that is able to achieve large performance values in the exploration of large datasets due to the huge parallelism that is achieved with probabilistic computing methodologies. The logic structures can be replicated hundred of times in a single FPGA in order to obtain a high parallelism with a minimum cost. The present invention can be applied to multiple scientific disciplines where useful information has to be extracted from huge databases.
This circuitry implements a basic similarity search, which is the core of most data mining algorithms. Similarity search is the traditional bottleneck for virtually all data mining algorithms. A first prototype has been implemented in a PCIe-based board containing four large-scale FPGAs, obtaining a performance of more than 100 millions of comparisons per second (where each comparison involves the treatment of descriptors with a total of 96 bits per descriptor). This system is a non-conventional design based on probabilistic logic rather than the traditional and deterministic binary logic. The system is able to process big amounts of information in parallel, in contrast to conventional processor-based techniques that are intrinsically sequential in its nature. The key point of probabilistic computing is that they provide most probable results as an output instead of the exact result. The difference can be orders of magnitude in terms of computation time (or in terms of the financial resources needed to achieve an specific performance).
The patented system represents a great step in the exploration of huge databases in reasonable times. The processing speed of the system is orders of magnitude quicker with respect traditional microprocessor-based comparisons.
One of the fields in which the invention could have a greater impact is in the early stages of drug discovery, where virtual screening techniques must be used to identify novel drug leads from large molecular databases. Those databases can contain thousand of millions of molecules that must be screened in the research of new compounds presenting biological activity. The use of conventional techniques must use multi-processor computations (cloud computing, clusters, supercomputing...). With the present invention, the costs in terms of hardware and energy would be reduced considerably, (in the same factor in which the processing speed is increased).
The present invention is also applicable to any other scientific or technologic field in which is necessary the exploration of large datasets for similarity searches in reasonable times. More specifically, the patented design is able to compare a reference vector with different vectors selected from a database. Large databases are explored using large packets of vectors. The response of the probabilistic system is to provide at its output the label of the vector that is closer to the vector used as a reference (if this vector exists in the selected packet). This system is designed for the quick exploration of huge databases. The results of each comparison is stored in a memory.
The final result is a selection of the closer to the reference vectors.
The techniques that are used to implement the digital comparators are inspired in the neural behavior, based in the interaction of multiple action potentials in biological neural networks that, following the basic probabilistic laws, are able to compare and process in parallel big information sets.
The principal innovative aspect is the use of probabilistic techniques that are able to increase the processing speed with respect to traditional deterministic techniques. Those techniques are more appropriate for the processing of huge amounts of information in reasonable times.

Attached files:

Patents:
US 201,231,295 issued 2012-08-13

Type of Offer: Sale or Licensing

Contact Patent Owner or Broker

Next Patent »

« More Data Processing Patents

Share on

CrowdSell Your Patent